The Data Charmer: Two quick performance tips with MySQL 5.1 partitions

Thursday, May 06, 2010

Two quick performance tips with MySQL 5.1 partitions

While I was researching for my partitions tutorial, I came across two hidden problems, which may happen often, but are somehow difficult to detect and even more difficult to fix, unless you know what's going on, and why. I presented both cases during my tutorial, but there were no pictures to convey the mechanics of the problem. Here is the full story.

TO_DAYS() prunes two partitions instead of one

If you are partitioning by date, chances are that you are using TO_DAYS(). And depending on how you have partitioned your table, your queries are as fast as you expect them to be. However, there are cases where your query takes twice as long as it should, and of course this will not make you happy.

For example, in a table partitioned by month, when your query searches for values within one specific month, EXPLAIN PARTITIONS tells you that the search involves two partitions (see figure above). This means that, instead of searching through 1,000,000 rows in one partitions, the partitions engine is searching through 2,000,000 rows in two partitions.
But why? The reasoning, as reported from the developers, is that

This is not a bug, since TO_DAYS() returns NULL for invalid dates, it needs to scan the first partition as well (since that holds all NULL values) for ranges.

Bug#49754: Partitioning by RANGE with TO_DAYS always includes first partition when pruning
This makes sense, from a developer's standpoint. From a user's experience, though, it's a bug.
Anyway, it doesn't do us any good to rant about it. Our query is still twice as slow as we want it. We need to take action. The workaround is to create an empty partition in first position. If we are creating a new table, it's simple. Just say

PARTITION p000 VALUES LESS THAN (0)

and all will be well. The partition pruning mechanism will still find two partitions, but since the first one is empty, it won't impact the performance.
If you have an existing table already partitioned, like in our example, then you need to perform a different operation

Now we have a different first partition, with no records. When we issue the same query, the partition pruning will look at partition p0, but it will skip it because there are no records.

Inserting single rows in partitions is slow

Also this bug is sometimes difficult to detect. If you want to test partitions in MySQL 5.1, probably you will take an existing table and convert it to a partitioned one, or you create a new table and load the contents from a dump. Either way, you are unlikely to insert millions of records with single INSERT statements. These single inserts are slower than bulk inserts in the first place, but with partitions there is an additional penalty. Whenever you insert a record, the partitioning engine locks the entire table. When you insert thousands of records, the partitioning engine will lock all partitions before the insert, and unlock them after the insert. If you have 500 partitions, that's 500 locks and 500 unlocks for every statement. Ouch!
It's a design problem, and it is not likely to be fixed without turning around the whole architecture of partitions. Also in this case, there is a bug report, Partitioning performance drops drastically with hundreds of partitions, although nobody says that this is a feature.
What can you do, then? You have several choices:

You can use a bulk insert. Instead of single statements, use INSERT with multiple records, or LOAD DATA INFILE.
Explicitly LOCK the table before inserting and UNLOCK it after you finish with all the inserts. This will avoid the overhead, although it won't make your table concurrently accessible until you finish.
If you use partitioning only to facilitate heavy queries, consider using a non-partitioned table on the master, and partitioned ARCHIVE tables on the slaves (see figure below).

As I have said many times in my presentations, always benchmark before using partitions in production. Whether you think that it will boost your performance or that it will slow things down, don't trust your instincts, and test. You may be up for a surprise.

9 comments:

rpbouman said...: Great post!

Thanks for explaining, and thanks for the workarounds.; May 6, 2010 at 12:26:00 PM GMT+2
Shlomi Noach said...: Thanks for this post. The #2 issue is one I have just tackled a couple days ago. Great to have this clarified!; May 6, 2010 at 7:02:00 PM GMT+2
Matthew Montgomery said...: Thankfully this is fixed in 5.5 with the PARTITION BY RANGE COLUMNS() feature. It will no longer require TO_DAYS() and will place those invalid dates in line with valid dates.

mysql> CREATE TABLE partition_date_test (dt DATE PRIMARY KEY)
-> PARTITION BY RANGE COLUMNS(dt) (
-> PARTITION day_20100226 VALUES LESS THAN ('2010-02-27'),
-> PARTITION day_20100227 VALUES LESS THAN ('2010-02-28'),
-> PARTITION day_20100228 VALUES LESS THAN ('2010-03-01'),
-> PARTITION day_20100301 VALUES LESS THAN ('2010-03-02'),
-> PARTITION day_20100302 VALUES LESS THAN ('2010-03-03'),
-> PARTITION day_20100303 VALUES LESS THAN ('2010-03-04'),
-> PARTITION max VALUES LESS THAN MAXVALUE);
Query OK, 0 rows affected (0.06 sec)

mysql> SET SQL_MODE = 'allow_invalid_dates';
Query OK, 0 rows affected (0.00 sec)

mysql> INSERT INTO partition_date_test VALUES ('2010-02-26'),('2010-02-27'),('2010-02-28'),('2010-02-29'),('2010-03-00'),('2010-03-02'),('2010-03-03'),('2010-03-04');
Query OK, 8 rows affected (0.00 sec)
Records: 8 Duplicates: 0 Warnings: 0

mysql> SET SQL_MODE = '';
Query OK, 0 rows affected (0.00 sec)

mysql> EXPLAIN PARTITIONS SELECT dt FROM partition_date_test WHERE dt BETWEEN '2010-02-28' AND '2010-03-02';
+----+-------------+---------------------+----------------------------------------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------------------+----------------------------------------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | partition_date_test | day_20100228,day_20100301,day_20100302 | index | PRIMARY | PRIMARY | 3 | NULL | 4 | Using where; Using index |
+----+-------------+---------------------+----------------------------------------+-------+---------------+---------+---------+------+------+--------------------------+
1 row in set (0.01 sec); May 6, 2010 at 8:45:00 PM GMT+2
Unknown said...: 'Whenever you insert a record, the partitioning engine locks the entire table' is easy to misunderstand:

What happens is that it does forward the lock requests to ALL partitions, it does not do any additional locking besides the locking the underlying storage engines does! So if you use partitioned InnoDB it will still use row-locks, but since open and locking is done before pruning, it will call open for all partitions and then lock for all partitions, and that is bug#37252.

What this means is that you can gain on concurrent inserts, due to all partitions are independent, but each statement has an overhead which becomes higher the more partitions you have.

One related thing that decreased the overhead was to limit ha_partition::records_in_range (http://bugs.mysql.com/bug.php?id=48846); May 7, 2010 at 10:25:00 AM GMT+2
Giuseppe Maxia said...: Matthew,
Indeed. MySQL 5.5 solves the issue nicely. I forgot to mention it. Since I wrote two articles on this matter, perhaps I felt that everyone knows.
Thanks for the reminder
Giuseppe; May 7, 2010 at 11:28:00 AM GMT+2
Unknown said...: Giuseppe, it'll be fine to update your blog regarding the statement about "table lock on insert" - many people will not follow comments and stay with idea that partitions are dangerous for OLTP..

While the table lock is true for MyISAM, it's not true for InnoDB, and recently I've published the benchmark results showing performance improvements when partitions are used: http://dimitrik.free.fr/blog/archives/2010/05/mysql-performance-improving-stability.html

Rgds,
-Dimitri; May 7, 2010 at 5:43:00 PM GMT+2
Giuseppe Maxia said...: Dimitri,
The improvement on locking happens only on 5.5 innodb, or am I missing something?
In MySQL 5.1, the whole table is locked, regardless of the engine.
http://bugs.mysql.com/bug.php?id=37252

Giuseppe; May 7, 2010 at 6:03:00 PM GMT+2
Unknown said...: Hi Giuseppe.

There is no table lock in partitioning, not even in 5.1!

bug#37252 is that open and lock is not prunable.

All locking is handled in the underlying storage engines, the general partitioning handler does only forward the locking calls.

So partitioning should lead to less mutex contention also in 5.1 (but will probably also hit the dictionary mutex as shown in Dimitri's benchmark).

Since the benchmark only uses 3 partitions, the overhead of bug#37252 is ignorable, but the mutex contention seems to drop and increases the performance.

Sorry Giuseppe if I have not been clear enough about the problem with bug#37252.; May 7, 2010 at 6:26:00 PM GMT+2
Unknown said...: Does this mean inserts on 5.1 would be slower if I use myisam? Can it be rectified by using innodb? Or does it need a version upgrade to 5.5?; February 22, 2011 at 5:09:00 PM GMT+1