Monday, September 15, 2008

Checking and repairing partitions in MySQL 5.1

Talking with users at OSCON 2008, the biggest fear with new features was table corruption with partitioning. Until recently, this fear was substantiated by bug#20129, which reported inability of repairing corrupted partitions. The initial "fix" for this bug was to disable the offending commands.
After a while, especially when the Support department complained loudly, the bug was reopen and a proper fix implemented. Now you can maintain a partition using ALTER TABLE ... CHECK/ANALYZE/OPTIMIZE/REPAIR PARTITION.
I did some experiment, using MySQL sandbox and the employees test database, which supports partitioning.
You need to download the latest MySQL 5.1 release (5.1.28), where the patch is available.
To load the employees database with partitions, you edit employees_partitioned.sql to choose MyISAM as default engine and then run
$ $HOME/sandboxes/msb_5_1_28/use -t <>
Once we are satisfied that the data is safe, we can start messing with it. To simulate a corruption in MyISAM, the easiest way is to copy the index for a partition to the index of a different one.
$ cd $HOME/sandboxes/msb_5_1_28/
$ ./stop
$ cp data/employees/titles#P#p08.MYI data/employees/titles#P#p09.MYI
$ ./start
Now at least one partition is wrong.
ALTER TABLE titles CHECK PARTITION p09;
+------------------+-------+----------+-----------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+-------+----------+-----------------------------------------------------+
| employees.titles | check | warning | Size of datafile is: 768416 Should be: 680072 |
| employees.titles | check | error | Record-count is not ok; is 28286 Should be: 25274 |
| employees.titles | check | warning | Found 28286 key parts. Should be: 25274 |
| employees.titles | check | error | Partition p09 returned error |
| employees.titles | check | error | Corrupt |
+------------------+-------+----------+-----------------------------------------------------+

ALTER TABLE titles repair PARTITION p09;
+------------------+--------+----------+--------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+--------+----------+--------------------------------------------+
| employees.titles | repair | warning | Number of rows changed from 25274 to 28286 |
| employees.titles | repair | status | OK |
+------------------+--------+----------+--------------------------------------------+
Run the test suite again, and you'll see that the feature works as advertised. However, what happens if we have more than one corrupted partition? Let's find out.
$ ./stop
$ for P in 02 05 10 11 16 ; \
do /bin/cp data/employees/titles#P#p01.MYI \
data/employees/titles#P#p$P.MYI; \
done
$ ./start
Now there are 5 corrupted partitions. Checking them one by one would be tedious. One side effect of fixing bug#20129 is that now you can use CHECK TABLE on partitioned tables (the manual states otherwise today (September 15, 2008) but it should be fixed soon).
check table titles;
+------------------+-------+----------+-------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+-------+----------+-------------------------------------------------+
| employees.titles | check | warning | Size of datafile is: 478924 Should be: 0 |
| employees.titles | check | error | Record-count is not ok; is 18293 Should be: 0 |
| employees.titles | check | warning | Found 18293 key parts. Should be: 0 |
| employees.titles | check | error | Partition p02 returned error |
| employees.titles | check | error | Corrupt |
+------------------+-------+----------+-------------------------------------------------+
Interestingly, CHECK TABLE reports only the first corruption. Let's fix it and try again.
alter table titles repair partition p02;
+------------------+--------+----------+----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+--------+----------+----------------------------------------+
| employees.titles | repair | warning | Number of rows changed from 0 to 18293 |
| employees.titles | repair | status | OK |
+------------------+--------+----------+----------------------------------------+
2 rows in set (0.07 sec)

mysql [localhost] {msandbox} (employees) > check table titles;
+------------------+-------+----------+-------------------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+-------+----------+-------------------------------------------------+
| employees.titles | check | warning | Size of datafile is: 526268 Should be: 0 |
| employees.titles | check | error | Record-count is not ok; is 20107 Should be: 0 |
| employees.titles | check | warning | Found 20107 key parts. Should be: 0 |
| employees.titles | check | error | Partition p05 returned error |
| employees.titles | check | error | Corrupt |
+------------------+-------+----------+-------------------------------------------------+
Hmm. This could be a painfully long process if I have an unknown number of corrupted partitions and many partitions to check. Fortunately, the same side effect that made possible CHECK TABLE is also allowing REPAIR TABLE. Let's do it.
repair table titles;
+------------------+--------+----------+----------------------------------------+
| Table | Op | Msg_type | Msg_text |
+------------------+--------+----------+----------------------------------------+
| employees.titles | repair | warning | Number of rows changed from 0 to 20107 |
| employees.titles | repair | warning | Number of rows changed from 0 to 30784 |
| employees.titles | repair | warning | Number of rows changed from 0 to 33367 |
| employees.titles | repair | warning | Number of rows changed from 0 to 33345 |
| employees.titles | repair | status | OK |
+------------------+--------+----------+----------------------------------------+
The four remaining corrupted partitions were fixed with one single command. Again, use the test suite to make sure that your data is OK.
Thanks, MySQL developers. Good job!

1 comment:

Jeremy Cole said...

And hey, it only took 2 years 3 months to fix! So fast!