Sunday, June 21, 2009

Failing by choice. Another bug-vs-feature debate ends

A long standing bug

Among the many outstanding bugs for MySQL, there is one that has sparked a fierce discussion, not only in the bug report itself, but also in blogs, forums, mailing lists.
Bug #19027: MySQL 5.0 starts even with Fatal InnoDB errors was neglected for long time, until finally it got fixed, and it is available in MySQL 5.1.36.
First off, what is it about?
In short, if an engine doesn't initialize correctly, the MySQL server starts anyway, without the offending engine. Depending on how you use the failing engine, this could be either a minor annoyance or a complete disaster.
Annoyance: ARCHIVE fails to initialize, and you create tables with the default engine (usually MyISAM), and after a while you realize that the disk is filling up faster than you expected. You will find out eventually.
Disaster: ARCHIVE fails to initialize, and you copy 300 GB of data into a table that you believed was ARCHIVE and instead is MyISAM, filling the disk and halting your server completely.
Annoyance: InnoDB fails to initialize on your new slave, and you create MyISAM tables instead. After a while you realize that this slave is serving queries way more slowly than the others.
Disaster: InnoDB fails to initialize on your master, which was already using InnoDB tables. None of you application work anymore. You can't read or write from your tables. You are completely stuck.

The same old story: bug or feature?

While I admit that some cases could be classified as annoyances, I am convinced, by personal experience coupled with gut feeling, that disaster is a more frequent outcome of this situation. When the engine fails and the server happily starts without your valuable data you are in trouble. True, you can monitor the error log and have a cron job sending you an alert if this happens, but by then it may be already too late.
So, why did it take this long to fix it? The bug was filed in April 2006.
The reason is that this bug was labeled between low priority and feature request. The behavior was documented and consistent with other similar cases, and many developers felt that by satisfying the request of the bug reporter we were adding a new feature. If this were a philosophical discussion, yes, it would be so. But this is about real life usage of databases, and it's my strong opinion that it is a bug, because it can harm the user, and it feels wrong.
I wrote about this matter before. A bug is not justified by being documented. A documented wrong behavior is a bug nonetheless.

How it works

Let's have a look at the practicalities. In MySQL 5.1.35, you create a InnoDB table and then restart the server with a mistake in your my.cnf. You inadvertently changed the size of the InnoDB file. For example, you set innodb_data_file_path=ibdata1:2000M while the original size was 200 MB.

$ $HOME/sandboxes/msb_5_1_35/restart --innodb_data_file_path=ibdata1:2000M
. sandbox server started
# the server starts
$ $HOME/sandboxes/msb_5_1_35/use -e "show engines" | grep -i innodb
Your query for the InnoDB engine is met by Unix nothingness. InnoDB failed to initialize, but the server started, leaving your data crippled and the DBAs blissfully unaware, unless they were checking for this specific occurrence.
What happens with 5.1.36? By default, it does exactly the same. If you think that this kind of failure could be a minor annoyance, you leave things as they are, and the failure of an engine, from CSV to InnoDB, won't bother you more than usually.
However, if you want to force the server to abort when your engine fails, now you can say so. You need to specify the desired behavior in the options file.
commandmeaningexample
engine_name=OFFThe specified engine will not be included. The server will start without it.innodb=off
engine_name=ONThe specified engine will be included. If it fails, the server will start.innodb=on
engine_name=FORCEThe specified engine will be included. If it fails, the server will NOT start.innodb=force

Testing the above assumptions with the latest version shows a different result.
$ cd $HOME/sandboxes/msb_5_1_36
./restart --innodb=force --innodb_data_file_path=ibdata1:2000M
................... sandbox server not started yet
With the choice to fail now available, we can decide if an engine is so important that we can't live without, and instruct the server that it's better to fail than starting in crippled mode.
Another usability bug bites the dust.

2 comments:

Jonathan Levin said...

Oh wow. I had this problem and it took me a long time to stop panicking enough to read some documentation about how to fix it.

Really scared me when I saw that half the tables were possibly corrupted.

arjen said...

With InnoDB disabled, without the appropriate sql modes set, creating/using these tables will also see foreign key constraints disappearing, transactions that can't be rolled back, etc. It's quite evil.

I already put my vote, words, and blog posts in on that bug - and I now say *HORAY!* for fixing. Great work Kristoffer.