Wednesday, August 22, 2012

MySQL 5.6 replication gotchas (and bugs)

There has been a lot of talk about MySQL 5.6 replication improvements. With few exceptions, what I have seen so far has been either marketing messages or hearsay. This means that few people have really tried out the new features to see whether they meet users' needs.

As usual, I did try the new version in my environment. I like to form my own opinion based on experiments, and so I have been trying out these features since they first appeared in the early milestones.

What follows is a list of (potentially) surprising results that you may get when using MySQL 5.6.
All the examples are made using MySQL 5.6.6.

Gotcha #1 : too much noise

I have already mentioned that MySQL 5.6 is too verbose when creating the data directory. This also means that your error log may contain much more information than you'd like. You should check the contents of the error log when you start the server, and either clean it up before using it on a regular basis or take note of what's there after a successful installation, so that you won't be surprised when something goes wrong.

Gotcha #2 : Innodb tables where you don't expect them

Until version 5.5, after you installed MySQL, you could safely drop the ib* files, change the configuration file, and restart MySQL with optimized parameters. Not anymore.

When you run mysqld with the --bootstrap option (which is what mysql_install_db does), the server creates five InnoDB tables:

select table_schema, table_name
from information_schema.tables
where engine='innodb';
+--------------+----------------------+
| table_schema | table_name           |
+--------------+----------------------+
| mysql        | innodb_index_stats   |
| mysql        | innodb_table_stats   |
| mysql        | slave_master_info    |
| mysql        | slave_relay_log_info |
| mysql        | slave_worker_info    |
+--------------+----------------------+

The slave_* tables are needed for the crash-safe slave feature, which we'll cover later. The innodb_*_stats tables are documented at InnoDB persistent stats, and they seem to contain almost the same information as the tables with the same name that you find in the Percona Server INFORMATION_SCHEMA. I can only speculate why these tables are in mysql rather than in performance_schema.

Another side effect of this issue is that whatever settings you want to apply to InnoDB (size of the data files, file-per-table, default file format, and so on) must already be in place when you run mysqld --bootstrap.
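
As a minimal sketch (paths and values below are made up for illustration), the InnoDB options go into the configuration file before bootstrapping, so that the system tables are created with the layout you want:

# my.cnf fragment read during bootstrap -- hypothetical values
[mysqld]
innodb_file_per_table = 1
innodb_data_file_path = ibdata1:100M:autoextend

# bootstrap using that file
mysql_install_db --defaults-file=/path/to/my.cnf --datadir=/path/to/datadir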

Gotcha #3 : Global transaction IDs and security

The information about Global Transaction IDs is not easy to locate, but eventually, searching the manual, you will find it. The important thing to take away from that page is that this feature only works if you enable all these options on all the servers used for replication:

log-bin
server-id=xx
log-slave-updates 
gtid-mode=ON
disable-gtid-unsafe-statements

The first two options are not a surprise. You need them for replication anyway. Check.

The third one is puzzling. Why would you want this option in a master? But then you realize that this will allow any server to be promoted or demoted at will. Check.

gtid-mode is the main option that needs to be enabled for global transaction IDs. Check.

The last option forces the server to be safe, by using only transactional tables, and by forbidding things like temporary tables inside transactions and CREATE TABLE ... SELECT. Which means that if you try to update a MyISAM table on the master, the statement will fail. You won't be allowed to do it. Check?
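
For example, with these options enabled, a statement like the following (table name invented for illustration) is refused on the master because it updates a non-transactional table:

-- t_myisam is assumed to be a MyISAM table
update t_myisam set counter = counter + 1;
-- the statement is rejected rather than executed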

The trouble is, if you enable gtid-mode=ON (with its mandatory ancillary options), you can't run mysql_secure_installation, because that utility needs to delete anonymous users and clean the 'db' table for anonymous usage of the 'test' database.

The workaround is to enable GTID after you secure the installation, which means one more server restart.
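
A rough sketch of the sequence (file names and paths are examples only):

# 1. first start without gtid-mode, with only log-bin, server-id, log-slave-updates
# 2. secure the installation while GTID is still off
mysql_secure_installation

# 3. add gtid-mode=ON and disable-gtid-unsafe-statements to my.cnf
# 4. restart the server so the GTID options take effect
mysqladmin -u root -p shutdown
mysqld_safe --defaults-file=/path/to/my.cnf &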

Gotcha #4 (bug): multi-threaded slave won't work without crash-safe slave tables

To enable parallel replication, you need to set 'slave_parallel_workers' to a value between 1 and 1024.

show variables like '%worker%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 0     |
+------------------------+-------+
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > stop slave;
Query OK, 0 rows affected (0.06 sec)

slave1 [localhost] {msandbox} ((none)) > set global slave_parallel_workers=5;
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > start slave;
Query OK, 0 rows affected, 1 warning (0.05 sec)

slave1 [localhost] {msandbox} ((none)) > select * from mysql.slave_worker_info\G
Empty set (0.00 sec)

What the hell? The workers table is empty.

I know the cause: the slave_worker_info table is not activated unless you also set relay_log_info_repository='table'. What I don't understand is WHY it is like that. If this is documented, I could not find where.
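
A quick way to see what the slave is currently using (just a check, nothing more):

show variables like '%_repository';
-- in a default 5.6 installation both master_info_repository and
-- relay_log_info_repository are set to FILE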

Anyway, once you are in this bizarre condition, you can't activate relay_log_info_repository='table', because of the following

Gotcha #5 (bug) : master and relay_log repositories must either be set for good, or they will fail

After you have activated parallel threads without enabling table repositories, you can't easily get back to a clean replication environment:
set global relay_log_info_repository='table';
start slave;
ERROR 1201 (HY000): Could not initialize master info structure; more error messages can be found in the MySQL error log
And the error log says:
120822 14:15:08 [ERROR] Error creating relay log info: Error transfering information.

What you need to do is:

  • stop the slave
  • enable both master_info_repository and relay_log_info_repository as 'table'
  • set the number of parallel threads
  • restart the slave
slave1 [localhost] {msandbox} (mysql) > stop slave;
Query OK, 0 rows affected (0.02 sec)

slave1 [localhost] {msandbox} (mysql) > set global master_info_repository='table';
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > set global relay_log_info_repository='table';
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > set global slave_parallel_workers=5;
Query OK, 0 rows affected (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > start slave;
Query OK, 0 rows affected, 1 warning (0.01 sec)

slave1 [localhost] {msandbox} (mysql) > select count(*) from slave_worker_info;
+----------+
| count(*) |
+----------+
|        5 |
+----------+
1 row in set (0.00 sec)

This sequence of commands will start parallel replication, although MySQL crashes when restarting the slave.
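
To make these settings survive a restart, the corresponding options can also go into the configuration file; a sketch, with the number of workers chosen arbitrarily:

[mysqld]
master_info_repository    = TABLE
relay_log_info_repository = TABLE
slave_parallel_workers    = 5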

Gotcha #6 : Global transaction IDs not used in parallel threads

Global transaction IDs (GTIDs) are very useful when you need to switch roles from master to slave, and especially when you deal with unplanned failovers. They are also a great simplification in many cases where you need to identify a transaction without getting lost in the details of binary log file and position.

However, in one of the cases where GTIDs would have been most useful, they are not there. The table mysql.slave_worker_info still identifies transactions by binary log and position. Similarly, CHANGE MASTER TO does not use GTIDs, other than allowing the automatic alignment (MASTER_AUTO_POSITION=1). If you need to perform any fine-tuning operations, you need to revert to the old binary log + position.
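
To illustrate the difference, here is the automatic form next to the traditional one (host, credentials, and coordinates are made up):

-- automatic alignment: no explicit coordinates, GTIDs do the work
CHANGE MASTER TO
  MASTER_HOST='master1', MASTER_PORT=3306,
  MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox',
  MASTER_AUTO_POSITION=1;

-- fine tuning still requires the old file + position pair
CHANGE MASTER TO
  MASTER_HOST='master1', MASTER_PORT=3306,
  MASTER_USER='rsandbox', MASTER_PASSWORD='rsandbox',
  MASTER_LOG_FILE='mysql-bin.000003', MASTER_LOG_POS=12345;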

Monday, August 20, 2012

Is Oracle really killing MySQL?

There are plenty of "Oracle-is-killing-MySQL" headlines in the tech world.

Is Oracle really consciously and willingly killing MySQL?

I don't think so.

Is Oracle damaging MySQL by taking the wrong steps? Probably so.

This is my personal opinion, and AFAIK there is no official statement from Oracle on this matter, but I think I can summarize the Oracle standpoint as follows:

  • There is a strong and reasonable concern about security. Oracle's promise to its customers is that security breaches will be treated with discretion, and no information will be released that could help potential attackers;
  • There is also an equally strong but unreasonable concern that exposing any bugs and code commits to public scrutiny will help MySQL competitors;
  • To address the security concern, Oracle wants to hide every aspect of the bug fixing that may reveal security-related information:
    • bug reports that mention how the breach happens;
    • comments to commits that explain what has been done to fix the issue;
    • test cases that show the problem being solved.
  • From the security standpoint, the above steps have been implemented, and they look effective. Unfortunately, they have some side effects:

    • the bugs database is censored, and does not provide information to users about why they should upgrade;
    • the public source trees under revision control are mutilated; in fact, it looks like Oracle has simply stopped updating them;
    • contributions to MySQL, which weren't easy before, are now extremely hard;
    • trust in Oracle's good faith as MySQL's steward is declining.

The inevitable side effect is that the moves that have reduced the security risk have also partially addressed Oracle's concern about exposing its innovation to the competition, thus making MySQL de facto less open. Was it intentional? I don't know. What I do know is that these actions, which make MySQL less friendly to its direct competitors, are in fact having the opposite effect: traditional open source users will have more reasons to look at alternatives, and those competitors will look more appealing now that Oracle has stiffened its approach to open source.

The main point of this whole incident is that Oracle values its current customers more than its potential ones. While MySQL AB focused its business on the customers that the open source model would attract to its services, Oracle wants first and foremost to make its current customers happy, and it doesn't consider the future ones, coming from the spread of open source, worthy of its attention. In short, Oracle doesn't get the open source business model.

OTOH, Oracle is doing a good job on the innovation front. A huge effort is going into new features and improvements in MySQL 5.6, showing that Oracle believes in the business behind MySQL and wants to make it grow. This is an undeniable benefit for MySQL and its users. However, there is less openness than before, because the source comes out less often and not in a shape that is suitable for contributions. Still, the code is open, and there is growth both in Oracle (which is taking ideas and code from MySQL forks) and in the MySQL forks, which merge Oracle changes into their releases. Even though the game is not played according to open source purists' rules, Oracle is still a main player.

What can we, the MySQL Community, do?

We need to reinforce the idea that the open source model still works for MySQL. The business side is the only one that Oracle gets. Unfortunately, the classical Oracle sales model does not look favorably on a system where you get customers by distributing a free product and pleasing non-customers, in the hope that some of them will eventually buy your services.

My point is that Oracle is unintentionally harming MySQL and its own image. If Oracle cares about MySQL, it should take action now to mend the fracture, before it becomes too deep.

I don't have a solution to this issue, but I thought that spelling out the problem would perhaps help to find one.