Sunday, April 17, 2011

Replication metadata in MySQL 5.6.2

The default storage engine is InnoDB, or is it not?

When MySQL 5.5 went GA, the biggest piece of news was that the default storage engine is now InnoDB. Good news, and hope for a better future, as InnoDB is the most reliable storage engine available for MySQL.

Therefore the expectation is that MySQL 5.6 follows in its steps, and we should see less and less of MyISAM in the database.
The privileges tables, however, are still MyISAM. I was not really expecting to see them disappear so quickly, as I have seen how much work it has been for Drizzle to get rid of them, and even them had to keep MyISAM alive for temporary tables.
However, I was surprised to see that the new tables for replication metadata, the ones that replace the files and are MyISAM by default.
The manual says:
In order for replication to be crash-safe, the slave_master_info and slave_relay_log_info tables must each use a transactional storage engine. By default, both of these tables use MyISAM; this means that, prior to starting replication, you must change both of these tables to use a transaction storage engine if you wish for replication to be crash-safe. You can do this by means of the appropriate ALTER TABLE ... ENGINE=... statements. You should not attempt to change the storage engine used by either of these tables while replication is actually running.

The funny thing is that the manual does not mention InnoDB explicitly, as if there were many transactional engines coming with the official MySQL.

Tables instead of files.

Anyway, I went ahead and tried the new metadata features. In short, the new version allows you to store the data that once were in and in two tables with similar names, located under the mysql schema.

First of all, I changed the storage engine, as suggested by the docs. Actually, the docs are still a bit scarce about this feature. The best instructions are the ones found in Mats Kindahl.
# in the slave
ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;

# in the slave configuration file
After this operation, I initialized the two slaves, one of which has the new table info, and the other one has still the old files info, for comparison.

# slave 1 (with table info)
show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_User: rsandbox
                  Master_Port: 12027
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 114
               Relay_Log_File: mysql_sandbox12028-relay-bin.000002
                Relay_Log_Pos: 267
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
         Master_log_pos: 4
                   Port: 3306
          Connect_retry: 60
            Enabled_ssl: 0
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
     Ignored_server_ids: 0
            Retry_count: 86400

# Slave 2
cat node2/data/ 



Hmmm. Not good. Definitely not good.
Now, according to Mats article, the slave_master_info table is updated every time a slave starts. But this is not the case.
Apparently, you need to restart it at least once more, to get an update.
# slave 1
stop slave; 
start slave;
select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
        Master_log_name: mysql-bin.000001
         Master_log_pos: 114
              User_name: rsandbox
          User_password: rsandbox
                   Port: 12027
          Connect_retry: 60
            Enabled_ssl: 0
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
     Ignored_server_ids: 0
                   Uuid: 6cb60e24-68e7-11e0-9eec-6c626da07446
            Retry_count: 86400

This lack of update is the default by design. The reasoning is that if you update the table at every transaction, you are slowing down replication beyond acceptable levels. However, it must be noted that the update of the table is way less than the updates of the file.

You can force the slave_master_info and slave_relay_log_info tables to update at every transaction, by setting sync_master_info and sync_relay_log_info. Indeed, with this addition, the table is updated at every transaction.
Therefore the choice is between crash unsafe and fast (with the *.info files) and crash safe and very slow (with the tables).

Usability issues

Let's mention the good news first. This addition is very welcome, because it allows monitoring tools to be implemented directly in SQL. The main difficulty about this problem is that the only metadata available until MySQL 5.5 is "SHOW SLAVE STATUS", which has no related Information_Schema or Performance_Schema table. Thus, getting the status values into a SQL variables is not feasible without external tools. This metadata is no replacement for SHOW SLAVE STATUS (that worklog is still struggling with a slow implementation) but there is enough overlapping that a simple monitoring tool could be created with SQL, stored routines and the event scheduler.

Now, for the bad news:
This implementation leaves me baffled for several reasons.
The lack of updates by default is the biggest concern. There is no option of automatic updates every second, same as synch_binlog. It's all or nothing.

The choice of implementation is not pleasant either. Users would expect the table-based recording to mimic the behavior of the file-based recording, i.e. when replication is started, the table is created, and after a "reset slave' the table is removed. But this does not happen. The table is truncated, and if you remove it, it won't be created when you restart replication.

What's worse, this table can't be dumped with locks. MySQL complains if you attempt to do that.
./s1 -e 'stop slave'

mysqldump mysql slave_master_info
-- MySQL dump 10.13  Distrib 5.6.2-m5, for linux2.6 (x86_64)
-- Host: localhost    Database: mysql
-- ------------------------------------------------------
-- Server version 5.6.2-m5-log

/*!40101 SET NAMES utf8 */;
/*!40103 SET TIME_ZONE='+00:00' */;
mysqldump: Got error: 1721: You can't use locks with rpl info tables. when doing LOCK TABLES
This makes more difficult the operation of provisioning a slave from a backup. I would expect that, having stopped the slave, I could backup the table, possibly together with the rest of the database. Maybe MySQL has a hidden clever way of exporting this data, but if that exists, so far it has escaped me.

Update: You can use the following command to dump this table.
mysqldump --master-data mysql slave_master_info
However, a simple
mysqldump --master-data mysql
does not include the *_info tables. (Bug#60902)

Another problem is maintenance. If I want to clean up the InnoDB table space, the usual recipe is to dump everything, stop the server, remove the ib* files, restart the server, and then reload the data.
That has worked very well so far, because there were no innodb tables in the mysql database. Now, however, if we attempt to perform the above operation, we get an error when InnoDB comes online, because it won't find an internal reference to the innodb tables, whose .frm files (and possibly .ibd files) are still dangling around under the mysql folder.

Incidentally, I can note that Tungsten Replicator uses a similar approach (replication metadata is stored in a table, which is updated at every commit), and yet it does not suffer from any of the drawbacks mentioned here. The replication metadata tables are stored in a regular schema, which can be dumped just fine to provision a new slave. The additional commits are not a problem, since Tungsten uses the technique of block commits, where it commits together all the transactions that can be safely grouped.
The safety of the slave thus depends on the value of --innodb-flush-log-at-trx-commit, not on additional trade off decisions.

More to come.

This article covers just a portions of the replication improvements in 5.6. There is much more to see and test.
Specifically, I want to test the performance impact of the metadata tables, and also the performance of the multi-threaded slave prototype against regular replication and Tungsten. I will get around to it shortly. Stay tuned.


Mark Callaghan said...

What is Oracle is deferring by default to make things faster? If the slave commits replication changes and then less frequently updates the slave state table, then that design is no better than using

There are two ways to make slave state tables faster. First, use commit less frequently (I think Tungsten calls this block commit). If there are 5 auto-commit statements then the slave can execute them as if they all in one transaction and do one commit at the end.

The other optimization is to run with innodb-flush-log-at-trx-commit=2 to let InnoDB make commit fast. As long as changes to InnoDB and the state of changes to InnoDB are recovered to the same point in time, then nothing is lost and nothing is duplicated.

Jon Stephens said...

The funny thing is that the manual does not mention InnoDB explicitly, as if there were many transactional engines coming with the official MySQL.

The statement you refer to is correct as written, and is written to be future-proof. InnoDB itself is not a requirement; a transactional storage engine is. Yes, for now, "transactional" basically means "InnoDB", but we (or someone else) might in future offer other possibilities. If that should happen, I don't want to be obligated to go back and "fix" this everywhere it occurs in several versions of the Manual.

(P.S. Can you please add something to your CSS so that QUOTE tag content is distinguishable from the rest? Or allow BLOCKQUOTE? Thanks!)

Giuseppe Maxia said...

I understand that the wording is future oriented. Nonetheless, I find it funny. It was not a reproach to the docs team, whose work I admire, as you know very well, but an expression of my particular sense of humor, which you are free not to share.
As for CSS formatting, I have no control on it. It comes from Google (Blogger). The original post has a BLOCKQUOTE tag.

Daniƫl van Eeden said...

There is a transactional table engine besides InnoDB: NDB. With the new user credential shareing it's already used for some system tables.

Giuseppe Maxia said...

NDB is only for MySQL Cluster. The normal MySQL Server has no use for it.
Besides, MySQL server and MySQL Cluster have been following different paths. Although you may find NDB related files in the server distribution, you should avoid them for regular usage.

Jon Stephens said...

1. There's currently no support for NDB in MySQL 5.6.


3. Not allowed to use BLOCKQUOOTE in comments; QUOTE is permitted, though.