Thursday, October 08, 2015

Sound advice for GTID, with caveats

During the PerconaLive conference in Amsterdam, I attended a session where I heard a good piece of advice about using GTID. It amounts to: look at SHOW SLAVE STATUS output, and if you see more than one line in the Executed_Gtid_Set field, this tells you immediately if someone has written on a slave database.
This is good advice. Let's dissect it. Here is what a regular slave looks like, when nobody has messed up with it:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 1
                  Master_UUID: 00013454-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
1 row in set (0.00 sec)
What you see here is a slave that has received transactions from a single source (Retrieved_Gtid_Set lists only one GTID set) and has applied data from a single source (also Executed_Gtid_Set shows a single item.)
Notice that this advice holds true even when the slave being considered is an intermediate one, i.e. a relay slave which is master of one or more slaves. Due to the nature of GTIDs, even though the intermediate slave is recording the transactions to its own binary log, the transaction identifier does not change. Thus you should see a clean set of transactions throughout the chain. For example, if you have another slave that is replicating from slave #2, you would see something like this:
SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 102
                  Master_UUID: 00013456-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
                Auto_Position: 0
         Replicate_Rewrite_DB:
                 Channel_Name:

Here you see that this slave is not replicating from the original master (Look at Master_server_id and Master_UUID), but the transaction is uniquely marked as being originated from the main master ( UUID 00013454-1111-1111-1111-111111111111.)
To demonstrate the point, we do something that we shouldn't, i.e. write on the slave:
slave1 [localhost] {msandbox} (test) > create table t2 (i int not null primary key);
Query OK, 0 rows affected (0.04 sec)

slave1 [localhost] {msandbox} (test) > insert into t2 values (1);
Query OK, 1 row affected (0.01 sec)
There! We did it. And this is the result:

slave1 [localhost] {msandbox} (test) > SHOW SLAVE STATUS\G
 [...]
             Master_Server_Id: 1
                  Master_UUID: 00013454-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12
            Executed_Gtid_Set: 00013454-1111-1111-1111-111111111111:1-12,
00013455-2222-2222-2222-222222222222:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name:
1 row in set (0.00 sec)
As you can see, while Retrieved_Gtid_Set still shows that the slave has received data from a single source, the field Executed_Gtid_Set shows two sets, one belonging to the master (00013454-1111-1111-1111-111111111111:1-12) and one pointing to the slave itself (00013455-2222-2222-2222-222222222222:1-2). (1)
Then, this is true: in master/slave replication, when you see two items in Executed_Gtid_Set, you can be sure that something fishy has happened, and you should investigate.
However (you knew that there was a "however" in this story,) while this advice is perfectly good for MySQL 5.6, it may not be always so in MySQL 5.7.
If you are using multi-source replication, finding multiple items in Executed_Gtid_Set is not only allowed, but expected. Here is what you would see in a 4-node, point-to-point all-masters topology, after each master has created a table and written one record into it:

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
 [...]
             Master_Server_Id: 101
                  Master_UUID: 00008480-1111-1111-1111-111111111111
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00008480-1111-1111-1111-111111111111:1-3
            Executed_Gtid_Set: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-3,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
*************************** 2. row ***************************
 [...]
             Master_Server_Id: 103
                  Master_UUID: 00008482-3333-3333-3333-333333333333
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00008482-3333-3333-3333-333333333333:1-3
            Executed_Gtid_Set: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-3,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
*************************** 3. row ***************************
 [...]
             Master_Server_Id: 104
                  Master_UUID: 00008483-4444-4444-4444-444444444444
             Master_Info_File: mysql.slave_master_info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
           Master_Retry_Count: 86400
                  Master_Bind:
      Last_IO_Error_Timestamp:
     Last_SQL_Error_Timestamp:
               Master_SSL_Crl:
           Master_SSL_Crlpath:
           Retrieved_Gtid_Set: 00008483-4444-4444-4444-444444444444:1-3
            Executed_Gtid_Set: 00008480-1111-1111-1111-111111111111:1-3,
00008481-2222-2222-2222-222222222222:1-3,
00008482-3333-3333-3333-333333333333:1-3,
00008483-4444-4444-4444-444444444444:1-3
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node4
You see three transactions per channel because the the sequence of events was
  1. drop table if exists ...
  2. create table ...
  3. insert into ...
This situation seems in contradiction with the advice given for simple master/slave replication, but the truth is that, when multi-source replication was introduced, the simple paradigm of Executed_Gtid_Set as felony detector was broken. In this topology, the current node can be master and slave at once, and thus it is legitimate to see more items in Executed_Gtid_Set. Moreover, the instrumentation for multi-source replication is still incomplete, resulting in Executed_Gtid_Set being overloaded.
The bottom line: GTIDs can help your replication practices, and can also help you detect problems better than the old method of binary logs and positions. But don't get too used to it if you are planning to adopt MySQL 5.7 for more than simple master/slave topologies.

(1) If you are curious about why master and slave have such neat UUIDs, you should have a look at this post about MySQL-Sandbox usability.

2 comments:

Stephane Combaudon said...

Even with 5.6, seeing 2 server_uuids in Executed_Gtid_Set is not necessarily an issue.

Say we have 3 servers: A is the master, B and C are slaves of A. Executed_Gtid_Set will only show 1 server_uuid.


Now let's fail over to B so A and C are slaves. Executed_Gtid_Set will show 2 server_uuids, and you haven't done anything wrong.

Giuseppe Maxia said...

@Stephane,
Indeed. I forgot that you also see different GTID sources when you change master. The same happens if you restart the master with a different server_UUID