Wednesday, August 12, 2015

MySQL replication in action - Part 1: GTID & Co


In the theoretical part of this series, we have seen the basics of monitoring. In that article, though, we have barely mentioned the new tools available in MySQL 5.7 and MariaDB 10. Let’s start from something that has the potential of dramatically changing replication as we know it.

Crash-safe tables and Global transaction identifiers in MySQL 5.6 and 5.7

Global transaction identifiers (GTID) is a feature that has been in my wish list for long time, since the times I was working with the MySQL team. By the time I left Oracle, this feature was not even in the plans.
When MySQL 5.6 was first disclosed, the biggest improvement for replication was the introduction of crash-safe tables (see Status persistence in Monitoring 101.) There are two tables in the mysql database, named slave_master_info and slave_relay_log_info. At the beginning, these tables were using the MyISAM engine, thus defeating the purpose of making them crash-safe. In later versions, the developers decided to bite the bullet and create these tables with innodb from the beginning.
These two tables allow us to see the same information previously stored in the files master.info and relay_log.info. What makes these tables convenient is that they should survive a crash better than the standalone files.

The idea is good, but the implementation could be better. The new tables are disabled by default. To use them, you need to set a couple of dynamic variables,
relay-log-info-repository=table  
master-info-repository=table
Here is an example of what these tables look like
slave1 [localhost] {msandbox} (mysql) > select * from slave_master_info\G  
*************************** 1. row ***************************  
       Number_of_lines: 23  
       Master_log_name: mysql-bin.000002  
        Master_log_pos: 151  
                  Host: 127.0.0.1  
             User_name: rsandbox  
         User_password: rsandbox  
                  Port: 21891  
         Connect_retry: 60  
           Enabled_ssl: 0  
                Ssl_ca:  
            Ssl_capath:  
              Ssl_cert:  
            Ssl_cipher:  
               Ssl_key:  
Ssl_verify_server_cert: 0  
             Heartbeat: 1800  
                  Bind:  
    Ignored_server_ids: 0  
                  Uuid: 27971ecc-36e8-11e5-b390-2ff12c09a72a  
           Retry_count: 86400  
               Ssl_crl:  
           Ssl_crlpath:  
 Enabled_auto_position: 0  
1 row in set (0.00 sec)

slave1 [localhost] {msandbox} (mysql) > select * from slave_relay_log_info\G  
*************************** 1. row ***************************  
  Number_of_lines: 7  
   Relay_log_name: ./mysql_sandbox21892-relay-bin.000005  
    Relay_log_pos: 907  
  Master_log_name: mysql-bin.000002  
   Master_log_pos: 697  
        Sql_delay: 0  
Number_of_workers: 0  
               Id: 1  
1 row in set (0.00 sec)
The information looks like what we used to get from the .info files. There is, however, a notable difference. Look at what SHOW SLAVE STATUS says about the same situation:
slave1 [localhost] {msandbox} ((none)) > show slave status\G  
*************************** 1. row ***************************  
               Slave_IO_State: Waiting for master to send event  
                  Master_Host: 127.0.0.1  
                  Master_User: rsandbox  
                  Master_Port: 21891  
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.000002  
          Read_Master_Log_Pos: 697  
               Relay_Log_File: mysql_sandbox21892-relay-bin.000005  
                Relay_Log_Pos: 907  
        Relay_Master_Log_File: mysql-bin.000002  
             Slave_IO_Running: Yes  
            Slave_SQL_Running: Yes
The value of Read_Master_Log_Pos is different. SHOW SLAVE STATUS says 697, while mysql.slave_master_info reports an older position: 151.
The reason for this discrepancy is that, by default, the table is updated every 10,000 events, while the slave_relay_log_info table is updated at every event. This means that, in case of crash, only one table is guaranteed to hold reliable information. It should be enough for a recovery, at least until someone finds a creative way of crashing the server in a way that requires the updated contents of slave_master_info.
Shortly after the crash-safe tables, a new feature was released as a preview, and later included in the main build: global transaction identifiers, or GTID. While I am glad that the feature was added, I am not pleased with the way it is implemented. Let’s see how it works.
You may have noticed in one of the listings above a field named Uuid, containing a long value: 27971ecc–36e8–11e5-b390–2ff12c09a72a. This long string of hexadecimal digits is the identifier of the server. The good thing about this identifier is that it is guaranteed to be unique. Unlike the server-id, which is a 64bit integer generated by users, this one is created during the server initialization, and you should be reasonably sure that no two servers have the same identifier. The bad thing is that these identifiers are unreadable and unpronounceable by humans. Try it:
“Hey, Sam! can you check if 27971ecc–36e8–11e5-b390–2ff12c09a72a is replicating to 30589f86–36e8–11e5-b390–0b61c3af229e?”
Pretty tough, eh? But unfortunately, this is how GTID in MySQL 5.6 and 5.7 are implemented. When they are enabled, you will see in the binary logs remarks such as this one:
SET @@SESSION.GTID_NEXT= '27971ecc-36e8-11e5-b390-2ff12c09a72a:2'/*!*/;
That’s your GTID. If you are using a simple master/slave deployment, you can just ignore the long string and concentrate on the second element (here the number “2”) which is the sequence number of the event. Things get hairier when we deal with multiple sources. We’ll see that in one of the next articles.
For now, it will suffice to notice that we will find the same string both in the master binary log and in the slaves relay log, regardless of how different the file names and positions are in the various servers. This fact allows us to easily find a specific event in any server belonging to the same replication domain, which is the main purpose of having a GTID.
Le’s have a look at a complete result from SHOW SLAVE STATUS, which now includes GTID information.
slave1 [localhost] {msandbox} ((none)) > show slave status\G  
*************************** 1. row ***************************  
               Slave_IO_State: Waiting for master to send event  
                  Master_Host: 127.0.0.1  
                  Master_User: rsandbox  
                  Master_Port: 21891  
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.000002  
          Read_Master_Log_Pos: 697  
               Relay_Log_File: mysql_sandbox21892-relay-bin.000005  
                Relay_Log_Pos: 907  
        Relay_Master_Log_File: mysql-bin.000002  
             Slave_IO_Running: Yes  
            Slave_SQL_Running: Yes  
              Replicate_Do_DB:  
          Replicate_Ignore_DB:  
           Replicate_Do_Table:  
       Replicate_Ignore_Table:  
      Replicate_Wild_Do_Table:  
  Replicate_Wild_Ignore_Table:  
                   Last_Errno: 0  
                   Last_Error:  
                 Skip_Counter: 0  
          Exec_Master_Log_Pos: 697  
              Relay_Log_Space: 1287  
              Until_Condition: None  
               Until_Log_File:  
                Until_Log_Pos: 0  
           Master_SSL_Allowed: No  
           Master_SSL_CA_File:  
           Master_SSL_CA_Path:  
              Master_SSL_Cert:  
            Master_SSL_Cipher:  
               Master_SSL_Key:  
        Seconds_Behind_Master: 0  
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0  
                Last_IO_Error:  
               Last_SQL_Errno: 0  
               Last_SQL_Error:  
  Replicate_Ignore_Server_Ids:  
             Master_Server_Id: 1  
                  Master_UUID: 27971ecc-36e8-11e5-b390-2ff12c09a72a  
             Master_Info_File: mysql.slave_master_info  
                   SQL_Delay: 0  
          SQL_Remaining_Delay: NULL  
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it  
           Master_Retry_Count: 86400  
                  Master_Bind:  
      Last_IO_Error_Timestamp:  
     Last_SQL_Error_Timestamp:  
               Master_SSL_Crl:  
           Master_SSL_Crlpath:  
           Retrieved_Gtid_Set: 27971ecc-36e8-11e5-b390-2ff12c09a72a:1-2  
            Executed_Gtid_Set: 27971ecc-36e8-11e5-b390-2ff12c09a72a:1-2  
                Auto_Position: 0  
1 row in set (0.00 sec)                 
Almost everything up to master server ID looks the same as in previous versions. Then we get that long identifier, a note that we are using the (scarcely updated) mysql.slave_master_info table, and at the very end the information about the latest GTIDs that were processed.
What we have seen so far is enough for being upset at the GTID implementation, but there is more:
  • To enable GTIDs, you need log-slave-updated in all nodes involved in replication. The reason is that you want a slave to be ready to become a master and vice versa, but this imposition may be expensive. This requirement has been lifter in MySQL 5.7.
  • The crash-safe tables do not include GTID values. You can get GTID information from the binary logs, or from SHOW SLAVE STATUS, and in MySQL 5.7 also from a few performance_schema tables. But there is no place except SHOW SLAVE STATUS where you get at once the GTID and the corresponding binary log and position. Sure, there are tools that can do this for you, but it feels as if something is missing.
  • GTID with CHANGE MASTER TO is an all-or-nothing proposition. With GTID, you can either use MASTER_AUTO_POSITION=1 and let master and slave sync each other, or you get the default behavior (replicating from the earliest binlog available). There is no such a ting as "start from GTID #". If you don't want to start replication from the automatic position or from the beginning, you still need to use binary log name and position.
  • When you need to skip one or more transactions in the slave, the only available method is creating as many empty transactions as you want to ignore.
  • There are statements that are not accepted when GTID is enabled. And not accepted does not mean that they are not replicated. It means that you can’t enter these statements in the master: they will be rejected with an error:
    • updates involving transactional and non transactional tables; (note that events that only affect non-transactional tables, such as MyISAM, are accepted)
    • CREATE TABLE … SELECT statements.
    • Temporary tables within transactions.
For example:
create table dummy select * from t2;  
ERROR 1786 (HY000): CREATE TABLE ... SELECT is forbidden when @@GLOBAL.ENFORCE_GTID_CONSISTENCY = 1.

More monitoring in MySQL 5.7

In MySQL 5.7, the performance_schema has acquired many new tables, some of which are dedicated to replication.
slave1 [localhost] {msandbox} (performance_schema) > show tables like 'repl%';  
+-------------------------------------------+  
| Tables_in_performance_schema (repl%)      |  
+-------------------------------------------+  
| replication_applier_configuration         |  
| replication_applier_status                |  
| replication_applier_status_by_coordinator |  
| replication_applier_status_by_worker      |  
| replication_connection_configuration      |  
| replication_connection_status             |  
| replication_group_member_stats            |  
| replication_group_members                 |  
+-------------------------------------------+  
8 rows in set (0.00 sec)
Looking at these tables, we realize that mostly they have converted into table some of the contents of SHOW SLAVE STATUS.
select * from replication_connection_configuration\G  
*************************** 1. row ***************************  
                 CHANNEL_NAME:  
                         HOST: 127.0.0.1  
                         PORT: 13052  
                         USER: rsandbox  
            NETWORK_INTERFACE:  
                AUTO_POSITION: 0  
                  SSL_ALLOWED: NO  
                  SSL_CA_FILE:  
                  SSL_CA_PATH:  
              SSL_CERTIFICATE:  
                   SSL_CIPHER:  
                      SSL_KEY:  
SSL_VERIFY_SERVER_CERTIFICATE: NO  
                 SSL_CRL_FILE:  
                 SSL_CRL_PATH:  
    CONNECTION_RETRY_INTERVAL: 60  
       CONNECTION_RETRY_COUNT: 86400  
           HEARTBEAT_INTERVAL: 30.000

select * from replication_connection_status\G  
*************************** 1. row ***************************  
             CHANNEL_NAME:  
               GROUP_NAME:  
              SOURCE_UUID: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3  
                THREAD_ID: 23  
            SERVICE_STATE: ON  
COUNT_RECEIVED_HEARTBEATS: 190  
 LAST_HEARTBEAT_TIMESTAMP: 2015-07-30 22:18:45  
 RECEIVED_TRANSACTION_SET: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-207  
        LAST_ERROR_NUMBER: 0  
       LAST_ERROR_MESSAGE:  
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
This latest table is bizarrely named. It’s named “connection status”, but it is the only table where there is running information about the replication process. It should be called with something that reminds applier progress. Anyway, here is the only point in these tables where we get information about GTID. However, we only get information about GTIDs that were “received.” There is no indication about the ones being executed. Recall what we have seen in the first article about the applier work, and compare with the information available in SHOW SLAVE STATUS and the mysql.slave_* tables: in the old info, we get the master position (original events), the position of the data in the relay logs (events transferred to the slave) and the position of the execution (events actually applied to the database.)
If you want to know the latest GTID that was executed, you need to run SELECT @@global.gtid_executed or to run SHOW SLAVE STATUS, where both pieces of information are shown together.
There is another table that often is useful for monitoring: replication_applier_status_by_coordinator will have the error code and message when replication breaks.
To complement the information in performance_schema, there is another table in the mysql database, named gtid_executed. This table is filled with the GTIDs that were executed in the server, but only when some events occur (e.g. flush logs) or when the slave does not have binary logging enabled.

What’s still missing

The replication_* tables in performance_schema have only information related to the slave operations. However, as we have seen in the previous article, we can’t use only one sided information to monitor replication. We need the master status. Which, as of today, is still only a “SHOW” command. There is a status variable that tells the status of the GTID, but that’s all.
master [localhost] {msandbox} ((none)) > show master status\G  
*************************** 1. row ***************************  
             File: mysql-bin.000002  
         Position: 681  
     Binlog_Do_DB:  
 Binlog_Ignore_DB:  
Executed_Gtid_Set: 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-2  
1 row in set (0.00 sec)

master [localhost] {msandbox} ((none)) > select  @@global.gtid_executed;  
+------------------------------------------+  
| @@global.gtid_executed                   |  
+------------------------------------------+  
| 2bfac0c8-36f6-11e5-abc9-b3bc91a587b3:1-2 |  
+------------------------------------------+  
1 row in set (0.00 sec)
The reason I am so hot about having the monitoring information in tables instead of SHOW commands is because this makes monitoring only available through external tools. Having all information in tables would allow us to run monitoring of the whole replication in SQL, as was demonstrated in a prototype a few years ago.

MariaDB 10 GTID and crash-safe tables

Compared to what we have seen in MySQL 5.6 and 5.7, MariaDB implementation of GTID is rather minimalistic. Here are the basic facts:
  • GTIDs are active by default. No need to enable them. You will get them out of the box.
  • There are no known limitations. All commands are allowed.
  • The data origin is identified by a group of thee integers: the domain, the server, and the sequence. (We will see the domain in action when we examine multi-source replication.)
  • The slave crash-safe table is quite simple, compared to MySQL.
In the master, you can see the GTID in a variable:
master [localhost] {msandbox} (test) > select @@gtid_current_pos;  
+--------------------+  
| @@gtid_current_pos |  
+--------------------+  
| 0-1-17             |  
+--------------------+  
1 row in set (0.00 sec)
And in the slave the latest GTIDs are stored in a table.
slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;  
+-----------+--------+-----------+--------+  
| domain_id | sub_id | server_id | seq_no |  
+-----------+--------+-----------+--------+  
|         0 |     16 |         1 |     16 |  
|         0 |     17 |         1 |     17 |  
+-----------+--------+-----------+--------+  
2 rows in set (0.00 sec)
You can compare information in master and slave using global variables
master [localhost] {msandbox} (test) > show global variables like '%gtid%';  
+------------------------+--------+  
| Variable_name          | Value  |  
+------------------------+--------+  
| gtid_binlog_pos        | 0-1-17 |  
| gtid_binlog_state      | 0-1-17 |  
| gtid_current_pos       | 0-1-17 |  
| gtid_domain_id         | 0      |  
| gtid_ignore_duplicates | OFF    |  
| gtid_slave_pos         |        |  
| gtid_strict_mode       | OFF    |  
+------------------------+--------+

slave1 [localhost] {msandbox} (mysql) > show global variables like '%gtid%';  
+------------------------+--------+  
| Variable_name          | Value  |  
+------------------------+--------+  
| gtid_binlog_pos        |        |  
| gtid_binlog_state      |        |  
| gtid_current_pos       | 0-1-17 |  
| gtid_domain_id         | 0      |  
| gtid_ignore_duplicates | OFF    |  
| gtid_slave_pos         | 0-1-17 |  
| gtid_strict_mode       | OFF    |  
+------------------------+--------+
Notice that the master has more information than the slave, because it has data about its binary log, which the slave does not need to have, since log-slave-updates is not a requirement.
And what you see in the binary logs is quite straightforward, i.e. it is human-readable:
#150730 23:12:33 server id 1  end_log_pos 3595  GTID 0-1-17
(As we have noted for MySQL 5.6/5.7, also in MariaDB 10 the GTID is seen in both the master binary log and the slaves relay logs.)
There are no other tables related to replication in information_schema or performance_schema. The old information (binary log + position) is not recorded anywhere. The design decision, in this case, was to use GTID information only. It is possible to set up and manage replication with only GTIDs.
I have mixed feelings about this implementation. On one hand, it is cleaner and better integrated with the rest of the database than MySQL 5.6 solution. On the other hand, the minimalistic approach has sacrificed the completeness of information (see below for an example of this.) Furthermore, it breaks compatibility with MySQL so drastically, that two products cannot work together (well, not exactly: you can use Tungsten Replicator to replicate with mixed nodes, but that’s another story.) Non only you can’t mix MySQL and MariaDB 10 nodes in replication, but the MariaDB project cannot easily integrate many improvements introduced in the performance_schema by version 5.7.

In the trenches with GTID

In a nutshell, the main problem that GTID solves is to identify transactions in situations where there is a discrepancy between data received and applied in various slaves. To see how GTIDs can help in cases where we have a high load and slaves updated at different paces, let’s simulate slave lagging by turning off the SQL thread while we pump a few million rows inside the database.

First, in MySQL 5.7.

We will test using MySQL::Sandbox.
$ make_replication_sandbox 5.7.8  
installing and starting master  
installing slave 1  
installing slave 2  
starting slave 1  
. sandbox server started  
starting slave 2  
. sandbox server started  
initializing slave 1  
initializing slave 2  
replication directory installed in $HOME/sandboxes/rsandbox_5_7_8
We have one master and two slaves in replication. However, as we mentioned before, GTID is not enabled by default. For this reason, MySQL::Sandbox creates a file that runs the commands needed to use GTID in all nodes. $ cd $HOME/sandboxes/rsandbox_5_7_8 $ ./enable_gtid
Inside enable_gtid there is an example of a sweet point for GTID. In traditional replication, when you connected a slave, you had to indicate binary log and position, or the slave would replicate from the first position of the first binary log. With GTID, you don’t need this. You can instead say:
CHANGE MASTER TO MASTER_AUTO_POSITION=1
This tells the master that it will synchronize with the slave using GTIDs.
Let’s start our experiment. In both slaves, we run
STOP SLAVE SQL_THREAD;
And then we start inserting data. (Note: the relatively low number of GTID does not mean that we are inserting just a few hundred rows. We’re using the sample employees database which has multiple-row inserts, for a grand total of about 4.5 million rows.)
slave1 [localhost] {msandbox} (performance_schema) > SHOW SLAVE STATUS\G  
*************************** 1. row ***************************  
               Slave_IO_State: Waiting for master to send event  
                  Master_Host: 127.0.0.1  
                  Master_User: rsandbox  
                  Master_Port: 13253  
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.000003  
          Read_Master_Log_Pos: 66374840  
               Relay_Log_File: mysql-relay.000002  
                Relay_Log_Pos: 1559  
        Relay_Master_Log_File: mysql-bin.000002  
             Slave_IO_Running: Yes  
            Slave_SQL_Running: No  
              Replicate_Do_DB:  
          Replicate_Ignore_DB:  
           Replicate_Do_Table:  
       Replicate_Ignore_Table:  
      Replicate_Wild_Do_Table:  
  Replicate_Wild_Ignore_Table:  
                   Last_Errno: 0  
                   Last_Error:  
                 Skip_Counter: 0  
          Exec_Master_Log_Pos: 1346  
              Relay_Log_Space: 66380644  
              Until_Condition: None  
               Until_Log_File:  
                Until_Log_Pos: 0  
           Master_SSL_Allowed: No  
           Master_SSL_CA_File:  
           Master_SSL_CA_Path:  
              Master_SSL_Cert:  
            Master_SSL_Cipher:  
               Master_SSL_Key:  
        Seconds_Behind_Master: NULL  
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0  
                Last_IO_Error:  
               Last_SQL_Errno: 0  
               Last_SQL_Error:  
  Replicate_Ignore_Server_Ids:  
             Master_Server_Id: 1  
                  Master_UUID: f34639b4-3951-11e5-9fe2-b8aeed734276  
             Master_Info_File: mysql.slave_master_info  
                    SQL_Delay: 0  
          SQL_Remaining_Delay: NULL  
      Slave_SQL_Running_State:  
           Master_Retry_Count: 86400  
                  Master_Bind:  
      Last_IO_Error_Timestamp:  
     Last_SQL_Error_Timestamp:  
               Master_SSL_Crl:  
           Master_SSL_Crlpath:  
           Retrieved_Gtid_Set: f34639b4-3951-11e5-9fe2-b8aeed734276:1-182  
            Executed_Gtid_Set: f34639b4-3951-11e5-9fe2-b8aeed734276:1-5  
                Auto_Position: 1  
         Replicate_Rewrite_DB:  
                 Channel_Name:  
1 row in set (0.00 sec)
Looking at SHOW SLAVE STATUS, we see that we get information about the data being transferred to the slave and what was executed. Here, we have a big gap, because the SQL_THREAD is idle. Anyway, SHOW SLAVE STATUS tells us both what the gap is in the relay logs, and the gap in the GTID (Retrieved_Gtid_Set vs. Executed_Gtid_Set.) Here it is clear what has been transferred to the slave and what has been executed. We can get the Executed_Gtid_Set by running SELECT @@global.gtid_executed.
Now, let’s look at the crash-safe tables. We have already said that the table slave_master_info will only be updated every 10,000 events, so we skip it. We hope that the other one should give us more updated info.
slave1 [localhost] {msandbox} (mysql) > select * from slave_relay_log_info\G  
*************************** 1. row ***************************  
  Number_of_lines: 7  
   Relay_log_name: ./mysql-relay.000002  
    Relay_log_pos: 1559  
  Master_log_name: mysql-bin.000002  
   Master_log_pos: 1346  
        Sql_delay: 0  
Number_of_workers: 0  
               Id: 1  
     Channel_name:  
1 row in set (0.00 sec)
Well, no. It only gives us the initial position. This table apparently monitors the SQL_THREAD, not the IO_THREAD. Finally, we have a look at the performance_schema:
slave1 [localhost] {msandbox} (performance_schema) > select * from replication_connection_status\G  
*************************** 1. row ***************************  
             CHANNEL_NAME:  
               GROUP_NAME:  
              SOURCE_UUID: f34639b4-3951-11e5-9fe2-b8aeed734276  
                THREAD_ID: 33  
            SERVICE_STATE: ON  
COUNT_RECEIVED_HEARTBEATS: 26  
 LAST_HEARTBEAT_TIMESTAMP: 2015-08-02 22:32:58  
 RECEIVED_TRANSACTION_SET: f34639b4-3951-11e5-9fe2-b8aeed734276:1-182  
        LAST_ERROR_NUMBER: 0  
       LAST_ERROR_MESSAGE:  
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00  
1 row in set (0.00 sec)
According to this table, we are dealing with GTID # 182, which, as we know from looking at SHOW SLAVE STATUS, is what we have received, but not applied.
Now, let’s restart the SQL_THREAD, and see what happens:
  • the table mysql.slave_relay_log_info is now updated with every event that gets applied. Here we see relay log and binary log advance, but not GTID.
  • We can also see GTID progress in isolation by checking SELECT @@global.gtid_executed.
  • performance_schema.replication_connection_status does not update anymore, although the SQL_THREAD now is working furiously.
  • SHOW SLAVE STATUS keeps giving us correct upgrades. This is the only place where we see together GTIDs and binary + relay logs.
My final take: why do we have half a dozen tables that give me bits and pieces, instead of having just one that gives us what we all need, i.e. the contents of SHOW SLAVE STATUS?

Next, in MariaDB 10.

We use the same setup used for MySQL 5.7. One big difference is that we don’t need to enable GTIDs, so our only concern is to use the proper option when starting replication:
CHANGE MASTER TO ...  master_use_gtid=current_pos
As we did with MySQL 5.7, we stop the SQL_THREAD, and pump some data.
slave1 [localhost] {msandbox} (mysql) > SHOW SLAVE STATUS\G  
*************************** 1. row ***************************  
               Slave_IO_State: Waiting for master to send event  
                  Master_Host: 127.0.0.1  
                  Master_User: rsandbox  
                  Master_Port: 25030  
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.000002  
          Read_Master_Log_Pos: 168389219  
               Relay_Log_File: mysql-relay.000002  
                Relay_Log_Pos: 3002  
        Relay_Master_Log_File: mysql-bin.000001  
             Slave_IO_Running: Yes  
            Slave_SQL_Running: No  
              Replicate_Do_DB:  
          Replicate_Ignore_DB:  
           Replicate_Do_Table:  
       Replicate_Ignore_Table:  
      Replicate_Wild_Do_Table:  
  Replicate_Wild_Ignore_Table:  
                   Last_Errno: 0  
                   Last_Error:  
                 Skip_Counter: 0  
          Exec_Master_Log_Pos: 2715  
              Relay_Log_Space: 168396297  
              Until_Condition: None  
               Until_Log_File:  
                Until_Log_Pos: 0  
           Master_SSL_Allowed: No  
           Master_SSL_CA_File:  
           Master_SSL_CA_Path:  
              Master_SSL_Cert:  
            Master_SSL_Cipher:  
               Master_SSL_Key:  
        Seconds_Behind_Master: NULL  
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0  
                Last_IO_Error:  
               Last_SQL_Errno: 0  
               Last_SQL_Error:  
  Replicate_Ignore_Server_Ids:  
             Master_Server_Id: 1  
               Master_SSL_Crl:  
           Master_SSL_Crlpath:  
                   Using_Gtid: Current_Pos  
                  Gtid_IO_Pos: 0-1-189  
1 row in set (0.00 sec)
We can see the GTID growing, and the corresponding relay log. At first sight, there is less information than MySQL 5.7. The data about GTID is only what we receive, not what we have executed.
Similarly to MySQL 5.7, the crash-safe table is idle, because it reports the information about executed GTIDs.
slave1 [localhost] {msandbox} (mysql) > select * from gtid_slave_pos;  
+-----------+--------+-----------+--------+  
| domain_id | sub_id | server_id | seq_no |  
+-----------+--------+-----------+--------+  
|         0 |     11 |         1 |     11 |  
|         0 |     12 |         1 |     12 |  
+-----------+--------+-----------+--------+
And we can see the GTID that was executed by querying the variable select @@global.gtid_current_pos.

Summing up

The sunny side is that we have two database servers that can use GTID information. This is great news whenever you need to perform a failover and the old master has gone away. The old problem of synchronizing the remaining slaves becomes trivial. Both implementations make this task easy to automate.
On the darker side, I can only say that I was expecting more. I see lack of integration between GTID and binlog/position in the instrumentation. You can see them together only in SHOW SLAVE STATUS, while the new tables favor the one or the other but not both.
Both implementations share the decision of not producing a table with master status, which makes the job of automated monitoring just a tiny bit more difficult. My main beef about not having a master status table is that it is the last bit of information that is missing to do replication monitoring in pure SQL. Well, sort of. In both flavors you can compare the result of @@global.gtid_executed or @@gtid_current_pos, but it does not give you the precision of the monitoring that you can get using the SHOW statements. Again, the details of what we want to compare are in the previous article. The implementation of GTID lacks some of the rich information that we have when using log files and positions. Someone says we don’t need such information anymore. I disagree. Since replication still happens using binary and relay logs, having a place where GTIDs are related to their physical counterpart can help troubleshooting.

What’s next

We have now seen the main functioning of replication using the latest flavors of MySQL. With that, we are now ready to explore the brand new features, such as multi-source replication (or multi-master, as it is commonly referred to).

2 comments:

Ivan said...

Hi Giuseppe,

Great post, thanks for sharing! Looking forward to reading part 2!
-ivan

Ganbold said...

Hi Giuseppe,

What replication method do you recommend in case of 4 masters?
GTID replication seems only useful with 2 masters.
Galera cluster is not working in FreeBSD and yet Tungsten replicator is written in Java and not an option in our case.

thanks a lot