Wednesday, September 21, 2016

Improving the design of MySQL replication

Now that MySQL 8.0 has been revealed, it's time to take a deep look at replication features in the latest releases, and review its overall design.

Server UUID vs Server-ID

At the beginning of replication, there was the server_id variable that identified uniquely a node in a replication system. The variable is still here, but in MySQL 5.6 it was joined by another value, which is created during the server initialisation, regardless of its involvement in a replication system. The server_uuid is a string of hexadecimal characters that is the basis for global transaction identifiers:

select @@server_id, @@server_uuid;
+-------------+--------------------------------------+
| @@server_id | @@server_uuid                        |
+-------------+--------------------------------------+
|         101 | 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc |
+-------------+--------------------------------------+

If this server is a master, its slave will report:

        Master_Server_ID: 101 
             Master_UUID: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc
       [...]
       Retrieved_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42
        Executed_Gtid_Set: 4c0a9670-7f9a-11e6-9f8b-188f83e4eefc:1-42

The serverid is still indispensable. We can't use the same value for two nodes in the replication system, or we get errors. But instead of using the serverid as the identifier for global transaction identifiers (GTID) we use the serverUUID, thus being subjected to the inhuman treatment of dealing with unreadable values to track our valuable transactions. Yet, server UUIDs have not replaced everything: if we want to exclude one or more servers from being applied, we still refer to them by serverid, as in the IGNORE_SERVER_IDS clause of CHANGE MASTER TO.

How should it be instead

While UUIDs guarantee that identifiers are unique in the the whole planet, the number of nodes in a replication system are finite, and for practical a single system may not go beyond the thousands. I don't care if my serverid is the same as a serverid in another continent: all I want is to be unique within my system. And since we have to use unique values for every node server_id, it is unnecessarily cruel to force users to deal with long strings instead of plain numbers.

VARIABLES vs STATUS

In general, MySQL behavior gets changed by setting variables to a given value either in the options file or using SET GLOBAL dynamically. To see the setting of a given value, we use SHOW VARIABLES:

show variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 15    |
+-----------------+-------+

Here max_connections was changed from its default value to a much lower one. To see how many connections we have consumed, we use SHOW STATUS:

show status like '%connections%';
+-----------------------------------+---------------------+
| Variable_name                     | Value               |
+-----------------------------------+---------------------+
| Connection_errors_max_connections | 0                   |
| Connections                       | 7                   |
| Max_used_connections              | 1                   |
| Max_used_connections_time         | 2016-09-20 18:01:40 |
+-----------------------------------+---------------------+

That's the general theory. Sometimes, in replication, MySQL follows the same path. For example, in semi-synchronous replication, we have the settings of the functionality using variables:

 show variables like 'rpl%';
+-------------------------------------------+------------+
| Variable_name                             | Value      |
+-------------------------------------------+------------+
| rpl_semi_sync_master_enabled              | ON         |
| rpl_semi_sync_master_timeout              | 10000      |
| rpl_semi_sync_master_trace_level          | 32         |
| rpl_semi_sync_master_wait_for_slave_count | 1          |
| rpl_semi_sync_master_wait_no_slave        | ON         |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC |
| rpl_stop_slave_timeout                    | 31536000   |
+-------------------------------------------+------------+

and the running results of how the feature is performing using status:

set status like 'rpl%';
+--------------------------------------------+-------+
| Variable_name                              | Value |
+--------------------------------------------+-------+
| Rpl_semi_sync_master_clients               | 2     |
| Rpl_semi_sync_master_net_avg_wait_time     | 0     |
| Rpl_semi_sync_master_net_wait_time         | 0     |
| Rpl_semi_sync_master_net_waits             | 4     |
| Rpl_semi_sync_master_no_times              | 0     |
| Rpl_semi_sync_master_no_tx                 | 0     |
| Rpl_semi_sync_master_status                | ON    |
| Rpl_semi_sync_master_timefunc_failures     | 0     |
| Rpl_semi_sync_master_tx_avg_wait_time      | 431   |
| Rpl_semi_sync_master_tx_wait_time          | 863   |
| Rpl_semi_sync_master_tx_waits              | 2     |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0     |
| Rpl_semi_sync_master_wait_sessions         | 0     |
| Rpl_semi_sync_master_yes_tx                | 2     |
+--------------------------------------------+-------+

However, when it comes to GTID, the paradigm is broken:

slave1 [localhost] {msandbox} ((none)) > show global VARIABLES like '%gtid%';
+----------------------------------+-------------------------------------------+
| Variable_name                    | Value                                     |
+----------------------------------+-------------------------------------------+
| binlog_gtid_simple_recovery      | ON                                        |
| enforce_gtid_consistency         | ON                                        |
| gtid_executed                    | 00011808-1111-1111-1111-111111111111:1-42 |
| gtid_executed_compression_period | 1000                                      |
| gtid_mode                        | ON                                        |
| gtid_owned                       |                                           |
| gtid_purged                      | 00011808-1111-1111-1111-111111111111:1-42 |
| session_track_gtids              | OFF                                       |
+----------------------------------+-------------------------------------------+
8 rows in set (0.00 sec)

slave1 [localhost] {msandbox} ((none)) > show global STATUS like '%gtid%';
Empty set (0.01 sec)

Here we have variables like gtid_mode and enforce_gtid_consistency that are used to enable the behavior, but the monitoring of the result (gtid_executed and gtid_purged) should have been in STATUS, not in VARIABLES. The same usage of VARIABLES can be seen in MySQL group replication.

How should it be instead

The values that monitor the functionality should be in STATUS rather than in VARIABLES. However, in this case, it would be better if the values were tracked in performance_schema tables, as noted below.

GTID executed

When GTIDs are enabled, we can see its value in SHOW MASTER STATUS

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 6325
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
1 row in set (0.00 sec)

The information here has a precise meaning: the transaction set 1-22 comes from a master identified by UUID 00007801-1111-1111-1111-111111111111. The latest transaction corresponds to binary log mysql-bin.000002 at position 6325.

This is the identification of a source, and as such is the basis for monitoring. When we want to check if the slave service is running fine, we need to compare the slave progress against the information reported by the master. So far, so good.

However, when we use multiple masters, the matter becomes less clear. A slave that was replicating from this master and from two more, will show the following:

SHOW MASTER STATUS\G
*************************** 1. row ***************************
             File: mysql-bin.000002
         Position: 154
     Binlog_Do_DB:
 Binlog_Ignore_DB:
Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is completely wrong. What we see here has nothing to do with being a master, as the data shown in the above statement was originated in different servers. The data is also wrong because it is thrown together in the same field. Using the same server, we can see the same information in different ways.

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

This is the same info seen before. In this context, it may seem less wrong, as it is only data coming from other servers.

Here again the information will become much more confused when the slave is also a master and produces its own data sets. If we create something in this server, we end up with the following:

select @@global.gtid_executed\G
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2

Now we have in the same field the data that was produced in this server and the data that came through replication. The matter looks even more embarrassing if we look at SHOW SLAVE STATUS, which reports distinct data for every channel, but the executed_gtid_set is reported all mixed up, in every channel, unchanged and confused, master data and slave data alike.

SHOW SLAVE STATUS\G
*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4,
00007804-4444-4444-4444-444444444444:1-2
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

How should it be instead

First off, the data should not be in a single variable, because it is composed of multiple values. But even if it were to be kept in the environment (as STATUS items, not variables, as we have seen above,) we should have a distinction:

SHOW STATUS LIKE 'gtid_applied'
*************************** 1. row ***************************
@@global.gtid_executed: 00007801-1111-1111-1111-111111111111:1-22,
00007802-2222-2222-2222-222222222222:1-21,
00007803-3333-3333-3333-333333333333:1-4

SHOW STATUS LIKE 'gtid_generated'
*************************** 1. row ***************************
00007804-4444-4444-4444-444444444444:1-2

The SHOW SLAVE STATUS should report the right line for each channel. For example:

*************************** 1. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6325
               Relay_Log_File: mysql-relay-node1.000002
                Relay_Log_Pos: 6538
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6325
              Relay_Log_Space: 6747
[...]
             Master_Server_Id: 101
                  Master_UUID: 00007801-1111-1111-1111-111111111111
[...]
           Retrieved_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
            Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node1
           Master_TLS_Version:
*************************** 2. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 6039
               Relay_Log_File: mysql-relay-node2.000002
                Relay_Log_Pos: 6252
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 6039
              Relay_Log_Space: 6461
[...]
             Master_Server_Id: 102
                  Master_UUID: 00007802-2222-2222-2222-222222222222
[...]
           Retrieved_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
            Executed_Gtid_Set: 00007802-2222-2222-2222-222222222222:1-21
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node2
           Master_TLS_Version:
*************************** 3. row ***************************
[...]
              Master_Log_File: mysql-bin.000002
          Read_Master_Log_Pos: 1177
               Relay_Log_File: mysql-relay-node3.000002
                Relay_Log_Pos: 1390
        Relay_Master_Log_File: mysql-bin.000002
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]
          Exec_Master_Log_Pos: 1177
              Relay_Log_Space: 1599
[...]
             Master_Server_Id: 103
                  Master_UUID: 00007803-3333-3333-3333-333333333333
[...]
           Retrieved_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
            Executed_Gtid_Set: 00007803-3333-3333-3333-333333333333:1-4
                Auto_Position: 1
         Replicate_Rewrite_DB:
                 Channel_Name: node3
           Master_TLS_Version:

Missing information in performance_schema tables

The idea of having replication information in performance schema is to replace SHOW SLAVE STATUS with a set of tables that report the same info. The theory is good, but the implementation is lacking.

First of all, the tables are misnamed. replication_applier_configuration has only settings about the applier delay, while replication_applier_status again reports only info about the remaining delay. The information worth looking at for monitoring is in a table named replication_connection_status, while it should refer to applier status or applier progress.

select * from replication_connection_status\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

What is missing here is the applied data, i.e. what appears in @@global.gtid_executed, which, as said above, should be split by server. Moreover, we miss the corresponding binary logs and positions, which we find in some neglected tables in mysql databases and that should be removed, with their contents integrated here. Another thing that's missing are the schema and the timestamps of the latest transaction, both the creation and the replication times. All this information is valuable to the DBA when troubleshooting. Rather than opening binary logs to find out the details, they should be all collected in this table for ease of use.

How should it be instead

The monitoring tables should include information about:

  • transaction timestamp;
  • binary log and position (or what a future implementation would use to transport data);
  • default schema.

Here's an example:

select * from replication_applier_progress\G
*************************** 1. row ***************************
             CHANNEL_NAME: node1
               GROUP_NAME:
              SOURCE_UUID: 00007801-1111-1111-1111-111111111111
                THREAD_ID: 35
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:24:13
          APPLY_TIMESTAMP: 2016-09-20 22:24:15
 RECEIVED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
  APPLIED_TRANSACTION_SET: 00007801-1111-1111-1111-111111111111:1-22
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6325
           Relay_Log_File: mysql-relay-node1.000002
            Relay_Log_Pos: 6538
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db1
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 2. row ***************************
             CHANNEL_NAME: node2
               GROUP_NAME:
              SOURCE_UUID: 00007802-2222-2222-2222-222222222222
                THREAD_ID: 39
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:41
    TRANSACTION_TIMESTAMP: 2016-09-20 22:25:15
          APPLY_TIMESTAMP: 2016-09-20 22:25:18
 RECEIVED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
  APPLIED_TRANSACTION_SET: 00007802-2222-2222-2222-222222222222:1-21
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 6039
           Relay_Log_File: mysql-relay-node2.000002
            Relay_Log_Pos: 6252
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db2
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
*************************** 3. row ***************************
             CHANNEL_NAME: node3
               GROUP_NAME:
              SOURCE_UUID: 00007803-3333-3333-3333-333333333333
                THREAD_ID: 43
            SERVICE_STATE: ON
COUNT_RECEIVED_HEARTBEATS: 362
 LAST_HEARTBEAT_TIMESTAMP: 2016-09-20 23:05:42
    TRANSACTION_TIMESTAMP: 2016-09-20 22:28:16
          APPLY_TIMESTAMP: 2016-09-20 22:28:19
 RECEIVED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
  APPLIED_TRANSACTION_SET: 00007803-3333-3333-3333-333333333333:1-4
          Master_Log_File: mysql-bin.000002
      Read_Master_Log_Pos: 1177
           Relay_Log_File: mysql-relay-node3.000002
            Relay_Log_Pos: 1390
    Relay_Master_Log_File: mysql-bin.000002
       applying_on_schema: db3
        LAST_ERROR_NUMBER: 0
       LAST_ERROR_MESSAGE:
     LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00

The same level of detail should be used for replication_applier_status_by_worker, which, by the way, has a more appropriate name than replication_connection_status.

Monitoring tables are in different places

MySQL 5.6 introduced replication tables in the mysql database. Unfortunately, this inclusion happened before GTID was implemented. As a result, we have tables in mysql database reporting log names and positions and tables in performance_schema reporting GTIDs, but without correlations.

How should it be instead

The replication tables in mysql should be removed, and integrated in the performance_schema as shown above.

There is no table for SHOW MASTER STATUS

The information in SHOW SLAVE STATUS has been translated almost entirely in performance_schema tables. No such thing happened to SHOW MASTER STATUS. To make monitoring more efficient, all the data used for monitoring should be in tables, and the information should not contain more than one piece of information per field, as we have seen in previous cases.

How should it be instead

There should be a table containing the information from SHOW MASTER STATUS, except the filter info, which should be in a different table:

select * from replication_master_status\G
SHOW MASTER STATUS\G
*************************** 1. row ***************************
                 File: mysql-bin.000002
             Position: 154
    Executed_Gtid_Set: 00007801-1111-1111-1111-111111111111:1-22
Transaction_timestamp: 2016-09-20 22:24:13

select * from replication_filters\G
*************************** 1. row ***************************
     Filter_Name: binlog_do_db
     Filter_Yype: binlog
    Filter_Value: personnel.*
   Filter_Status: ACTIVE

Call for action

Dear MySQL product managers: please take action to make replication more consistent and usable. I may have forgotten something, but I think the requests in this article are a good start to improve the design of MySQL replication.

3 comments:

Hartmut said...

Has it really been eight years since http://datacharmer.blogspot.de/2008/08/introducing-mysql-community-driven.html ?

Unfortunately P_S still isn't extensible by plugins (or is it going to be in 8.0?)

Unknown said...

Point of view is important when discussing design features and flaws. From a manageability perspective, unique identities can be very important in order to correctly model relationships and target management actions. Stay around that realm long enough and you'll experience reuse of IP addresses in the same environment, reuse of server ids, etc. that make any other way of doing things confusing to all involved. Should less-hostile representations be exposed to end users, yes. Should globally unique identifiers be avoided at all costs to make things more usable to humans? Seems a bit drastic to me.

Simon J Mudd said...

For VARIABLES vs STATUS I must agree having filed this bug 4 years ago before 5.6 went GA:
See: http://bugs.mysql.com/bug.php?id=65189 Don't pollute status settings in SHOW [GLOBAL] VARIABLES