Thursday, July 30, 2015

MySQL replication monitoring 101

Replication is the process that transfers data from an active master to a slave server, which reproduces the data stream to achieve, as best as possible, a faithful copy of the data in the master.

To check replication health, you may start with sampling the service, i.e. committing some Sentinel value in the master and retrieving it from the slave.

Sentinel data: Tap tap… Is this thing on?

If you want to make sure that replication is working, the easiest test is using replication itself to see if data is being copied across from the master to the slaves. The method is easy:

  1. Make sure that the data you want to see is NOT in the master or in the slave. If you skip this step, you may think that replication is working, while in fact it may not.
  2. Either create a table in the master or use a table that you know exists both in the master and the slave.
  3. Insert several records in the master table.
  4. Check that they are replicated in the slave correctly.
  5. Update a record in the master.
  6. Watch it changing in the slave.
  7. Delete a record in the master.
  8. Watch it disappear in the slave.

This method is important because it works regardless of the database server and replication type. Whether you are replicating out of a MySQL master to a MySQL slave using native replication, or from an Oracle server to a Vertica slave using Tungsten Replicator, or sending data across MySQL Clusters through replication, or again using multiple sources in MySQL 5.7 or MariaDB 10, this method will always tell you if replication is happening or not.

The above method is just a simple proof that replication can work. It is not, however, proof that replication works always or that it does replicate all your data.

You can’t actually prove that replication always works. But the best approximation is monitoring it so that you know when it stops doing what you want.

As for making sure that replication copies ALL your data, you can’t prove that either, but by using checksum probes, you can achieve some peace of mind.

Monitoring: Are you still there?


Monitoring replication is the process of ensuring that the data transfer tasks between master and slave services are working as expected. Unlike the sampling process above, where you check known data in the master and the slave, monitoring works on replication metadata, which is the data produced by the services involved in replication, telling you important pieces of information regarding the task:

  • Data origin. This is the identity of the master service, which can be marked by a host name, a server ID, a service or channel name, or a combination of the above. Sometimes called data domain.
  • Data stream. This is the name of a file, or a series of files containing the data being replicated. It could be an actual file in the operating system, or a URI in a wider environment. It may also be a port number and a protocol identifier when the replication is performed through an abstract API. If replication happens with global transaction identifiers, the reference to data streams can even be omitted.
  • Service stream. In replication systems that allow replication from multiple masters, and thus several streams converging to a single host, a service stream is the data coming from a given source into a slave. Regular MySQL replication does not have such concept. Tungsten Replicator, MariaDB 10, and MySQL 5.7 can define this notion to different degrees of usefulness.
  • Data extraction positions. These positions mark the point where the master services have set data available for the slaves. In practice, from a classic MySQL master/slave replication standpoint, it is the latest point written by the master in the binary logs. In other forms of replication, it could be the latest point where the master has organized its replication metadata in a format understandable by its slaves.
  • Data apply positions. These are several positions in the data stream where the replication slave has started or terminated a given task. Such tasks could be:

    • read the data from the distant master into a temporary stream;
    • start applying the data;
    • commit the data.
  • Global transaction identifiers. A concept that is novel to MySQL replication (introduced with some use limitations in MySQL 5.6 and with less limits in MariaDB 10) but well known in other replication systems such as Tungsten. It is a way of identifying transactions independently from positions in the replication stream.
  • Stage transfer information. This is metadata specific to the stages of replication, i.e. to the steps taken to transport data from one place to another. There is very little of this metadata publicly available in MySQL replication, but in more advanced replication systems you may get information on what happens when the data is copied to the replication stream, when it is sent across the network, when it is extracted from the stream, and when it is applied. The importance of this stage data is also enhanced if replication can be modified with filters at every stage.
  • Task related information. This metadata tells the state of a given task during replication. While the stage related data deals with the steps of the replication, the task is the practical work performed to advance the progress of the data through those steps. So you may see that the replication is currently busy in the stages of replication-stream to processing-queue, and from the processing queue to the dbms. Looking at the corresponding tasks, you can get the amount of resources (memory allocation, time) by each task in the process. You may know, for example, that a given task has used 5 seconds to extract a transaction from the replication stream, and the next task has spent 1 second to apply the same data to the DBMS. If the slave is lagging at that moment, you will know that the problem is not in the speed of the database but in the network. MySQL native replication does not have this rich information on tasks as of version 5.7.
  • Shard related information. This is information that shows what action is happening in a given share of the replication process, which may or may not be split by shards. The concept of shard is volatile, and can be expanded to several things. It could be a time-based identifier that splits the data into regular chunks, or it could be a hash algorithm that maintains the data load balanced across servers, or i could be a physical boundary, such as a schema name or a table name prefix, which defines the extent of the replication work. There is little or no sharding concept in MySQL prior to MySQL 5.6.
  • Channel or thread information. When replication can run in parallel streams, the replication system should be able to detect the status of every thread in detail. Depending on the replication implementation, this information can be equivalent to the shards or to the replication service streams. Recently, MySQL 5.7 uses channel to refer to a data stream.


In MySQL master/slave topologies, monitoring means comparing metadata from the master with metadata in the slaves, to ensure that replication is running to our satisfaction.

Up to version 5.5 (but it still holds true in later versions,) monitoring replication means essentially five things:

1. Making sure that the slave is replicating from the intended master.

2. Checking that the slave is replicating from the right binary logs.

3. Checking that the data from the master is transferred to the slave.

4. Checking that the slave is applying data without errors.

4. Checking that the slave is keeping up with the master.

To achieve the above goals, we need three pieces of information.

1. We need to know who the master is

2. What the master is doing,

3. And finally what the slave is doing.

Knowing only the slave status is not enough, as the slave may be replicating from the wrong source, or from the wrong set of binary logs.

To get the master identity, we determine in which host and port it is running:

master [localhost] {msandbox} ((none)) > show variables like 'port';  
| Variable_name | Value |  
| port          | 22786 |  
1 row in set (0.01 sec)

master [localhost] {msandbox} ((none)) > show variables like 'hostname';  
| Variable_name | Value     |  
| hostname      | localhost |  
1 row in set (0.00 sec)

To know what the master is doing, we run SHOW MASTER STATUS:

master [localhost] {msandbox} ((none)) > show master status\G  
*************************** 1. row ***************************  
            File: mysql-bin.000003  
        Position: 5149170  
1 row in set (0.00 sec)

Now we know that the master is running on localhost, with port 22786, and that it was last seen writing to binary log mysql-bin.000003 at position 5149170.

Armed with this information, we proceed to check the result of “SHOW SLAVE STATUS”

slave1 [localhost] {msandbox} ((none)) > SHOW SLAVE STATUS\G  
*************************** 1. row ***************************  
               Slave_IO_State: Waiting for master to send event  
                  Master_User: rsandbox  
                  Master_Port: 22786  
                Connect_Retry: 60  
              Master_Log_File: mysql-bin.000003  
          Read_Master_Log_Pos: 5149170  
               Relay_Log_File: mysql_sandbox22787-relay-bin.000006  
                Relay_Log_Pos: 2060153  
        Relay_Master_Log_File: mysql-bin.000003  
             Slave_IO_Running: Yes  
            Slave_SQL_Running: Yes  
                   Last_Errno: 0  
                 Skip_Counter: 0  
          Exec_Master_Log_Pos: 2060007  
              Relay_Log_Space: 5149528  
              Until_Condition: None  
                Until_Log_Pos: 0  
           Master_SSL_Allowed: No  
        Seconds_Behind_Master: 2  
Master_SSL_Verify_Server_Cert: No  
                Last_IO_Errno: 0  
               Last_SQL_Errno: 0  
             Master_Server_Id: 1

We can easily establish the master identity by checking Master_Host and Master_Port. In our case, we see that the master is, which, if you know your Latin in the Unix world, means localhost. The port is the same that the master uses. Check objective #1. The master is what it should be.

Then we check that we are replicating from the intended binary log. The master says mysql-bin.000003 and so does the slave at Master_Log_File. Here the binary log name could be different. For example, the master could say mysql-bin.000013 and the slave could still be processing mysql-bin.000006. There are several reasons for this to happen (planned maintenance, long lasting DDL operations, slow network, high traffic on the slave), but if the binary log names are inconsistent (for example, the slave is using mysql-bin.000013 while the master is using mysql-bin.000006) or if the name pattern is different (master says–000012, while the slave says mysql-bin.000012) then we have a setup mismatch, and the slave is not replicating from the intended data stream. In our case, objective #2 is attained, since the names match.

To see if the data from the master has reached the slave, we first check that Slave_IO_Running is set to “Yes”, and then compare the value from Position in SHOW MASTER STATUS with the slave’s Read_Master_Log_Pos: in our case, the slave reports the same position as the master, thus certifying objective #3. This value can be lower, or the slave could still be processing a previous binary log. But replication is still considered healthy (as far as objective #3 is concerned) if these values increase continuously, i.e. if the slave keeps receiving data while the master produces it.

It’s worth mentioning that we get here some information about stage processing. MySQL replication has two stages: (1) pull from binary logs into relay logs, and (2) apply from relay logs into the database. We can get information about which binary log is currently being pulled, and to which relay log we’re writing. The exact positions of pulling and applying with reference to the master positions are not always available without inspecting the logs. Since the relay logs can rotate or be flushed at a different pace than the binary logs, monitoring the stage information with the simple metadata provided by SHOW SLAVE STATUS is not always easy or even possible.

The fourth objective is simple to check: if Slave_SQL_Running says “No” and Last_Error is non-empty, then we have an error to deal with. That’s beyond the scope of this article, and I recommend Ronald Bradford’s excellent book on MySQL replication for more information.

The fifth objective is determined by comparing again the value of Position in the master with Exec_Master_Log_Pos. In the current status, we’re behind by about 3 MB. (5149170–2060007). That could be a concern if it keeps that way. In our example, it was due to a massive data load, which was quickly absorbed. An additional check one minute later shows that all is well:

        File: mysql-bin.000003  
    Position: 168394841

          Master_Log_File: mysql-bin.000003  
      Read_Master_Log_Pos: 168394841  
         Slave_IO_Running: Yes  
        Slave_SQL_Running: Yes  
      Exec_Master_Log_Pos: 168394841

Let me stress the importance of comparing data from master and slave. If you only have information in the slave that says it is executing position 1,000 from master binlog 134, you have no way of knowing if the slave is doing well until you see the master’s info. If the master shows that it is writing to binlog 134, you know that the slave is keeping up, but if the master mentions binlog 300, then you have a huge problem.

Latency: Are you catching up?

The data that gets in the master is the only data that matters. What’s in the slaves is just a copy that is there for your convenience, so that your applications can keep asking questions without burdening the master.

In a perfect world, data inserted in the master should be available in the slaves instantly. But reality is often different. Data can be delayed because of concurrent operations, network slowness, long lasting transactions in the slave, high query traffic, and so on.

Thus, you must assume that there is some latency between master and slave, and you need to know how bad that latency is. It could be as low as a few microseconds, or as long as minutes. Or even hours, if a slave was out for maintenance.

The way latency is measured depends on the metadata available in the replication stream. If there is no sensible metadata available, you can use a hybrid system, similar to the sentinel data method outlined above, where you insert a high resolution timestamp in the master, and compare it to a similar value in the operating system when retrieving the data from the slave. See Measuring Replication speed for a sample implementation.

In general, when metadata is available, the latency is the amount of time elapsed between the moment when the transaction was committed and when the same data is applied in the slave. By this definition, latency can vary between slaves of the same system, because of networking or performance issues.

Status persistence: That was an awful fall. Are you OK, dear?

As long as the database server is alive, we should be able to get sensible information about its functioning and the state of replication. But computers can fail, and you should expect them to fail, especially if you are running them in large numbers. With server failure seen as an everyday occurrence, it is reasonable to expect your database to come online and heal itself after a server crash. With storage engines like innodb, this expectation is legitimate, as the system is able to survive nasty crashes. What about replication data?

In MySQL, up to version 5.5, replication metadata was stored in files ( and Since these files depend on the operating system to maintain their data, it often happens that they are out of sync with the database server, meaning that the metadata doesn’t show the latest replication position. When this happens, we have a replication inconsistency, which can result in a breakage (duplicate key insert) or data duplicate (statement-based updates applied twice).

Recent releases (MySQL 5.6, MySQL 5.7 and MariaDB 10.x) have adopted the crash-safe tables (which Tungsten Replicator has had for at least four years already) with different degrees of readiness.

Having a persistent state allows the monitoring tools to rely on the stored data and detect the health of replication after a failure.

Completeness: Did you miss anything?

In many replication system, you can add filters to reduce or modify the amount of data being replicated in one or more nodes. Filters can be grouped in four categories:

  • Extractor removal: Data is removed at the source, and it never gets in the replication stream. This happens, for example, when a master has a filter or a client-side setting that prevents an object from being replicated. This omission can be a single occurrence, where the master suspends logging for one or more events, or a permanent occurrence, where a filter removes all events affecting a given schema or table.
  • Applier removal: Data is sent to the slaves, but one or more slaves can filter off events. Again, this can happen temporarily (a slave replicates up to a given event and then skips the unwanted ones) or permanently (a filter tells the slave that some objects should not be replicated).
  • Extractor modifiers: Data is modified on the master side, before it is sent to the slaves. This kind of modification could be benign, as in the case where data is adjusted so that it could be understood by slaves running different database servers (e.g. MySQL to Oracle), or it could be destructive, meaning that its original value or shape cannot be reproduced, as in the case of an anonymizer, where sensitive data are encoded using a one-way function.
  • Applier modifiers: Same as the previous one, but the operation is performed on the slave side. Common occurrences of this filters are schema or table renames (to avoid conflicts for data coming from different sources) or format changes (e.g. date format transformation when sending data between heterogeneous servers).

We must be very cautious when using these filters. If a slave does not have all the data from the master, it won’t be a suitable replacement when the master fails. If you run replication for high availability, at least one slave should be kept online without filters.

Checksum probes: Have you got it all?

Monitoring Is not enough for the DBA’s peace of mind. We also need to make sure that all the data from the master has reached the slaves, and that the data in the slaves is a reliable copy of what we have in the master.

Sadly, there is no acknowledged standard method to achieve this goal. If your data is reasonably small, you can brute-force compare the databases with a quick data extraction in both servers followed by a few OUTER JOIN queries. But this is seldom the case. You usually want to compare data too big to fit in memory, and too big to be transferred across servers efficiently, and you also want to run the comparison without stopping the servers. The problem was outlined in an article that I wrote more than 10 years ago Taming the Distributed Database problem and which has inspired some of the tools currently available to the DBAs (pt-table-checksum is probably the most popular among them). But the tools are not easy to use, or friendly, or if they are friendly they lack one or two requirements. Since this topic is still in flux, I will resume it when the available tools reach a higher level of usefulness.

What’s next

This article covers the necessary background that will allow me to explore in more detail how replication works in the latest development in MySQL and surrounding environment, namely MySQL 5.6, MySQL 5.7, and MariaDB 10, with some sparse comparisons with Tungsten Replicator. Armed with the above definitions and examples, we can compare replication monitoring and managing capabilities for different systems.

We will specifically explore how to use GTID (global transaction IDs) in regular master/slave replication, with multiple sources and parallel replication.

No comments:

Vote on Planet MySQL