Thursday, November 04, 2010

Testing MySQL 5.5 semi-synchronous replication

A few days ago I saw an article about Semi-Synchronous Replication in MySQL 5.5. It asks questions, and doesn't give answers beyond gut feeling. So I thought that I would do some practical testing of this new feature.
Before we go that way, though, let's revisit the theory.

How semi-synchronous replication works


Figure 1. A transaction with regular replication
With regular replication, you send a transaction to the master (1). When the COMMIT is received, the master executes it (2), and if successful it logs the transaction to the binary log (3). The the master answers the client request (4) with a successful result. In the meantime, the slaves replicate the record (5).
What happens if the master crashes after point #4 and before a slave has had a chance of getting the data in point #5?
The client will have a result for that transaction, but that data is lost, because it has never reached one slave.


Figure 2. A transaction with semi-synchronous replication

Let's see the same scenario with semi-synchronous replication. All is the same until point #3. Then, things change. The master does not return to the client. Instead, it alerts the slave that a transaction is available (4). The slave gets it and stores it to the relay log (5). Without further action, the slave tells the master that the transaction was received (6) and only then the master returns the result to the client (7).
What is the benefit? If the master crashes, the transaction is lost. but the client does not get the false information that the transaction was stored. Instead, if the master crashes before it returns the result to the client, the client gets an error, and knows that that transaction needs to be reworked when a new master is back.

Semi-synchronous replication in practice

Now, the practicalities.
How do you tell if semi-synchronous replication is working? If you leave the default timeout of 10 seconds, you have an immediate clue that something is wrong when a query takes too long to return. Investigation is possible by looking at the GLOBAL STATUS variables.
Rpl_semi_sync_master_status tells you if the master is ready for semi-synchronous replication.
Rpl_semi_sync_master_yes_tx is the number of positive transactions that were delivered using semi-synchronous replication.
Rpl_semi_sync_master_no_tx is the number of failed attempts at delivering a transaction via semi-synchronous replication. When that happens, Rpl_semi_sync_master_status becomes "OFF", and you need investigating.

The important thing to understand about this feature is that semi-synchronous replication does not guarantee that your transaction is executed in the slave. It will only tell you that the data has been transferred to the slave relay log. It can still happen that the transaction fails to execute on the slave (which could be either a bug in your application or a bug in MySQL replication). But this is not a cluster. Don't expect a two-phase commit.

Testing semi-synchronous replication

If you want to test this feature without suffering too much, you can use a tarball binary and MySQL Sandbox. Once you have installed MySQL Sandbox and have downloaded the server tarball, you can install a test replication system with
make_replication_sandbox --how_many_slaves=4 /path/to/mysql-5.5.6-yourOS.tar.gz
This will create a system with 1 master and 4 slaves.
The Sandbox has a shortcut to install the plugin quickly and painlessly:
sbtool -o plugin --plugin=semisynch -s $HOME/sandboxes/rsandbox_5_5_6
Now you will have the semi-synchronous plugin installed in the master and all the slaves. For our tests, we will make a shell script, an easy task thanks to the sandbox utilities.

#!/bin/sh

echo "disabling semi-synch replication in all slaves except 1"
./s1 -e 'set global rpl_semi_sync_slave_enabled=1'
./s1 -e 'slave stop io_thread; slave start'

for E in 2 3 4
do
    ./s$E -e 'set global rpl_semi_sync_slave_enabled=0'
    ./s$E -e 'slave stop io_thread; slave start'
done

#
# this query will show the main variables that tell
# if semi-synch replication is working
#
Q1='select variable_name, variable_value'
Q2='from information_schema.global_status'
Q3='where variable_name in'
Q4='("RPL_SEMI_SYNC_MASTER_YES_TX", "RPL_SEMI_SYNC_MASTER_NO_TX")'
I_S_Q="$Q1 $Q2 $Q3 $Q4"

echo ""
echo "creating a table. it should replicate through the semi-synch"
./m -vvv -e 'create table test.t1( i int)'
./m -e "$I_S_Q"

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (1)'
./m -e "$I_S_Q"

echo ""
echo "disabling semi-synch replication in slave 1"
./s1 -e 'set global rpl_semi_sync_slave_enabled=0'
./s1 -e 'slave stop io_thread; slave start'

echo ""
echo "enabling semi-synch replication in slave 3"
./s3 -e 'set global rpl_semi_sync_slave_enabled=1'
./s3 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (2)'
./m -e "$I_S_Q"

echo ""
echo "disabling semi-synch replication in slave 3"
./s3 -e 'set global rpl_semi_sync_slave_enabled=0'
./s3 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'NO' should increase"
./m -vvv -e 'insert into test.t1 values (3)'
./m -e "$I_S_Q"

echo ""
echo "enabling semi-synch replication in slave 2"
./s2 -e 'set global rpl_semi_sync_slave_enabled=1'
./s2 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (4)'
./m -e "$I_S_Q"
This script will first disable semi-synchronous replication in all the slaves except one. Then it will create a table, and check for the telling status variables.
This should work quickly and without problems. Then it will disable the plugin on the only slave that was active, and enable another slave instead.
Inserting a record on the master will work again quickly, as the newly enabled slave will get the record immediately.
Then the slave gets disabled, and we can witness what happens. The query takes a bit longer than 10 seconds, and the status variable tells us that semi-synchronous replication has failed.
We finally enable yet another slave, and when we try a further insertion, we can see that the semi-synchronous replication has resumed.

Very important:
To enable or disable semi-synchronous replication on a slave it is not enough to set the appropriate variable. You need also to restart the slave by issuing a STOP SLAVE IO_THREAD followed by a START SLAVE commands.

Here is a sample run:
disabling semi-synch replication in all slaves except 1

creating a table. it should replicate through the semi-synch
--------------
create table test.t1( i int)
--------------

Query OK, 0 rows affected (0.87 sec)

Bye
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 1              |
+-----------------------------+----------------+

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 2              |
+-----------------------------+----------------+

disabling semi-synch replication in slave 1

enabling semi-synch replication in slave 3

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 3              |
+-----------------------------+----------------+

disabling semi-synch replication in slave 3

inserting a record. The number of 'NO' should increase
--------------
insert into test.t1 values (3)
--------------

Query OK, 1 row affected (10.12 sec)

Bye
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 1              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 3              |
+-----------------------------+----------------+

enabling semi-synch replication in slave 2

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 1              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 4              |
+-----------------------------+----------------+
Using the above steps, you should be able to use semi-synchronous replication and do some basic monitoring to make sure that it works as expected.

4 comments:

Mark Callaghan said...

Thanks for taking the time to explain this. The pictures are excellent. I have had at least one user who wanted to know when their commits reached a slave and they weren't thrilled about having to query slave status after each commit. Semi-sync will make life easier for them.

Another benefit from semi-sync is that it rate limits busy users. When one connection tries to do a burst of transactions this can be used to make the client run only as fast as the throughput between the master and IO thread on the slave. That is useful for some deployments.

Ronald Bradford said...

A great intro for people to understand the moving parts. It would be good to see a benchmark comparison between enabled and disabled to determine the true cost in improving durability v throughput.

internet marketing blog said...

Very interesting post, lots of good information. it's very helpful for me. Thanks!

Robert Hodges said...

This is a great description of a very cool feature. It sounds as if we'll get a chance to think together about how to help Tungsten Replicator implement similar callbacks from slaves so it can play nice with MySQL 5.5.