Tuesday, May 17, 2011

The price of safe data - Benchmarking semi synchronous replication

Some time ago I wrote about MySQL 5.5 semi-synchronous replication. Since then, I have wanted to benchmark the overhead of semi-synchronous replication with a decent server. Now the occasion presented itself, thanks to some related business that I had to benchmark, and thus I did a few simple runs with and without semi-synchronous replication enabled, to see the impact of this feature on performance. If you haven't read the article on semi-synchronous replication, the bottom line is that, with this feature enabled, the master waits until at least one slave has acknowledged receipt for the data before returning a positive result to the client. This means that for each commit there are two network calls between master and slave. My gut feeling was that this feature would be costly in terms of query response time, although I was not prepared to such a big impact as I found out in my test. I needed a substantial set of data, and I got it by exporting the employees table from the employees test database, using one INSERT per record. Thus, I had about 300,000 records, which are a fair amount for this kind of test. Had I sent the records in a big multiple insert chunk of 10,000 records each, I would have had only 30 commits, which would not have been easy to measure. So, here goes.

regular replication
$ time mysql < employees.sql 

real 0m27.997s
user 0m1.394s
sys 0m1.046s

semi-synchronous replication
$ time mysql < employees.sql 
real 1m24.277s
user 0m3.842s
sys 0m6.270s

Semi-synchronous replication was three times slower than regular replication. The test was taken using one master in one host and one slave in two more hosts. The measurements were the same if I had only one or both slaves enabled. Using row-based replication instead of statement-based did not make any substantial impact. Now my question is: who would be prepared to accept such a performance impact for the sake of more data safety? Data is important, but response time to customers is also important. Your mileage may vary. I know many customers who would think twice before accepting this onerous trade off. I am curious to know what experience others have had with this feature, and how much performance they are willing to sacrifice for safety.

12 comments:

mysqlscott said...

Curious what the connection between servers was? Assuming more round trips, perhaps a slower connection would multiply the delay.

Mark Callaghan said...

The overhead from semi-sync in this case is ~200 microseconds per commit. I don't think that is a big deal for most use cases.

Giuseppe Maxia said...

@mysqlscott,
it's a 1GB/s connection.
But I have also a 100mb connection on those machines. Using the slower NIC I get this:

$ time mysql < employees.sql

real 2m41.270s
user 0m3.664s
sys 0m4.985s

Dimitri said...

Guiseppe,

with all my respect, but what do you see wrong here?.. - the case you're comparing is similar to compare an I/O WRITE test, like:

write();
write();
...
write():

vs

write(); fsync();
write(); fsync();
...
write(); fsync();

for sure in case you're waiting for every write sync you'll be slower comparing to writing to the filesystem buffer..

same for semi-sync replication, and sure there is a cost.

Rgds,
-Dimitri

Giuseppe Maxia said...

@Dimitri,
I don't see anything wrong. I was not expecting the impact on performance to be so high. That's all.

I am trying to figure out if such impact is acceptable or not.

P.S. My name is Giuseppe

hingo said...

Giuseppe, we seem to often be engaged with the same thoughts without knowing it!

Did you notice my tests in this area http://openlife.cc/blogs/2011/may/drbd-and-semi-sync-shootout-large-server

In my tests: Having innodb_flush_log_at_trx_commit=1 and sync_binlog=1 using semi sync replication adds negligible overhead. What's better, with semi-sync replication I can safely set sync_binlog=0 (it's already replicated, no need to sync it locally) and get hugely improved performance over a single node system.

I assume you did not sync after each commit, did you?

Giuseppe Maxia said...

@hingo,
I was using sync_binlog=0, yes.

Dimitri said...

Giuseppe,

the cost here is a cost of the round trip of a packets between nodes. BTW, did you monitor a network activity during both tests?.. - I'm curious if it'll be the same rates in packets/sec and KB/sec. (As well it's very possible that NO DELAY option is not used on the sockets and short delays are added on each packet - so some network tuning should be also involved).

Rgds,
-Dimitri

Daniël van Eeden said...

I think some people will use it like this:

They have a async replication with both master and slave configured with sync_binlog=1 and innodb_flush_log_at_trx_commit=1 and other durability settings.

After switching to semi-sync they will relax the durability settings as it's more-or-less guaranteed that the data will also be on one of the slaves.

In that case the performance penaly might be acceptable for some.

I would like to see it benchmarked againt other semi-sync and sync replication setups for other databases like PostgreSQL and Oracle with DataGuard.

Anonymous said...

Great write up - you just saved me a load test!

Mark Callaghan said...

"I don't see anything wrong. I was not expecting the impact on performance to be so high. That's all."

By my math the overhead is 200 microseconds per commit. Your workload is way too simple and thus a 200 microsecond overhead makes this 3X slower. Silly benchmarks lead to silly results.

Petervdb said...

Hi,
I have done a similar test with mysqlslap. The test was done with MyISAM, InnoDB and with and without semi-synchronous replication.
I used MySQL 5.5.16 on a small server with 1GB RAM, dual core, RAID-1 72GB disk (Dell M610)
Both master and slave have the same hardware and setup.

Without semi-synchronous replication:
mysqlslap --concurrency=10 --iterations=100 --number-int-cols=5 --number-char-cols=10 --auto-generate-sql --engine=MyISAM
Benchmark
Running for engine MyISAM
Average number of seconds to run all queries: 3.191 seconds
Minimum number of seconds to run all queries: 2.958 seconds
Maximum number of seconds to run all queries: 3.676 seconds
Number of clients running queries: 10
Average number of queries per client: 0

mysqlslap --concurrency=10 --iterations=100 --number-int-cols=5 --number-char-cols=10 --auto-generate-sql --engine=InnoDB
Benchmark
Running for engine InnoDB
Average number of seconds to run all queries: 3.994 seconds
Minimum number of seconds to run all queries: 3.540 seconds
Maximum number of seconds to run all queries: 4.577 seconds
Number of clients running queries: 10
Average number of queries per client: 0

With semi-synchronous replication:
mysqlslap --concurrency=10 --iterations=100 --number-int-cols=5 --number-char-cols=10 --auto-generate-sql --engine=MyISAM
Benchmark
Running for engine MyISAM
Average number of seconds to run all queries: 3.234 seconds
Minimum number of seconds to run all queries: 2.998 seconds
Maximum number of seconds to run all queries: 3.885 seconds
Number of clients running queries: 10
Average number of queries per client: 0

mysqlslap --concurrency=10 --iterations=100 --number-int-cols=5 --number-char-cols=10 --auto-generate-sql --engine=InnoDB
Benchmark
Running for engine InnoDB
Average number of seconds to run all queries: 4.051 seconds
Minimum number of seconds to run all queries: 3.545 seconds
Maximum number of seconds to run all queries: 5.690 seconds
Number of clients running queries: 10
Average number of queries per client: 0