Monday, September 29, 2008

Will you use row-based replication by default?


MySQL 5.1

MySQL 5.1 introduces row based replication, a way of replicating data that fixes many inconsistencies of the statement based replication, the standard method used by MySQL so far.

The good: row based replication solves some problems when replicating the result of non deterministic functions, such as UUID() or NOW().
The bad: row-based replication may break existing applications, where you count on the quirks of statement based replication to execute conditionally (updates base on @@server_id, for example), and may perform badly on updates applied to very large tables.

Currently, MySQL 5.1.28 has the mixed mode enabled by default. When in this mode, binary logging switches to row-based when there is a risk of breaking replication. Except that there are some cases left out, like Bug#39701 (load_file()).
Mixed mode is supposed to be the best of two worlds. You can run fast updates on large tables and not risking inconsistent results on non-deterministic functions. The recent addition to mysqlbinlog allows an easy decoding of the ugly looking row-based statements.
Would you switch to mixed mode or stay in statement mode? There are requests of reverting the default to statement based binlog format, to maintain compatibility with old applications.
Personally, I would go for the mixed mode. But it depends on your applications and your personal experience. What's your take?

2 comments:

ryan said...

My preference is to leave the default as mixed as you suggest in theory at least it's the best of both worlds. However in most cases I will be using RBR because it can substantially improve replication lag.

People interested in increasing slave performance will find out about row based replication, and this preserves the status quo for applications which rely on the non-determinism of statement based replication.

Sheeri K. Cabral said...

Replication lag isn't a consistent, ever-present issue in the databases I work with. Sure, a large transaction or ALTER TABLE can lock tables for a long time, stalling replication, but those cases are relatively few and far between....

...compared to the problem of statement-based replication drifting out of sync. I'd really have to see if and how row-based replication drifts out of sync due to network issues or whatever, in which case I'd think the table would end up somewhat corrupted...if that were the case I might stick with statement-based.

It depends what the pain points are, and while I'm intimately familiar with the pain points of statement-based replication, I haven't used row-based replication long enough to know what the pain points are (other than the obvious ones such as "changing a lot of data will use a lot of network bandwidth").....