Showing posts with label binary log.

Thursday, August 06, 2015

Changed defaults between MySQL 5.6 and 5.7

MySQL 5.7 comes with many changes. Some of them are better explained than others.

I wanted to see how many changes I could find by comparing the output of SHOW VARIABLES in MySQL 5.6 and 5.7 (a quick way to run this comparison is sketched after the list below).
The most notable ones are:

  • binlog_format: the default is now ROW. This variable affects the format of the binary log. Whether you use it as a backup complement or for replication, the change means bigger binary logs and possible side effects.
  • binlog_error_action now defaults to ABORT_SERVER. If the server cannot write to the binary log, rather than continuing its work without logging, it shuts down. This may be a desirable course of action, but you should be prepared for the eventuality.
  • innodb_strict_mode is enabled by default, which is probably a good thing, but it means that previously accepted operations will now generate an error instead of a warning.
  • sql_mode is now STRICT by default. While many well-prepared users will be pleased with this change, which was advocated as best practice by some DBAs, the practical outcome is that several existing applications may break because of unclean input.
  • sync_binlog, which affects data safety but also server performance, is now enabled (set to 1).
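A quick way to spot these differences is to dump SHOW GLOBAL VARIABLES from both servers and compare the two files. Here is a minimal sketch; the ports, user, and password assume two sandboxed servers and are purely illustrative.

# dump the variables of a 5.6 and a 5.7 server (ports are illustrative)
mysql -h 127.0.0.1 -P 5626 -u msandbox -pmsandbox -BN \
  -e 'SHOW GLOBAL VARIABLES' | sort > vars_56.txt
mysql -h 127.0.0.1 -P 5721 -u msandbox -pmsandbox -BN \
  -e 'SHOW GLOBAL VARIABLES' | sort > vars_57.txt

# show only the variables that differ or exist in one version only
diff vars_56.txt vars_57.txt | less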

Wednesday, March 27, 2013

Multi-master data conflicts - Part 2: dealing with conflicts

In the first part of this article we examined the types of conflicts and their causes. In this part, we will analyse some of the methods available to deal with conflicts.

Pessimistic locking (or: conflicts won't happen)

Applicability: synchronous clusters with 2pc

We've covered this topic in the previous article, but it's worth repeating. If you use a synchronous cluster, you don't have conflicts. For example, MySQL Cluster ensures consistent data with updates coming from different nodes. However, MySQL Cluster is not a replacement for a MySQL server, and it has severe limitations.


Optimistic locking

Applicability: synchronous clusters without 2pc (Galera)

Conflicting transactions proceed on different nodes with local locking. The last one to commit is then rolled back when it discovers that a prior transaction got in first on the same data. For a more detailed analysis of this handling method, see this article by Jay Janssen.
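From the application's point of view, the losing transaction surfaces as an error at commit time, and the usual remedy is to retry it. The following is only a rough sketch, assuming a Galera-style cluster that reports the certification failure as a deadlock error; the host, user, and table names are made up for the example.

# retry a transaction a few times if the commit is rejected
for attempt in 1 2 3
do
  if mysql -h node2 -u app_user -p"$APP_PW" sales -e "
       START TRANSACTION;
       UPDATE orders SET status = 'shipped' WHERE id = 42;
       COMMIT;"
  then
      break            # the transaction went through
  fi
  echo "commit rejected (possible conflict), retrying ($attempt)"
  sleep 1
done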


Conflict resolution after the fact

Applicability: EnterpriseDB (none so far for MySQL)

Conflicts are hard to deal with in asynchronous replication. A conflict here means that the data has already been applied to the wrong node or to the wrong object, and something must be done to solve the issue.

Typical remedies offered for conflict resolution are:

  • Earliest or Latest Timestamp: This method says that the oldest or the latest record prevails when a conflict happens. This is hardly a reliable resolution. It is the easiest method to implement, which is why it is offered, but it often results in a hidden data inconsistency problem, where we may find data that we don't expect: the current data was applied simply because it was updated later than the correct record. Also, timestamp calculation requires time synchronization across servers, and possibly across timezones, which calls for extra effort to keep the system functioning. (A minimal sketch of the "latest timestamp" rule follows this list.)
  • Node Priority: There is a hierarchy of nodes, with different ranks. When a conflict occurs, the node with the highest rank prevails. This method requires the data origin to be stored alongside the contents, and to be easily searchable when conflicts occur. It must also take into account offline nodes, and therefore it should keep the conflict resolution metadata until the offline nodes are back in synch.
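To make the "latest timestamp" rule concrete, here is a minimal sketch of how a resolver could apply it when re-inserting a conflicting row. The table, columns, and connection details are invented for the example; the only requirement is a primary key and an updated_at column.

# keep the incoming copy of the row only if it is more recent
mysql -h slave_host -u app_user -p"$APP_PW" sales <<'SQL'
-- assignments are evaluated left to right: status still sees the old updated_at
INSERT INTO orders (id, status, updated_at)
VALUES (42, 'shipped', '2013-03-27 10:15:00')
ON DUPLICATE KEY UPDATE
  status     = IF(VALUES(updated_at) > updated_at, VALUES(status), status),
  updated_at = IF(VALUES(updated_at) > updated_at, VALUES(updated_at), updated_at);
SQL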

Methods that could be implemented in a more advanced technology may include:

  • Origin enforcement: data coming from authorized nodes will be preserved. Data from wrong origin will be dropped, and a consolidation event will be generated and sent to the other nodes. This method would be possible in systems (like Tungsten) that keep track of the event origin.
  • Data merge: If possible and desirable, data from two different sources can be preserved, and merged in the destination table. This rule should also generate a new event to fix the data in the other nodes.

Schema renaming

Applicability: fan-in topologies


Image #1 - Fan-in topology with likely conflicts.

A fan-in topology is easy to implement with Tungsten Replicator, but not easy to maintain. By its nature, fan-in is a conflict waiting to happen. Assuming that all the masters have the same structure, they will replicate multiple changes into the same schema, and it is quite likely that some changes will clash. For this reason, the simple solution often adopted is renaming the schema before the data reaches the slave.
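Tungsten applies the renaming with a filter before the data reaches the slave. Just to illustrate the concept (not as a fan-in recipe, since native MySQL replication cannot subscribe to several masters at once), the same renaming idea exists in native replication as a slave option; the schema names here are invented.

# in the slave's my.cnf: apply changes for schema 'sales' into 'sales_master1'
[mysqld]
replicate-rewrite-db="sales->sales_master1"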


Image #2 - Fan-in topology with schema renaming.

I know of at least one user who has successfully applied this technique for a cluster made of 70 masters and one slave.

Conflict prevention: Discipline

Applicability: all topologies

A simple way of preventing conflicts, and one that would make life easier for all, is discipline. The organization decides which entry points can update which data, and conflicts are not possible, because the data is inserted or modified only in the places where it is supposed to be.


Image #3 - Preventing conflicts with discipline in a star topology.

Conflict prevention: Enforced discipline

Applicability: all topologies

If you have worked in any large organization, either public or private, you know that discipline alone is the worst method you can rely on for something as delicate and valuable as your data. The reasons why this paradigm could fail are many: some people dislike discipline, someone makes a mistake, there are too many rules to remember them all, or an application bug lets you update what you shouldn't.

Either way, you end up with a system that has conflicts, and nobody knows what happened or how to fix them. However, there is a way of enforcing this system based on discipline.

This is the "poor-man's" conflict avoidance system. It is based on simple technology, available in most database servers. If you can install a multi-master topology, using either native MySQL (circular) replication or Tungsten Replicator topologies, you can also apply this method.

The key to the system is to grant different privileges for every master. Looking at image #3, you can enforce discipline by granting different privileges to the application user in every master.

In master #1, where we can update personnel, app_user will have SELECT privileges on all databases, and all privileges on personnel.

In master #2, where we can update sales, app_user will have all privileges on sales and read only access to the other databases, and so on.

The key to making this system work well is to assign the per-schema privileges locally, without letting those GRANT statements be replicated. It should work like this:

# master 1
GRANT SELECT on *.* to app_user identified by 'my password';
# This is good for all masters. Let it replicate

# master 1
SET SQL_LOG_BIN=OFF;
GRANT ALL on personnel.* to app_user;   # This won't replicate


# master 2
SET SQL_LOG_BIN=OFF;
GRANT ALL on sales.* to app_user;

# master 3
SET SQL_LOG_BIN=OFF;
GRANT ALL on vehicles.* to app_user;

# master 4
SET SQL_LOG_BIN=OFF;
GRANT ALL on buildings.* to app_user;

This method works quite well. Since updates for a given schema can be applied only in one master, there is little chance of any mischief happening. Conflicts are not completely removed, though. There are super users and maintenance users who can, consciously or not, introduce errors. For these cases, you may want to look at the next section.

Enforced discipline with certified origin

Applicability: all Tungsten topologies

Discipline based on granted privileges is often robust enough for your needs. However, if you want to keep track of where the data comes from, you should look at a System of Records technology, where each piece of data can be traced back to its origin.

Tungsten Replicator implements this technology with several topologies. The theory of this matter is beautifully explained by Robert Hodges in an article written some time ago. Here I would like to look at the practical stuff.

To implement a System of Records in Tungsten, you decide where you want to update each schema (a schema is called a shard in our lingo), assign that schema to a service, and the replicator will enforce your rules.

Once you have defined the shards, you can set the rules. When an event comes to a slave from an UNKNOWN shard, i.e. a shard that was not among the defined rules, you can:

  • Accept the event (not recommended, really);
  • Drop the event silently;
  • Drop the event with a warning in the logs;
  • Generate an error that will break replication (recommended).

You can choose among the above actions when setting a rule for events that come from UNWANTED shards, i.e. a shard that is not the one designated to update that schema.

Here's an example of a shard definition based on an all-masters topology with three nodes:


Image #4 - Sample conflict prevention in an all-masters topology

# Options to add during installation
--svc-extractor-filters=shardfilter

# policy for unknown shards
--property=replicator.filter.shardfilter.unknownShardPolicy=error

# policy for unwanted shards
--property=replicator.filter.shardfilter.unwantedShardPolicy=error

# Whether the policy for unwanted shards is activated or not
--property=replicator.filter.shardfilter.enforceHomes=false

# whether we allow whitelists to be created
--property=replicator.filter.shardfilter.allowWhitelisted=false


# Loading the rules set

$ trepctl -host host1 -service charlie shard -insert < shards.map

$ cat shards.map
shard_id          master      critical
employees         alpha       false
buildings         bravo       false
vehicles          charlie     false
test              whitelisted false

The rules are set by service, rather than host name. The schema 'employees' can be updated by the service named 'alpha', which has its master in host #1. Similarly, 'buildings' can be updated by 'bravo', with a master in host #2, and 'vehicles' is updated by the 'charlie' master service in host #3. Remember that in Tungsten each replication stream from one master to many slaves is a separate service. This way we can keep track of each event's origin. Even if the event is routed through a hub in a star topology, it retains its origin in the metadata.

The last line of the rules says that the schema 'test' is whitelisted, i.e. it can be freely updated by any master. And this means that conflicts can happen there, so be careful if you use this feature!
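If you want to double check which rules are in force on a given node, trepctl can also list the shard definitions it has loaded. The command form below mirrors the -insert example above, but treat it as an assumption to verify against your Tungsten version rather than a guaranteed interface.

# list the shard rules loaded for a service (verify against your Tungsten version)
trepctl -host host1 -service charlie shard -list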


Image #5 - Example of a legitimate event coming through

When an expected event comes through, all is well. Each node checks that the event was originated by the authorised master, and the event is applied to the slave service.


Image #6 - Example of an event originated from an unauthorised node

When the event comes from a node that was not authorised, Tungsten looks at the rules for such a case. In our setup, the rule says 'error', and therefore replication will break at the receiving end of the service 'bravo' in host #1 and host #3.

mysql #2> create table employees.nicknames( ... )


# Only server #2 creates the table
# slave service 'bravo' in host1 and host3 get an error
# No table is created in hosts #1 and #3

To detect the error, we can ask for the list of services in host #1 and host #3. What we will see is something like this.

#3 $ trepctl services | simple_services 
alpha    [slave]
seqno:          7  - latency:   0.136 - ONLINE

bravo    [slave]
seqno:         -1  - latency:  -1.000 - OFFLINE:ERROR

charlie  [master]
seqno:         66  - latency:   0.440 - ONLINE

This listing says that replication was stopped with an error in the slave service 'bravo'. To determine what happened exactly, we ask for the status of that service:

#3 $  trepctl -service bravo status
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
(...)
offlineRequests        : NONE
pendingError           : Stage task failed: q-to-dbms
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000002:0000000000001241;0
pendingErrorSeqno      : 7
pendingExceptionMessage: Rejected event from wrong shard: 
seqno=7 shard ID=employees shard master=alpha service=bravo
(...)

This status gives us quite a lot of useful information:

  • The event with Global transaction ID (GTID) # 7 was rejected;
  • The reason for rejection was because it came from the wrong shard;
  • The expected shard master (i.e. the authorized service) was alpha;
  • The event instead originated from service bravo.

With the above information, we can take action to fix the event. We know that GTID 7 is wrong, so we can skip it in both servers where the error occurred. To clean up the error, we can simply generate the correct event in the authorized master:

# host #1 
$ trepctl -service bravo online -skip-seqno 7

mysql #1> drop table if exists employees.nicknames;
mysql #1> create table if not exists employees.nicknames ( ... ) ;

#3 $ trepctl -service bravo online -skip-seqno 7

Statement-based vs row-based replication

As a general note about conflict solving, I need to mention that, in most cases, using row-based replication rather than statement-based replication helps identify conflicts and makes them easier to clean up.

Even when the conflict involves a deleted row, row-based events contain enough information to let us identify the critical data needed to recover it.

Be aware that, if you use binlog-row-image=minimal in MySQL 5.6, the binary log entry for a DELETE event will only include the primary key.
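When a conflict does involve row events, we can decode them into a readable form with mysqlbinlog and see exactly which rows were touched. The file name and start position below simply echo the pendingErrorEventId reported by trepctl earlier and are only illustrative.

# decode the row images of the offending event in a readable form
mysqlbinlog --verbose --base64-output=DECODE-ROWS \
  --start-position=1241 mysql-bin.000002 | less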


More about filters

We have seen in at least two examples (schema renaming and conflict prevention) that you can help avoid conflicts with filters. This is a powerful feature that should be taken into account when planning a multi-master topology.

MySQL native replication offers very little in matter of data transformation through filtering. Tungsten Replicator, instead, allows you to define filters at several stages of the replication process: when extracting the data, after transporting it to the slaves, and before applying it. You can write your own filters in JavaScript, and do with the data pretty much everything you want. If you have creative ideas about solving conflicts by manipulating data in transit, there is a good chance that you can implement them using filters. This topic deserves more than a paragraph, and I will probably come back to it soon with a full-fledged article.


Parting thoughts

Multi-master topologies are much-coveted features. However, they often introduce the risk of conflicts.

Dealing with conflicts becomes somewhat easier if you understand how they happen and what kind of problems they generate.

There is no silver bullet solution for conflicts, but recent technology and good organization can help you ease the pain.

Monday, December 20, 2010

Looking for a hack - Passing comment-like info through the binary log

I am facing an interesting problem. I need to somehow mark a statement in such a way that the marker is preserved through the binary log.
I don't have control over how the statement is generated or which client software is used. For the sake of example, let's say that I need to mark a CREATE PROCEDURE statement in such a way that, if I extract the query from the binary log and apply it to another server, the information is still available.

Background

Normally, I would use a comment. The first thing I would think is
CREATE PROCEDURE p1(i int) select "hello" /* This is my text */
But most client libraries will strip it.
There was a clever trick by Roland Bouman that allowed users to bypass this limitation. You could use a qualified comment such as /*!999999 This is my test */, but unfortunately it only works in MySQL 5.0.x, while MySQL 5.1 strips everything away, even if the comment wraps a legitimate keyword.

create procedure p9 (j int) insert /*!50000 INTO */ t1 values (j) ;
Query OK, 0 rows affected (0.00 sec)

 show create procedure p9\G
*************************** 1. row ***************************
           Procedure: p9
            sql_mode: 
    Create Procedure: CREATE DEFINER=`msandbox`@`%` PROCEDURE `p9`(j int)
insert  INTO  t1 values (j)
character_set_client: latin1
collation_connection: latin1_swedish_ci
  Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Other tricks

Something else that I have tried: I can inject a query before or after the one that I need to monitor.
create table if not exists comments (t varchar(100)) engine=blackhole;
 
update comments set t='the next statement is what I need';
create procedure p1(i int) insert into t1 values (i);
update comments set t='the previous statement is what I need';
This approach possibly introduces some overhead.

Or I can add a comment clause in the statement.
create procedure p1(i int) comment="this is what I need" insert into t1 values (i);
This approach requires parsing the SQL, and dealing with artistic indentation and usage of other options in the query. And if I need to deal with commands that don't support the "comment" option, I am back to square one.

Advice wanted

So far, the only method that works almost always is the blackhole trick (1).
I would like to know if there is any method of introducing a piece of information related to a given statement, in such a way that the comment survives after one of the following:
  • The binary log is converted to queries and passed to a MySQL client that applies the stream of queries to another server.
  • The binary log is associated with another master, and then passed to a slave through regular replication.
In both cases, I need to find the original information in the last server's binary log.
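To make the first scenario concrete, the stream of queries would be produced and applied roughly like this (host and file names are only illustrative):

# decode the binary log and replay it on another server
mysqlbinlog /path/to/mysql-bin.000001 | mysql -h other_server -u root -p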

As a related matter, I know that MySQL, in regular replication, passes some information across binary logs, and that information is the server-id. If I set an intermediate server as a relay slave, the server-id of the original master is associated with the query recorded in the binary log of every slave. I don't know if I can use this information for my purposes, but I would like to know how the replication process maintains the server ID across servers.

Maybe it's too late for me and I can't see an obvious solution. I will appreciate any suggestion. Thanks in advance.

(1) If the BLACKHOLE engine is disabled, the method fails or introduces unacceptable overhead.

Friday, November 12, 2010

Filtering binary logs with MySQL Sandbox and replication

A few days ago, a friend of mine asked me if I knew of a way of filtering a bunch of binary logs, to extract only statements related to a single table. The task was about filtering a few hundred binary log files.

It's a tricky problem. Even with my experience with regular expressions, I knew that using a script to extract statements related to a single table was going to be a nasty business.
However, I know of an entity that can filter by table name efficiently, and that's the MySQL replication system. So I suggested using replication to a sandbox with a replicate-wild-do-table option to get the job done.
My friend was skeptical and did not want to go that way. I was busy writing an article for an Italian magazine and did not follow up immediately. But today, with the article safely in the editor's hands, I did a quick test, and guess what? It works!
Here is a step-by-step procedure to do it. I started with a server built with MySQL Sandbox, using MySQL 5.5.7. I used the employees test database to create a large enough binlog, and soon I had a database containing 160 MB of data and a binary log of about the same size.
Then I decided that I wanted to filter the binlog, to get only statements about the employees table. Thus, I issued this command:

$ make_sandbox 5.5.7 --sandbox_port=6000 \
  --sandbox_directory=trans_repl \
  -c log-slave-updates \
  -c replicate-wild-do-table=employees.employees \
  -c log-bin=mysql-bin \
  -c server-id=101
The "-c" option transfers its argument to the sandbox configuration file.
At the end of this operation, I had a second server with the same version as the one that I had filled with the employees database. The server is ready to filter replicated streams, accepting only commands that affect the table 'employees' within the database 'employees'.

The second step was to create an empty copy of the database in the second server, with the InnoDB tables converted to BlackHole (to avoid wasting space unnecessarily).

Inside the first sandbox, I did this:

$ ./my sqldump -B --no-data employees \
  | perl -pe 's/=innodb/=blackhole/i' \
  | ~/sandboxes/trans_repl/use
Combining the flexibility of the sandbox with some command line skills, the operation requires just one command.
Before starting the replication, I needed to avoid re-creating the tables, or my blackhole trick would have been useless. So I looked at the binary log, and found where the CREATE TABLE statements ended:

$ ./my sqlbinlog ./data/mysql-bin.000001 |less
BEGIN
/*!*/;
# at 3057
#101112 9:48:45 server id 1 end_log_pos 3364 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1289551725/*!*/;
INSERT INTO `departments` VALUES ('d001','Marketing'),('d002','Finance'),('d003','Human Resources'),('d004','Production'),('d005','Development'),('d006','Quality Management'),('d007','Sales'),('d008','Research'),('d009','Customer Service')/*!*/;

Armed with this knowledge, I logged in to the second server and did the following:

FLUSH BINARY LOGS; # to get a binlog with only the statements that I need

CHANGE MASTER TO
master_host='127.0.0.1',
master_port=5570,
master_user='msandbox',
master_password='msandbox',
master_log_file='mysql-bin.000001',
master_log_pos=3057;    # this is the position after all the CREATE TABLE
                        # statements in the master
START SLAVE;
After a few seconds, I issued a "SHOW SLAVE STATUS". All was OK.
I flushed the logs again and inspected the second binary log. As expected, it contained only the statements related to the employees table.
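A quick way to verify the result from inside the second sandbox looks like this; the binlog number may differ in your setup.

$ cd ~/sandboxes/trans_repl
$ ./my sqlbinlog ./data/mysql-bin.000002 | grep -c 'INSERT INTO `employees`'     # plenty of matches
$ ./my sqlbinlog ./data/mysql-bin.000002 | grep 'INSERT INTO' | grep -vc employees   # expect 0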
Total cost of the operation: 5 minutes. Way quicker than writing this report!

Monday, April 27, 2009

MySQL 5.4 performance with logging


About a month ago, I published the results of MySQL 5.x performance with logging. The results covered several versions, from 5.0.45 to 5.1.33. Among the conclusions of the post was the consideration that MySQL 5.0.x is faster than MySQL 5.1 in read-only operations. I hinted that better results might come for MySQL 5.1. When I wrote that post I had, in fact, an ace up my sleeve, because I had already benchmarked the performance of MySQL 5.4, using the same criteria shown in my previous post. The results, as you can see from the charts below, show that you don't need to downgrade to 5.0 to achieve the performance you want: MySQL 5.4 may be the right choice.

MySQL 5.1 + 5.4 read-only logging performance

MySQL 5.0 read-only logging performance

MySQL 5.1 + 5.4 binlog performance

MySQL 5.0 binlog performance

A piece of trivia. On April 9, users of MySQL Sandbox may have noticed that I left a clue of what was about to happen. Among the accepted versions, you could see "5.3, 5.4. 5.5" :-)

Wednesday, March 25, 2009

MySQL 5.x performance with logging

There has been much talk about MySQL performance related to logging. Since MySQL 5.1.21, when Bug #30414 was reported (Slowdown (related to logging) in 5.1.21 vs. 5.1.20), I have been monitoring the performance of the server, both on 5.0 and 5.1.
Recently, I got a very powerful server, which makes these measurements meaningful.
Thus, I measured the performance of the server, using all publicly available sources, because I want this benchmark to be repeatable by everyone.
I will first describe the method used for the benchmarks, and then report the results.

The server

The server runs Red Hat Enterprise Linux 5.2 on an 8-core processor, with 32 GB of RAM and 1.5 TB of storage.

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.2 (Tikanga)

$ cat /proc/cpuinfo |grep "processor\|model name" | sort |uniq
model name : Intel(R) Xeon(R) CPU E5450 @ 3.00GHz
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5
processor : 6
processor : 7

$ cat /proc/meminfo
MemTotal: 32967056 kB
MemFree: 22790272 kB

Method

I downloaded the source code tarball, as released for every version of MySQL, from 5.0.45 to 5.0.77 and from 5.1.20 to 5.1.33. I also took the code for 5.0.79 from the Bazaar tree on Launchpad.
For each version, I did the following:
* expanded the tarball to a directory;
* compiled the code using ./BUILD/compile-pentium64-max;
* built a binary using ./source/make_binary_distribution;
* installed the binary using MySQL Sandbox, assigning 5GB of RAM to innodb_buffer_pool_size;
* ran the sysbench OLTP transactional test, with 8 threads, 1,000,000 records, and a 1 minute run.

After installing, all database instances were shut down, and only one database server was active at any given time during the tests, to get clean results.
The servers were cleaned up, with all databases and log files removed between tests.
Additionally, the operating system memory cache was dropped between tests.
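Roughly, the cleanup between two tests looked like this; the sandbox path is just an example of the default MySQL Sandbox naming.

# stop the server under test and remove its data and logs
~/sandboxes/msb_5_1_33/stop
rm -rf ~/sandboxes/msb_5_1_33/data

# flush dirty pages and drop the OS page cache
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches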

The sysbench commands used for the tests were the following:
sysbench --test=oltp --oltp-table-size=1000000 \
--mysql-db=test --mysql-user=msandbox \
--mysql-password=msandbox --mysql-host=127.0.0.1 \
--mysql-port=$PORT --num-threads=8 prepare

sysbench --test=oltp --oltp-table-size=1000000 \
--mysql-db=test --mysql-user=msandbox \
--mysql-password=msandbox --mysql-host=127.0.0.1 \
--mysql-port=$PORT --max-time=60 --oltp-read-only=on \
--max-requests=0 --num-threads=8 run

sysbench --test=oltp --oltp-table-size=1000000 \
--mysql-db=test --mysql-user=msandbox \
--mysql-password=msandbox --mysql-host=127.0.0.1 \
--mysql-port=$PORT --max-time=60 --oltp-read-only=off \
--max-requests=0 --num-threads=8 run

The not so exciting results first


Using Sysbench with --oltp-read-only=on, MySQL 5.0 consistently outperforms MySQL 5.1. In a read-only situation, it looks like 5.0 is much better than 5.1.

read only 5.0

read-only 5.1
Additional bad news is that Bug #30414 is still unfixed. Logging to table takes away 50% of the performance and thus is not recommended. The general log on file, instead, with the ability of logging on demand, is an affordable diagnostic tool.
Partial good news can be deduced from the ability of MySQL 5.1 to perform better with file logging, compared to MySQL 5.0.

The good news


Normal operations in a MySQL server are not read-only, and include a binary log. For my second batch of testing, then, I compared performance with --oltp-read-only=off, with and without binary logs. In this situation, MySQL 5.1 outperforms MySQL 5.0 in all cases. It's also nice to observe that the performance improves gradually from earlier versions to recent ones. Kudos to MySQL engineers for their constant dedication to improving the server.

Read-write 5.0

read-write TPS 5.1
The even better news is that, in MySQL 5.1, the performance gain over 5.0.x has been increasing steadily.

What does this tell you? That MySQL 5.1 is much, much better than 5.0 for online transaction processing, while MySQL 5.0 seems to be better at concurrent read-only operations. My immediate reaction is that I should use MySQL 5.1 for a master and 5.0 for slaves. However, there are two considerations that stop me. First, using a master of a higher version than the slaves is not recommended, although some tests that I made after this discovery show that you can actually do it, provided that you don't use any specific MySQL 5.1 features. And second, the improvement path is such that I believe MySQL 5.1 is going to catch up with 5.0 for read-only operations. As soon as this happens, I will let you know immediately! Probably, we will know more at the Users Conference 2009.

Summing up

I think that MySQL 5.1 has much to offer in comparison to 5.0.
The tests that I have performed are not definitive. Sysbench is just one of the possible tests, and it doesn't mean that your production server will follow the same pattern or have the same performance. It's just an independent way of measuring the server performance and, above all, a repeatable one, which everyone can reproduce.
I warmly invite you to repeat these tests on different machines and let me know your results.

Friday, September 12, 2008

Decoding binlog entries with row-based replication



If you have tried using row-based replication, you may have noticed two things: (1) it fixes many inconsistencies of statement-based replication, but (2) the binlog is unfit for humans. Inspecting it after a problem occurs won't provide any useful information.
The instructions look like line noise, and when you don't see the result you were expecting, you are left wondering what the log actually recorded.

For example, after executing this code:
create table t1 (id int, c char(10), d date);
insert into t1 values (1, 'abc', '2008-01-01');
insert into t1 values (2, 'def', '2008-08-19');
insert into t1 values (3, 'ghi', current_date());
select * from t1;
+------+------+------------+
| id   | c    | d          |
+------+------+------------+
|    1 | abc  | 2008-01-01 |
|    2 | def  | 2008-08-19 |
|    3 | ghi  | 2008-09-12 |
+------+------+------------+

The binlog, as shown by mysqlbinlog, gives output like

# at 182
#080912 17:15:07 server id 1 end_log_pos 289 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1221232507/*!*/;
create table t1 (id int, c char(10), d date)
/*!*/;
# at 289
#080912 17:15:07 server id 1 end_log_pos 357 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1221232507/*!*/;
BEGIN
/*!*/;
# at 357
# at 402
#080912 17:15:07 server id 1 end_log_pos 402 Table_map: `test`.`t1` mapped to number 21
#080912 17:15:07 server id 1 end_log_pos 443 Write_rows: table id 21 flags: STMT_END_F

BINLOG '
e4fKSBMBAAAALQAAAJIBAAAAABUAAAAAAAAABHRlc3QAAnQxAAMD/goC/goH
e4fKSBcBAAAAKQAAALsBAAAQABUAAAAAAAEAA//4AQAAAANhYmMhsA8=
'/*!*/;
This is more difficult to read than ancient Etruscan. If you are a DBA, you curse and look for help.
But now it has changed.
Starting with MySQL 5.1.28 (available since September 13th), using the "--verbose" option, mysqlbinlog will add some human readable comments after the statement.
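The invocation is as simple as adding the option (the log file name below is just an example), and the output then includes the decoded statements:

$ mysqlbinlog --verbose ./data/mysql-bin.000001 | less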

BINLOG '
e4fKSBMBAAAALQAAAJIBAAAAABUAAAAAAAAABHRlc3QAAnQxAAMD/goC/goH
e4fKSBcBAAAAKQAAALsBAAAQABUAAAAAAAEAA//4AQAAAANhYmMhsA8=
'/*!*/;
### INSERT INTO test.t1
### SET
### @1=1
### @2='abc'
### @3='2008:01:01'
# at 443
The manual has more information on this topic.
It is a much needed addition. Our support department insisted on having this feature developed before GA, and with reason. Assisting customers who have replication problems without being able to decode the binlog output is quite a hard task.
Jan Kneschke, the author of MySQL Proxy, showed the way with this proof of concept, and now the feature is out. Good job, all the developers involved!