A few years ago, I used MySQL Sandbox to filter binary logs. While that was a theoretical exercise, I recently came across a very practical case: we needed to provision an Oracle database that was to act as the slave of a MySQL master.
In this particular case, we needed to provision the Oracle slave with some auditing features in place, so mass-load functions were ruled out. What we needed was the contents of the MySQL database as binary rows, i.e. in the same format used for row-based replication.
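For context, the row format is controlled by the `binlog_format` server variable. A sandbox set up for this job would carry something like the following in its `my.sandbox.cnf` (the values below are illustrative, not taken from the actual setup):

```ini
# Illustrative my.sandbox.cnf fragment
[mysqld]
log-bin       = mysql-bin
binlog_format = ROW       # log row images instead of statements
server-id     = 101       # hypothetical value; must be unique in the topology
```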
To achieve the purpose as quickly and as painlessly as we could, I thought to employ the services of a MySQL Sandbox. The flow of information would be like this:
- Install a sandbox, using the same version and character set as the master;
- Set the sandbox's default binary log format to ROW;
- Take a schema-only backup with mysqldump (no data);
- Filter the CREATE TABLE statements, replacing InnoDB with BLACKHOLE. This step is important to speed up insert operations and to minimise storage needs: with this trick, we only needed storage for the binary logs, not for the data itself;
- Load the altered schema into the sandbox;
- Take a full data dump, without table creation info;
- Load the data into the sandbox;
- Start replicating from the sandbox to the Oracle slave (using Tungsten Replicator);
- When the provisioning is over, point the replicator at the original master, at the position where we finished dumping data;
- Start replicating live from MySQL to Oracle.
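The dump-filter-load steps above can be sketched as a small shell pipeline. This is a minimal sketch, not the exact commands we used: host and database names are hypothetical, and the server-bound invocations are shown commented out since they need a running master and sandbox. The engine rewrite itself is demonstrated on a sample statement.

```shell
# Hypothetical pipeline (names are placeholders):
#
#   mysqldump --no-data --host=master_host db_name > schema.sql
#   sed -i 's/ENGINE=InnoDB/ENGINE=BLACKHOLE/' schema.sql
#   ~/sandboxes/msb_blackhole/use db_name < schema.sql
#   mysqldump --no-create-info --host=master_host db_name | \
#       ~/sandboxes/msb_blackhole/use db_name

# The filter in action on a sample mysqldump CREATE TABLE statement.
# The output is the same statement with ENGINE=BLACKHOLE.
cat <<'EOF' | sed 's/ENGINE=InnoDB/ENGINE=BLACKHOLE/'
CREATE TABLE t1 (
  id INT NOT NULL PRIMARY KEY,
  payload VARCHAR(100)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
EOF
```

With the tables on the BLACKHOLE engine, the sandbox discards the row data on insert but still writes every row change to its binary log, which is exactly what the downstream replicator consumes.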
Despite the many steps, the plan worked quite smoothly. We managed to extract the data from three databases (about 600 tables, for a total of 0.5 TB) in about 7 hours. The task was completed with a tool developed by my colleague Linas, which performs an automated DDL translation from MySQL to Oracle. That tool probably deserves an article of its own.