As you probably know, Tungsten Replicator can replicate data from MySQL to MongoDB. The installation is relatively simple and, once done, replication works very well. There was a bug in the installation procedure recently, and as I was testing that the breakage has been fixed, I wanted to share the experience of getting started with this replication.
Step 1: install a MySQL server
For this exercise, we will use a MySQL sandbox running MySQL 5.5.31.
We download the binaries from dev.mysql.com and install a sandbox, making sure that it is configured as master, and that it is used row-based-replication.
$ mkdir -p $HOME/opt/mysql $ cd ~/downloads $ wget http://dev.mysql.com/get/Downloads/MySQL-5.5/mysql-5.5.31-linux2.6-x86_64.tar.gz/from/http://cdn.mysql.com/ $ make_sandbox --export_binaries ~/downloads/mysql-5.5.31-linux2.6-x86_64.tar.gz -- --master $ echo 'binlog-format=row' >> ~/sandboxes/msb_5_5_31/my.sandbox.cnf $ ~/sandboxes/msb_5_5_31/use -u root Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 1 Server version: 5.5.31-log MySQL Community Server (GPL) Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql [localhost] {root} ((none)) > grant all on *.* to tungsten identified by 'secret' with grant option; Query OK, 0 rows affected (0.00 sec) mysql [localhost] {root} ((none)) > set global binlog_format=row; Query OK, 0 rows affected (0.00 sec)
The above command will install an instance of MySQL 5.5.31 in the directory $HOME/sandboxes/msb_5_5_31. You can use any other MySQL version. In that case, you should change the DEPLOY.sh file below with the coordinates of your server.
Step2: install a MongoDB server
Get the binaries from MongoDB downloads, and unpack them.
$ cd ~/downloads $ wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.4.3.tgz $ cd $ tar -xzf ~/downloads/mongodb-linux-x86_64-2.4.3.tgz $ mv mongodb-linux-x86_64-2.4.3 mongodb
Once you have unpacked the binaries, you can run the server, with a command such as this:
#!/bin/bash MONGODB_HOME=$HOME/mongodb if [ ! -d $MONGODB_HOME ] then echo "$MONGODB_HOME not found" exit 1 fi cd $MONGODB_HOME if [ ! -d $MONGODB_HOME/data ] then mkdir $MONGODB_HOME/data fi ./bin/mongod \ --logpath=$MONGODB_HOME/mongodb.log \ --dbpath=$MONGODB_HOME/data \ --fork \ --rest
Now MongoDB should be ready to receive commands. If needed, you can see a tutorial on basic MongoDB operations. For now, it's enough to check that your system is running:
$ cd ~/mongodb $ ./bin/mongo MongoDB shell version: 2.4.3 connecting to: test > show dbs local 0.078125GB >
Step 3: install a master replicator using MySQL
As I mentioned above, there was a bug in the installation. The bug was fixed yesterday, and thus you should use a recent build from http://bit.ly/tr20_builds. For this exercise, I am using tungsten-replicator-2.1.0-269.tar.gz.
Of course, you should also configure your host according to the system requirements for Tungsten.
We start by creating a few directories that we need for our deployment.
$ cd $ mkdir deploy $ cd deploy $ mkdir mysql # here we install the master replicator $ mkdir mongodb # here we install the slave replicator $ wget https://s3.amazonaws.com/files.continuent.com/builds/nightly/tungsten-2.0-snapshots/tungsten-replicator-2.1.0-269.tar.gz $ tar -xzf tungsten-replicator-2.1.0-269.tar.gz
We now create a defaults script that will be used by both master and slave installer.
$ cat DEPLOY.sh MYSQL_PORT=5531 MYSQL_SANDBOX_BASE=$HOME/sandboxes/msb_5_5_31 MYSQL_BASEDIR=$HOME/opt/mysql/5.5.31 MYSQL_CONF=$MYSQL_SANDBOX_BASE/my.sandbox.cnf MYSQL_BINLOG_DIRECTORY=$MYSQL_SANDBOX_BASE/DATA/ DEPLOY_HOME=$HOME/deploy MYSQL_DEPLOY=$DEPLOY_HOME/mysql MONGODB_DEPLOY=$DEPLOY_HOME/mongodb MONGODB_PORT=27017 TUNGSTEN_BINARIES=$DEPLOY_HOME/tungsten-replicator-2.1.0-269 MASTER_THL_PORT=12500 SLAVE_THL_PORT=12600 MASTER_RMI_PORT=11500 SLAVE_RMI_PORT=11600
And finally the master installation script:
$ cat install_master.sh . ./DEPLOY.sh cd $TUNGSTEN_BINARIES export PATH=$MYSQL_BASEDIR/bin:$PATH ./tools/tungsten-installer --master-slave -a \ --datasource-type=mysql \ --master-host=127.0.0.1 \ --datasource-user=tungsten \ --datasource-password=secret \ --datasource-mysql-conf=$MYSQL_CONF \ --datasource-log-directory=$MYSQL_BINLOG_DIRECTORY \ --datasource-port=$MYSQL_PORT \ --service-name=mongodb \ --home-directory=$MYSQL_DEPLOY \ --cluster-hosts=127.0.0.1 \ --thl-port=$MASTER_THL_PORT \ --rmi-port=$MASTER_RMI_PORT \ --java-file-encoding=UTF8 \ --mysql-use-bytes-for-string=false \ --mysql-enable-enumtostring=true \ --mysql-enable-settostring=true \ --svc-extractor-filters=colnames,pkey \ --svc-parallelization-type=none --start-and-report
We can run the installation command for the master.
$ ./install_master.sh INFO >> 127_0_0_1 >> Getting services list INFO >> 127_0_0_1 >> .. Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 0 appliedLatency : 1.218 role : master serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command...
Step 4: install a slave replicator using MongoDB
Now we have a master database server with a replicator that can send data across for any slave to pick it up. We can install one or more MySQL slaves, but this is not the purpose of this exercise. So we skip any regular slave installation and will install a MongoDB slave. To do so, we create a third script, which will invoke the installer to run a replicator slave service that applies data to MongoDB.
$ cat install_slave.sh . ./DEPLOY.sh cd $TUNGSTEN_BINARIES export PATH=$MYSQL_BASEDIR/bin:$PATH tools/tungsten-installer --master-slave -a \ --datasource-type=mongodb \ --master-host=127.0.0.1 \ --service-name=mongodb \ --home-directory=$MONGODB_DEPLOY \ --cluster-hosts=127.0.0.1 \ --datasource-port=$MONGODB_PORT \ --master-thl-port=$MASTER_THL_PORT \ --thl-port=$SLAVE_THL_PORT \ --rmi-port=$SLAVE_RMI_PORT \ --java-file-encoding=UTF8 \ --skip-validation-check=InstallerMasterSlaveCheck \ --svc-parallelization-type=none --start-and-report
Compared to MySQL installation, we see that the 'datasource-type' is 'mongodb', and as such it does not need username and password. Let's run it.
$ ./install_slave.sh WARN >> 127.0.0.1 >> Currently unable to check for the THL schema in mongodb INFO >> 127_0_0_1 >> Getting services list INFO >> 127_0_0_1 >> Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 0 appliedLatency : 49.444 role : slave serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command...
We get a warning, which should not worry us, as it is expected. Notice that the high latency reported by the replicator is due to the delay in installing the slave. The only event in the replication pipeline is the 'master online' broadcast, which was replicated after the installation was completed.
If you got to this point, with both the master and slave replicator in the 'ONLINE' state, then you can continue with checking that replication works.
Step 5: check replication
Replication from MySQL (a relational DBMS) to MongoDB (a key-value store) requires some understanding of what we are replicating and what limitations we can face.
In MySQL, we have tables, which are translated to MongoDB collections. What is very important to understand is that Tungsten Replicator DOES NOT REPLICATE DDL EVENTS. Any CREATE/DROP/ALTER events are ignored by the replicator.
However, the replicator is smart enough to convert every table record into a document within the relevant collection. Every insert will generate a document inside MongoDB. Let's see an example:
mysql [localhost] {msandbox} (test) > create table myfirst (id int not null primary key, name char(30), d date); Query OK, 0 rows affected (0.00 sec) mysql [localhost] {msandbox} (test) > insert into myfirst values (1, 'Harry Potter', '1997-06-30'); Query OK, 1 row affected (0.00 sec)
On the MongoDB side, we will see the corresponding object:
> show dbs local 0.078125GB test 0.203125GB tungsten_mongodb 0.203125GB > use test switched to db test > show collections myfirst system.indexes > db.myfirst.find() { "_id" : ObjectId("51a2867d3004fd7959a5f5aa"), "id" : "1", "name" : "Harry Potter", "d" : "1997-06-30" }
Not only inserts, but also updates are recognised:
mysql [localhost] {msandbox} (test) > insert into myfirst values (2, 'Harry Potter 2', '1999-06-02'); Query OK, 1 row affected (0.00 sec) mysql [localhost] {msandbox} (test) > update myfirst set name = 'Harry Potter 1' where id =1; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0
Which will result in:
> db.myfirst.find() { "_id" : ObjectId("51a2867d3004fd7959a5f5aa"), "id" : "1", "name" : "Harry Potter 1", "d" : "1997-06-30" } { "_id" : ObjectId("51a287633004fd7959a5f5ab"), "id" : "2", "name" : "Harry Potter 2", "d" : "1999-06-02" }
And delete statements:
mysql [localhost] {msandbox} (test) > delete from myfirst where id =2; Query OK, 1 row affected (0.00 sec) ##### > db.myfirst.find() { "_id" : ObjectId("51a2867d3004fd7959a5f5aa"), "id" : "1", "name" : "Harry Potter 1", "d" : "1997-06-30" }
This is all fine. But what happens when we insert quite a lot of data? Let's find out. We're going to use the test employees database to see if the system complains.
$ cd ~/data/employees/ $ ~/sandboxes/msb_5_5_31/use < employees.sql INFO CREATING DATABASE STRUCTURE INFO storage engine: InnoDB INFO LOADING departments INFO LOADING employees INFO LOADING dept_emp INFO LOADING dept_manager INFO LOADING titles INFO LOADING salaries
To see if replication is OK, we check both master and slave for their status, and then we count the records in MySQL and MongoDB.
$ ./mongodb/tungsten/tungsten-replicator/bin/trepctl -port $MASTER_RMI_PORT services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 182 appliedLatency : 1.152 role : master serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command... $ ./mongodb/tungsten/tungsten-replicator/bin/trepctl -port $SLAVE_RMI_PORT services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 106 appliedLatency : 93.469 role : slave serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command...
So, we see that the slave is lagging, as the data is hard to digest (about 4 million records). Let's try again after a few seconds:
$ ./mongodb/tungsten/tungsten-replicator/bin/trepctl -port $SLAVE_RMI_PORT services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 182 appliedLatency : 89.99 role : slave serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command...
Both the master and the slave have reached the same transaction number. The slave has some delay, due to the size of the transactions (which become even bigger because of the RBR). When the size of the transaction is small, the latency becomes quite acceptable:
$ ./mongodb/tungsten/tungsten-replicator/bin/trepctl -port $MASTER_RMI_PORT heartbeat $ ./mongodb/tungsten/tungsten-replicator/bin/trepctl -port $SLAVE_RMI_PORT services Processing services command... NAME VALUE ---- ----- appliedLastSeqno: 183 appliedLatency : 0.423 role : slave serviceName : mongodb serviceType : local started : true state : ONLINE Finished services command...
As a last operation, let's check the numbers of the biggest table in the employees database:
$ ~/sandboxes/msb_5_5_31/use employees Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 34 Server version: 5.5.31-log MySQL Community Server (GPL) Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql [localhost] {msandbox} (employees) > show tables; +---------------------+ | Tables_in_employees | +---------------------+ | departments | | dept_emp | | dept_manager | | employees | | salaries | | titles | +---------------------+ 6 rows in set (0.00 sec) mysql [localhost] {msandbox} (employees) > select count(*) from employees; +----------+ | count(*) | +----------+ | 300024 | +----------+ 1 row in set (0.22 sec) mysql [localhost] {msandbox} (employees) > select count(*) from dept_emp; +----------+ | count(*) | +----------+ | 331603 | +----------+ 1 row in set (0.28 sec) mysql [localhost] {msandbox} (employees) > select count(*) from titles; +----------+ | count(*) | +----------+ | 443308 | +----------+ 1 row in set (0.25 sec) mysql [localhost] {msandbox} (employees) > select count(*) from salaries; +----------+ | count(*) | +----------+ | 2844047 | +----------+ 1 row in set (0.83 sec) #### > use employees switched to db employees > show collections departments dept_emp dept_manager employees salaries system.indexes titles > db.employees.count() 300024 > db.dept_emp.count() 331603 > db.titles.count() 443308 > db.salaries.count() 2844047 >
This proves that replication works, and even large amount of data are replicated reasonably fast.
Here's the official documentation about MySQL to MongoDB replication: https://docs.continuent.com/wiki/display/TEDOC/Replicating+from+MySQL+to+MongoDB. There you will find more details on how to install and what to expect. Happy hacking!