Sunday, March 06, 2011

Advanced replication for the masses - Part III - Replication topologies

After part I: the basics, and part II: parallel apply, we now deal with a more practical topic: how to deploy replication services so that they fit your business, covering everything from basic master/slave to the most advanced multi-source scheme.

Master/slave

The most common topology is master/slave: one master, many slaves. This topology is functionally equivalent to MySQL native replication. The differences are in the additional features: Tungsten supports seamless failover and parallel replication in all topologies.

Figure 1. Tungsten master/slave replication
Unlike MySQL, and unlike previous versions of Tungsten, the implementation of this topology uses a dedicated service for the master, and deploys a corresponding service on each slave. In MySQL, and in old Tungsten, there is just a general-purpose pipeline that connects master to slave. In Tungsten Replicator, instead, you need to define a service explicitly. While this requirement looks like overkill for such a simple topology, it will become clear that it helps define advanced topologies.

Relayed replication

The first time I saw this topology with Tungsten, it was by mistake. I wanted to connect host Delta to host Alpha, but through a combination of bad cut-and-paste and mindlessness, I connected the slave Delta to a seemingly nonexistent master Bravo (figure 2).

Figure 2. Tungsten master/slave replication with relay
The funny thing is that I did not realize the error, because the test that I wrote to verify that what I inserted in the master was usable in all the slaves worked without a problem. Thus I learned another feature of Tungsten: every slave is also a relay slave, without the need for any additional setup. It does not matter whether the slave is using binary logs, or whether it has enabled the log-slave-updates option. Those requirements apply to MySQL native replication. Tungsten slave replicator services can detect a request from another slave, and act as a relay slave if necessary.

Bi-directional replication

Here is where people used to MySQL replication start to be surprised. To set up a master-to-master topology, we deploy two services on each host. The first host defines a local master service named alpha, and a remote slave service named bravo. The second host does the opposite: a local master service named bravo, and a remote slave service named alpha. (Figure 3)

Figure 3. Tungsten bi-directional replication
Whatever is updated on host alpha is processed and sent to the remote slaves by the master replicator service alpha. Whatever is updated on host bravo is processed by the master replicator service bravo and dispatched to its audience.
This system works like MySQL master-to-master replication: there is no conflict resolution handling. The prerequisite, for this and the following topologies, is that you know what you are doing and use the replication system within its limitations.
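The symmetric service layout described above can be sketched as plain data. This is an illustrative model, not actual Tungsten configuration; the host names, service names, and the `routes` helper are all made up for the example, mirroring the `role` and `serviceType` fields that appear later in the sample run.

```python
# Illustrative model (not Tungsten configuration) of the bi-directional
# layout in figure 3: each host runs a local master service named after
# itself, plus a remote slave service named after the other master.

bidirectional = {
    "alpha": [{"service": "alpha", "role": "master", "type": "local"},
              {"service": "bravo", "role": "slave",  "type": "remote"}],
    "bravo": [{"service": "bravo", "role": "master", "type": "local"},
              {"service": "alpha", "role": "slave",  "type": "remote"}],
}

def routes(topology):
    """Return (from_host, to_host, service) for every replication stream."""
    result = []
    for host, services in topology.items():
        for s in services:
            if s["role"] == "slave":
                # A slave service named X pulls from the host whose local
                # master service is also named X.
                source = next(h for h, ss in topology.items()
                              if {"service": s["service"], "role": "master",
                                  "type": "local"} in ss)
                result.append((source, host, s["service"]))
    return sorted(result)

print(routes(bidirectional))
# [('alpha', 'bravo', 'alpha'), ('bravo', 'alpha', 'bravo')]
```

Each stream is named after its master, which is why adding a third master later (multi-source) is just a matter of adding one more named slave service to a host.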

Multiple site replication

This is an extension of bi-directional replication. For each master, there are one or more slaves. Figure 4 shows a typical implementation: you have a main site in one location, and want a backup system in another location. If disaster strikes your main location, you are ready to switch over to the alternate site with minimal delay.

Figure 4. Tungsten multiple site replication
The reliability and robustness of this scheme depend on your applications. Nothing prevents you from writing to both masters at once, and if you keep the tasks logically separated (e.g. master alpha writes to database products, while master bravo writes to database employees) nothing bad will happen. But if your applications update the same records, you can suffer either a duplicate key error or a silent loss of changes when one master writes on top of an already updated record.
This topology is frequently used in combination with a virtual IP provider, a system that shields both clusters from the application's view, and lets it connect through an IP address associated with the active master. There are many such systems, including Linux-HA and Tungsten Enterprise, but they are beyond the scope of this post.
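The two failure modes above (duplicate key error, silent loss) can be demonstrated with a minimal sketch. This is not Tungsten code: the dict stands in for a table, and the event tuples are invented for the example.

```python
# Illustrative sketch (not Tungsten code): why multi-master replication
# without conflict resolution can lose updates or stop replication.

def apply_events(table, events):
    """Apply a stream of (op, key, value) events to a dict acting as a table."""
    for op, key, value in events:
        if op == "insert":
            if key in table:
                # In real replication this error would halt the slave service.
                raise KeyError(f"duplicate key {key}: replication would stop")
            table[key] = value
        elif op == "update":
            table[key] = value  # silently overwrites any concurrent update

# Both masters update record 1; whichever stream applies last wins.
table = {1: "original"}
apply_events(table, [("update", 1, "written on alpha")])
apply_events(table, [("update", 1, "written on bravo")])
print(table[1])  # alpha's change has been silently lost

# Both masters insert key 2: the second stream hits a duplicate key error.
try:
    apply_events(table, [("insert", 2, "from alpha")])
    apply_events(table, [("insert", 2, "from bravo")])
except KeyError as e:
    print(e)
```

Keeping the masters on disjoint databases, as suggested above, avoids both outcomes because no key is ever touched by two streams.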

Multiple source replication

This topology is extremely popular, at least judging from the number of times it has been requested of me. Its popularity is strange, in fact, because it does not exist, at least as far as MySQL replication is concerned. One of the limits of MySQL native replication is that every slave can have only one master.
This topology has been explained to me in many ways. One of the most common goes like this: a company has its headquarters in one city, say New York, and stores in many other cities. Management would like to get the data from each store to headquarters, in real time. This is the opposite of the master/slave topology, and no matter how creative you get, you can't make MySQL do it.
Using Tungsten, you can implement this topology fairly easily. (Figure 5)

Figure 5. Tungsten multiple source replication
Each server deploys a master replicator service. The host in headquarters deploys one slave service for each of the remote masters.
Of course, there is no conflict resolution handling. If your remote masters don't behave within the limits you set for them, you will get in trouble, and replication may stop. However, if you know what you are doing and set up the system properly, you can at least get this reverse replication scheme working smoothly.
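The fan-in at headquarters can be sketched as follows. This is an illustrative simulation, not Tungsten code; it simply mirrors the sample run below, where each store's master inserts 100 rows into its own table and the hub ends up with all three tables.

```python
# Illustrative sketch (not Tungsten code) of multiple source replication:
# the headquarters host runs one slave service per remote master, and
# every stream applies into the same dataset.

def store_events(table_name):
    """Events from one store's master: 100 inserts into its own table."""
    return [(table_name, i) for i in range(1, 101)]

# One stream per master service (alpha, bravo, charlie in the sample run).
streams = {
    "alpha":   store_events("t1"),
    "bravo":   store_events("t2"),
    "charlie": store_events("t3"),
}

# The headquarters slave applies every stream into one dataset.
hq = {}
for service, events in streams.items():
    for table, value in events:
        hq.setdefault(table, []).append(value)

for table in sorted(hq):
    print(table, len(hq[table]), sum(hq[table]))
# Each table holds 100 rows summing to 5050, matching the sample run.
```

Because each stream writes to its own table, the streams never collide, which is exactly the kind of logical separation this topology requires.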

Hands on

To help early adopters, I have put together a set of scripts that easily deploy any of the topologies mentioned in this post with just a few actions.
What you need is 4 hosts (four copies of the virtual machine mentioned in part I will do), and the Tungsten deployer package that you will get from the Tungsten downloads page.
Once you have the four servers, unpack the deployer scripts into a chosen directory on all servers, making sure that the directory is the same on all four. Inside the package there is a README file with a detailed list of instructions; so detailed, in fact, that I won't copy it here, because it would make this post too long.

I will just show a sample run:
$ ./set_multi_source.sh 
QA1
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.243
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA2
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.264
role            : master
serviceName     : bravo
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA3
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 0.812
role            : master
serviceName     : charlie
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA4
installation OK

Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 29.521
role            : slave
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 20.123
role            : slave
serviceName     : bravo
serviceType     : remote
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 12.726
role            : slave
serviceName     : charlie
serviceType     : remote
started         : true
state           : ONLINE
Finished services command...

$ ./test_flow_multi_source.sh 
inserting 100 into each of the three masters. Please wait
Retrieving data from the masters
QA1
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
+----+-----+------+

QA2
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t2 | 100 | 5050 |
+----+-----+------+

QA3
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t3 | 100 | 5050 |
+----+-----+------+

Retrieving data from the slave
QA4
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
| t2 | 100 | 5050 |
| t3 | 100 | 5050 |
+----+-----+------+

Happy hacking!

1 comment:

Gaston Gonzalez said...

Giuseppe, is it possible to make a master-master with slaves topology on a single host? (i.e. using MySQL Sandbox or Tungsten Sandbox)

Thanks.
