Showing posts with label topology. Show all posts

Wednesday, August 19, 2015

MySQL replication in action - Part 4 - star and hybrid topologies


Introducing star topology.

In the all-masters P2P topology we saw a way of deploying a cluster where every node is a master, with better efficiency and stability than ring topologies. That method comes at the price of a complex setup, which requires, for an N-node cluster, N*(N-1) connections (for example, 12 connections for a cluster of just 4 nodes).
We can achieve the same result as in a P2P all-masters topology by trading some stability for a much smaller number of connections. In a star topology (Figure 1) all nodes are masters, but they do not connect to each other directly. A special node, named the hub, receives the changes produced by each endpoint and spreads them to the others.
Figure 1 - A star topology

Monday, August 17, 2015

MySQL replication in action - Part 3: all-masters P2P topology


In the previous article, we saw the basics of establishing replication from multiple origins to the same destination. By extending that concept, we can deploy more complex topologies, such as the point-to-point (P2P) all-masters topology, a robust and fast way of moving data.

Introduction to P2P all-masters topology

A P2P (Point-to-point) topology is a kind of deployment where replication happens in a single step from the producer to the consumers. For example, in a master/slave topology, replication from the master (producer) reaches every slave (consumer) in one step. This is simple P2P replication. If we use a hierarchical deployment, where every slave that is connected to the master is also replicating to one or more slaves, we will have a 2-step replication (Figure 1). Similarly, in circular replication, we have as many steps as the number of nodes minus one (Figure 2.)
Figure 1 - Hierarchical replication depth of processing

Friday, August 14, 2015

MySQL replication in action - Part 2 - Fan-in topology


Introduction: where we stand


In the latest releases of MySQL and MariaDB we have seen several replication improvements. One of the most exciting additions is the ability to enhance basic replication with multiple sources. Those who have used replication for a while will remember that one of the tenets of the “old” replication was that a slave couldn’t have more than one master. This was The Law and there was no escape ... until now. The only way to work around that prohibition was to use circular replication, also known as ring replication, where each node is a slave of the previous node and a master of the next one.
Circular replication

Thursday, July 18, 2013

tpm, the multi-master composer

Multi master topologies blues

Tungsten Replicator is a powerful replication engine that, in addition to providing the same features as MySQL Replication, can also create several topologies, such as:

  • all-masters: every node in the deployment is a master, and all nodes are connected point-to-point, so that there is no single point of failure (SPOF);
  • fan-in: several masters can replicate into a single slave;
  • star: an all-masters topology where one node acts as a hub, which simplifies the deployment at the price of creating a SPOF.

The real weakness of these topologies is that they don’t come together easily. Installation requires several commands, and running them unassisted is a daunting task. Some time ago, we introduced a set of scripts (the Tungsten Cookbook) that allow you to install multi-master topologies with a single command. Of course, the single command is just a shell script that creates and runs all the commands needed for the deployment. The real downer is the installation time: for an all-masters topology with 4 nodes, you need 17 operations, which take a total of about 8 minutes. Until today, these were complex operations, and quite slow ones.
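For reference, the single command in question is one of the cookbook wrappers, which at this point still drives tungsten-installer behind the scenes:

# one-command all-masters deployment, cookbook style
# (about 8 minutes in total, as described above)
./cookbook/install_all_masters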

Meet The TPM

Notice: these examples require a recent nightly build of Tungsten Replicator (e.g. 2.1.1-120), which you can download from http://bit.ly/tr_21_builds

But technology advances. The current tungsten-installer, the tool that installs Tungsten Replicator instances, is being superseded by a tool that has long been used to install our flagship product, Continuent Tungsten (formerly known as ‘Tungsten Enterprise’). The ‘tpm’ (Tungsten Package Manager) has outgrown its name, as it does far more than managing packages, and actually provides a first-class installation experience. Among other things, it runs hundreds of validation checks to make sure that the operating system, the network, and the database servers are fit for the installation. Not only that: it installs all components on all servers in parallel.

So users of our commercial solution have been enjoying this more advanced installation method for quite a long time, and tpm itself has improved its features, becoming able to install single Tungsten Replicator instances in addition to the more complex HA clusters. Looking at the tool a few weeks ago, we realized that tpm is so advanced that it could easily support Tungsten Replicator topologies with minimal additions. And now we have it!

The latest nightly builds of Tungsten Replicator include the ability to install multi-master topologies using tpm. Now, not only can you perform these installation tasks using the cookbook recipes, but the commands are so easy that you can actually run them without help from shell scripts.

Let’s start with the plain master/slave installation (Listing 1). The command looks similar to the one using tungsten-installer, with a slightly simplified syntax: we say members instead of cluster-hosts, master instead of master-host, replication-user and replication-password instead of datasource-user and datasource-password. Looking at this command alone, it does not seem worth the effort of learning a new syntax just to save a few keystrokes.

./tools/tpm install alpha \
    --topology=master-slave \
    --home-directory=/opt/continuent/replicator \
    --replication-user=tungsten \
    --replication-password=secret \
    --master=host1 \
    --slaves=host2,host3,host4 \
    --start

Listing 1: master/slave installation.

However, the real bargain starts appearing when we compare the installation time. Even for this fairly simple installation, which ran in less than 2 minutes with tungsten-installer, we get a significant gain. The installation now runs in about 30 seconds.

Image 1 - Master/slave deployment

Where we see the most important advantages, though, is when we want to run multiple-master deployments. Remember the all-masters installation lasting 8 minutes, which I mentioned a few paragraphs above? Using tpm, it now runs in 45 seconds, and it is one command only. Let’s have a look:

./tools/tpm install four_musketeers \
    --topology=all-masters \
    --home-directory=/opt/continuent/replicator \
    --replication-user=tungsten \
    --replication-password=secret \
    --masters=host1,host2,host3,host4 \
    --master-services=alpha,bravo,charlie,delta \
    --start

Listing 2: all-masters installation.

It’s worth observing this new compact command line by line:

  • ./tools/tpm install four_musketeers: This calls tpm in ‘install’ mode, for the entity ‘four_musketeers’. This entity is a data service, which users of other Tungsten products and readers of Robert Hodges’ blog will recognize as a more precise definition of what we commonly refer to as ‘a cluster.’ For now, this data service appears in the installation but does not play a big role in everyday replicator usage. Just be aware that you can name this entity as you wish, and the name does not affect much of the following tasks.
  • --topology=all-masters: Some of the inner workings of the installer depend on this directive, which tells tpm what kind of topology to expect. If you remember what we needed to do with tungsten-installer + configure-service, you will have some idea of what this directive tells tpm to do and what you are spared now.
  • --home-directory=/opt/continuent/replicator: Nothing fancy here. This is the place where we want to install Tungsten.
  • --replication-user=tungsten: The database user that will take care of the replication.
  • --replication-password=secret: The password for the above user.
  • --masters=host1,host2,host3,host4: This is the list of nodes where a master is deployed. In an all-masters topology, there is no need to list the slaves: by definition, every host will have a slave service for each of the remaining masters.
  • --master-services=alpha,bravo,charlie,delta: This is the list of service names that we will use for our topology. We can use any names we want, including the host names or the names of your favorite superheroes.
  • --start: With this, the replicator will start running immediately after the deployment.

This command produces, in 45 seconds, the same deployment that you get with tungsten-installer in about 8 minutes.

Image 2 - all-masters deployment

The command is so simple that you could use it without assistance. However, if you like the idea of Tungsten Cookbook assembling your commands and running them, giving you access to several commodity utilities in the process, you can do it right now. Besides, if you need to customize your installation with ports, custom paths and management tools, you will appreciate the help provided by Tungsten Cookbook.

# (edit ./cookbook/USER_VALUES.sh)
export USE_TPM=1
./cookbook/install_all_masters

Listing 3: invoking tpm installation for all-masters using a cookbook recipe.

When you define USE_TPM, the installation recipe will use tpm instead of tungsten-installer. Regardless of the verbosity level you have chosen, you will realize that you are using tpm, because the installation is over very soon.

The above command (either the one done manually or the built-in recipe) will produce a data service with four nodes, all of which are masters, and you can visualize them as:

./cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'ALL_MASTERS'
--------------------------------------------------------------------------------------
# node host1
alpha    [master]   seqno:         15  - latency:   0.058 - ONLINE
bravo    [slave]    seqno:         15  - latency:   0.219 - ONLINE
charlie  [slave]    seqno:         15  - latency:   0.166 - ONLINE
delta    [slave]    seqno:         15  - latency:   1.161 - ONLINE

# node host2
alpha    [slave]    seqno:         15  - latency:   0.100 - ONLINE
bravo    [master]   seqno:         15  - latency:   0.179 - ONLINE
charlie  [slave]    seqno:         15  - latency:   0.179 - ONLINE
delta    [slave]    seqno:         15  - latency:   1.275 - ONLINE

# node host3
alpha    [slave]    seqno:         15  - latency:   0.093 - ONLINE
bravo    [slave]    seqno:         15  - latency:   0.245 - ONLINE
charlie  [master]   seqno:         15  - latency:   0.099 - ONLINE
delta    [slave]    seqno:         15  - latency:   1.198 - ONLINE

# node host4
alpha    [slave]    seqno:         15  - latency:   0.145 - ONLINE
bravo    [slave]    seqno:         15  - latency:   0.256 - ONLINE
charlie  [slave]    seqno:         15  - latency:   0.208 - ONLINE
delta    [master]   seqno:         15  - latency:   0.371 - ONLINE

Listing 4: The cluster overview after an all-masters installation.

More topologies: fan-in

Here is the command that installs three masters in host1, host2, and host3, all fanning in to host4, which will only have 3 slave services, and no master.

./tools/tpm install many_towns \
    --replication-user=tungsten \
    --replication-password=secret \
    --home-directory=/opt/continuent/replication \
    --masters=host1,host2,host3 \
    --slaves=host4 \
    --master-services=alpha,bravo,charlie \
    --topology=fan-in \
    --start

Listing 5: Installing a fan-in topology.

Image 3 - Fan-in deployment

You will notice that it’s quite similar to the installation of all-masters. The most notable difference is that, in addition to the list of masters, there is also a list of slaves.

    --masters=host1,host2,host3 \
    --slaves=host4 \

Listing 6: How a fan-in topology is defined.

We have three masters, and one slave listed. We could modify the installation command this way, and we would have two fan-in slaves getting data from two masters.

    --masters=host1,host2 \
    --slaves=host3,host4 \
    #
    # The same as:
    #
    --masters=host1,host2 \
    --members=host1,host2,host3,host4 \

Listing 7: Reducing the number of masters increases the slaves in a fan-in topology.

Now we will have two masters in host1 and host2, and two fan-in slaves in host3 and host4.

Image 4 - Fan-in deployment with two slaves

If we remove another master from the list, we will end up with a simple master/slave topology.
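To illustrate, here is a sketch of the option lines only (for a real single-master deployment you would more likely use the plain master/slave command from Listing 1):

    --masters=host1 \
    --slaves=host2,host3,host4 \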

And a star

The most difficult topology is the star, where all nodes are masters and a node acts as a hub between each endpoint and the others.

./tools/tpm install constellation \
    --replication-user=tungsten \
    --replication-password=secret \
    --home-directory=/opt/continuent/replication \
    --masters=host1,host2,host4 \
    --hub=host3 \
    --hub-service=charlie \
    --master-services=alpha,bravo,delta \
    --topology=star \
    --start

Listing 8: Installing a star topology.

Image 5 - star deployment

The only complication with this topology is that it requires two more parameters than all-masters or fan-in: we need to define which node is the hub, and how to name the hub service. But this topology has the same features as the one that you could get by running 11 commands with tungsten-installer + configure-service.

More TPM: building complex clusters

The one-command installation is just one of tpm’s many features. Its real power resides in its ability to compose more complex topologies. The ones shown above are complex, but since they are common, there are one-command recipes that simplify their deployment. There are cases, though, when we want to go beyond these well-known topologies and compose our own cluster. For example, we may want an all-masters topology with two additional simple slaves attached to two of the masters. To compose a custom topology, we can use tpm in stages: we configure the options that are common to the whole deployment, and then we shape each component of the cluster.

#1
./tools/tpm configure defaults  \
    --reset \
    --replication-user=tungsten \
    --replication-password=secret \
    --home-directory=/home/tungsten/installs/cookbook \
    --start

#2
./tools/tpm configure four_musketeers  \
    --masters=host1,host2,host3,host4 \
    --master-services=alpha,bravo,charlie,delta \
    --topology=all-masters

#3
./tools/tpm configure charlie \
    --hosts=host3,host5 \
    --slaves=host5 \
    --master=host3

#4
./tools/tpm configure delta \
    --hosts=host4,host6 \
    --slaves=host6 \
    --master=host4

#5
./tools/tpm install

Listing 9: A composite tpm command.

In Listing 9, we have 5 tpm commands, which together constitute a composite deployment order. In segment #1, we tell tpm the options that apply to all the following commands, so we won’t have to repeat them. In segment #2, we define the same 4-master topology that we did in Listing 2. Segments #3 and #4 each create a slave service, on hosts host5 and host6, with the respective masters being host3 and host4. The final segment #5 tells tpm to take all the information created with the previous commands and finally run the installation. You may be wondering how tpm keeps track of all the commands, and recognizes that they belong to the same deployment. After every command, tpm adds information to a file named deploy.cfg, containing a JSON record of the configuration we are building. Since we may have previous attempts at deploying from the same place, we add the option --reset to our first command, thus making sure that we start a new topology, rather than adding to a previous one (which is indeed what we do when we want to update an existing data service).
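If you are curious about what has been staged before the final ‘tpm install’, the deploy.cfg file mentioned above is plain JSON, so any JSON pretty-printer can display it. A minimal sketch, assuming the file sits in the staging directory from which tpm is invoked:

# inspect the configuration accumulated by the 'configure' commands
python -m json.tool deploy.cfg | less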

The result is what you get in the following image:

Image 6 - all-masters deployment with additional slaves

A word of caution about the above topology. The slaves in host5 and host6 will only get the changes originated in their respective masters. Therefore, host5 will only get changes that were originated in host3, while host6 will only get changes from host4. A change coming from host1 or host2 will be propagated to hosts host1 through host4, because each master has a dedicated communication link to each of the other masters, but the data does not pass through to the simple slaves.

The case is different when we add slave nodes to a star topology, as in the following example.

./tools/tpm configure defaults  \
    --reset \
    --replication-user=tungsten \
    --replication-password=secret \
    --home-directory=/home/tungsten/installs/cookbook \
    --start

./tools/tpm configure constellation  \
    --masters=host1,host2,host3,host4 \
    --master-services=alpha,bravo,delta \
    --hub=host3 \
    --hub-service=charlie \
    --topology=star

./tools/tpm configure charlie \
    --hosts=host3,host5 \
    --slaves=host5 \
    --master=host3

./tools/tpm configure delta \
    --hosts=host4,host6 \
    --slaves=host6 \
    --master=host4

./tools/tpm install
Image 7 - star deployment with additional slaves

In a star topology, the hub is a pass-through master. Everything that is applied to this node is saved to binary logs, and put back in circulation. In this extended topology, the slave service in host5 is attached to a spoke of the star. Thus, it will get only changes that were created in its master. Instead, the node in host6, which is attached to the hub master, will get all the changes coming from any node.

Extending clusters

So far, the biggest challenge when working with multi-master topologies has been extending an existing cluster. Starting with two nodes and then expanding to three is quite a challenging task (Image 8).

Using tpm, though, the task becomes quite easy. Let's revisit the all-masters installation command, similar to the one we saw at the start of this article:

./tools/tpm install musketeers \
    --reset \
    --topology=all-masters \
    --home-directory=/opt/continuent/replicator \
    --replication-user=tungsten \
    --replication-password=secret \
    --masters=host1,host2,host3 \
    --master-services=athos,porthos,aramis \
    --start

If we want to add a host 'host4', running a service called 'dartagnan', we just have to modify the above command slightly, replacing 'install' with 'configure' and adding the new host and service:

./tools/tpm configure musketeers \
    --reset \
    --topology=all-masters \
    --home-directory=/opt/continuent/replicator \
    --replication-user=tungsten \
    --replication-password=secret \
    --masters=host1,host2,host3,host4 \
    --master-services=athos,porthos,aramis,dartagnan \
    --start

./tools/tpm update

That's all it takes. The 'configure' command is almost a repetition of the install command, with the additional components, and 'tpm update' applies the change. The same command also restarts the replicators, to bring the new configuration online.
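A quick way to confirm that the new node is replicating is to ask its replicator for the list of services it runs. This is just a sketch (it assumes trepctl is in your PATH after installation; the command is shown in more detail in the older articles further down this page):

# the new node should report its services, with 'dartagnan' among them
trepctl -host host4 services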

Image 8 - Extending an all-masters topology

More is coming

The tpm is such a complex tool that exploring it all in one session may be daunting. In addition to installing, you can update the data service, and thanks to its precise syntax, you can deploy the change exactly in the spot where you want it, without moving from the staging directory. We will look at it with more examples soon.

Monday, April 22, 2013

Installing and administering Tungsten Replicator - Part 2 : advanced

Switching roles

To get a taste of the power of Tungsten Replicator, we will show how to switch roles. This is a controlled operation (as opposed to fail-over), where we can decide when to switch and which nodes are involved.

In our topology, host1 is the master, and we have three slaves. We can either ask for a switch and let the script select the first available slave, or tell the script which slave should be promoted. The script will show us the steps needed to perform the operation.
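If we are happy with the script's own choice (the first available slave), the call needs no argument at all. A minimal sketch, based on the description above:

$ cookbook/switch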

IMPORTANT! Please note that this operation is not risk free. Tungsten Replicator is a simple replication system, not a complete management tool like Continuent Tungsten. With the replicator, you must make sure that the applications have stopped writing to the master before starting the switch, and then you should point the applications to the new master when the operation is done.

$ cookbook/switch host2
# Determining current roles
host1 master
host2 slave
host3 slave
host4 slave
# Will promote host2 to be the new master server
# Waiting for slaves to catch up and pausing replication
trepctl -host host2 wait -applied 5382
trepctl -host host2 offline
trepctl -host host3 wait -applied 5382
trepctl -host host3 offline
trepctl -host host4 wait -applied 5382
trepctl -host host4 offline
trepctl -host host1 offline
# Reconfiguring server roles and restarting replication
trepctl -host host2 setrole -role master
trepctl -host host2 online
trepctl -host host1 setrole -role slave -uri thl://host2:2112
trepctl -host host1 online
trepctl -host host3 setrole -role slave -uri thl://host2:2112
trepctl -host host3 online
trepctl -host host4 setrole -role slave -uri thl://host2:2112
trepctl -host host4 online
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [slave]   seqno:      5,384  - latency:   2.530 - ONLINE

# node host2
cookbook  [master]  seqno:      5,384  - latency:   2.446 - ONLINE

# node host3
cookbook  [slave]   seqno:      5,384  - latency:   2.595 - ONLINE

# node host4
cookbook  [slave]   seqno:      5,384  - latency:   2.537 - ONLINE

As you can see from the listing above, the script displays the steps for the switch, using trepctl as a centralized tool.

Under load

After the simple installation in Part 1, we saw that we can test the flow of replication using 'cookbook/test_cluster'. That's a very simple set of operations that merely checks if replication is working. If we want to perform more serious tests, we should apply a demanding load to the replication system.

If you don't have applications that can exercise the servers to your liking, you should be pleased to know that Tungsten Replicator ships with a built-in application for data loading and benchmarking. Inside the expanded tarball, there is a directory named bristlecone, containing the software for such testing tools. There is a detailed set of instructions under './bristlecone/doc'. For the impatient, there is a cookbook recipe that starts a reasonable load with a single command:

$ cookbook/load_data start
# Determining current roles
# Evaluator started with pid 28370
# Evaluator details are available at /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.job
# Evaluator output can be monitored at /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log

$ cat /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.job
Task started at   : Sun Apr  7 18:20:00 2013
Task started from : /home/tungsten/tinstall/current
Executable        : /home/tungsten/installs/cookbook/tungsten/bristlecone/bin/evaluator.sh
Process ID        : 28370
Using             : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.xml
Process id        : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.pid
Log               : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log
Database          : host1
Table prefix      : tbl
Host              : host1
Port              : 3306
User              : tungsten
Test duration     : 3600

$  tail /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log
18:22:18,672 INFO  1365351738672 10/10 5035.0 ops/sec 0 ms/op 28380 rows/select 41 updates 54 deletes 166 inserts
18:22:20,693 INFO  1365351740693 10/10 4890.0 ops/sec 0 ms/op 26746 rows/select 57 updates 37 deletes 144 inserts
18:22:22,697 INFO  1365351742697 10/10 4986.0 ops/sec 0 ms/op 28183 rows/select 59 updates 46 deletes 162 inserts
18:22:24,716 INFO  1365351744716 10/10 5208.0 ops/sec 0 ms/op 29067 rows/select 51 updates 51 deletes 171 inserts
18:22:26,736 INFO  1365351746736 10/10 4856.0 ops/sec 0 ms/op 27695 rows/select 46 updates 68 deletes 141 inserts
18:22:28,739 INFO  1365351748739 10/10 5022.0 ops/sec 0 ms/op 28269 rows/select 51 updates 58 deletes 145 inserts
18:22:30,758 INFO  1365351750758 10/10 4893.0 ops/sec 0 ms/op 28484 rows/select 47 updates 50 deletes 165 inserts
18:22:32,777 INFO  1365351752777 10/10 4501.0 ops/sec 0 ms/op 26481 rows/select 42 updates 52 deletes 130 inserts
18:22:34,781 INFO  1365351754781 10/10 5057.0 ops/sec 0 ms/op 30450 rows/select 58 updates 53 deletes 157 inserts
18:22:36,801 INFO  1365351756801 10/10 5087.0 ops/sec 0 ms/op 30845 rows/select 55 updates 56 deletes 156 inserts

What happens here?

The evaluator process is started using a file named 'evaluator.xml,' which is generated dynamically. The cookbook recipe detects which is the current master in the replication system and directs the operations there (in our case, it's 'host1'). The same task takes note of the process ID, which will be used to stop the evaluator when done, and the output is sent to a file, where you can look at it if needed.

Looking at evaluator.log, you can see that there are quite a lot of operations going on. Most of them are read queries, as the application was designed to exercise a database server as much as possible. Nonetheless, there are quite a lot of update operations as well, as a call to 'show_cluster' can confirm.

$ cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [master]  seqno:     30,292  - latency:   0.566 - ONLINE

# node host2
cookbook  [slave]   seqno:     30,277  - latency:   0.531 - ONLINE

# node host3
cookbook  [slave]   seqno:     30,269  - latency:   0.511 - ONLINE

# node host4
cookbook  [slave]   seqno:     30,287  - latency:   0.550 - ONLINE

The load will continue for one hour (unless you defined a different duration). Should you want to stop it before then, you can run:

$ cookbook/load_data stop
# Determining current roles
# Stopping Evaluator at pid 28370

One important piece of information about this load application is that it looks for the masters in your cluster, and starts a load on every master. This is useful if you want to test a multi-master topology, such as the ones we will see in another article.

If the default behavior of load_data is not what you expect, you can further customize the load by fine tuning the application launcher. First, you run 'load_data' with the print option:

$ cookbook/load_data print
# Determining current roles
$HOME/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret  \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1  start

Then, you can copy and paste the resulting command, adjust it as needed, and run the concurrent_evaluator script with your additions.

There are many options available. The manual is embedded in the application itself:

$ ./bristlecone/bin/concurrent_evaluator.pl --manual

An important option that we can use is --instances=N. This option launches the evaluator N times concurrently, each time using a different schema. We will use this option to test parallel replication.

Backup

I am not going to stress here how important backups are. I assume (perhaps foolishly) that everyone reading this article knows why. Instead, I want to show how Tungsten Replicator supports backup and restore as integrated methods.

When you install Tungsten, you can add options to select a backup method and fine tune its behavior.

$ ./tools/tungsten-installer --help-master-slave -a |grep backup
--backup-directory            Permanent backup storage directory [$TUNGSTEN_HOME/backups]
                              This directory should be accessible by every replicator to ensure simple operation of backup and restore.
--backup-method               Database backup method (none|mysqldump|xtrabackup|script) [xtrabackup-full]
                              Tungsten integrates with a variety of backup mechanisms. We strongly recommend you configure one of these to help with provisioning servers. Please consult the Tungsten
                              Replicator Guide for more information on backup configuration.
--backup-dump-directory       Backup temporary dump directory [/tmp]
--backup-retention            Number of backups to retain [3]
--backup-script               What is the path to the backup script
--backup-command-prefix       Use sudo when running the backup script? [false]
--backup-online               Does the backup script support backing up a datasource while it is ONLINE [false]

First off, the default directory for backups is under your installation directory ($TUNGSTEN_HOME/backups). If you want to take backups through Tungsten, you must make sure that there is enough storage in that path to hold at least one backup. Tungsten will keep up to three backups in that directory, but you can configure this retention differently.
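For example, you could point the backups to a dedicated volume and keep more of them, using the same MORE_OPTIONS mechanism shown below for the backup method. This is only a sketch: the path /data/backups is an arbitrary example, not a cookbook default.

$ export MORE_OPTIONS='-a --backup-directory=/data/backups --backup-retention=5'
$ cookbook/install_master_slave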

Second, the default backup method is 'mysqldump,' not because it is recommended, but because it is widely available. As you probably know, though, if your database is more than a few dozen GB, mysqldump is not an adequate method.

Tungsten Replicator provides support for xtrabackup. If xtrabackup is installed in your servers, you can define it as your default backup method. When you are installing a new cluster, you can do this:

$ export MORE_OPTIONS='-a --backup-method=xtrabackup --backup-command-prefix=true'
$ cookbook/install_master_slave

If you have just installed and need to reconfigure, you can call 'configure_service' to accomplish the task:

$ cookbook/configure_service -U -a --backup-method=xtrabackup --backup-command-prefix=true cookbook

(Where 'cookbook' is the service name). VERY IMPORTANT: configure_service acts on a single host, and by default it acts on the current host, unless you say otherwise. For example:

$ cookbook/configure_service -U --host=host2 -a --backup-method=xtrabackup --backup-command-prefix=true cookbook

You will have to restart the replicator in node 'host2' for the changes to take effect.

$ ssh host2 "cd $TUNGSTEN_BASE/tungsten/ ; ./cookbook/replicator restart"

Using the backup is quite easy. You only need to call 'trepctl', indicate in which host you want to take a backup, and Tungsten will do the rest.

$ cookbook/trepctl -host host3 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties

$ cookbook/trepctl -host host2 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties

Apparently, we have two backups with the same contents, taken from two different nodes. However, since we have changed the backup method for host2, we will have a small mysqldump file for host3, and a rather large xtrabackup file for host2. Again, the cookbook has a method that shows the backups available on all the nodes:

$ ./cookbook/backups
   backup-agent : (service: cookbook) mysqldump
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 2.4G
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 21:52 storage.index
-rw-r--r-- 1 tungsten tungsten 2.4G Apr  7 21:52 store-0000000001-full_xtrabackup_2013-04-07_21-50_59.tar
-rw-r--r-- 1 tungsten tungsten  323 Apr  7 21:52 store-0000000001.properties
drwxr-xr-x 2 tungsten tungsten 4.0K Apr  7 21:52 xtrabackup

# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.3M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 21:50 storage.index
-rw-r--r-- 1 tungsten tungsten 6.3M Apr  7 21:50 store-0000000001-mysqldump_2013-04-07_21-50_28.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 21:50 store-0000000001.properties

# [node: host4] 0 files found

WARNING: This example was here only to show how to change the backup method. It is NOT recommended to mix backup methods across nodes. Unless you have a specific need, and understand the consequences of this choice, you should use the same backup method everywhere.
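If you do end up with mixed methods and want to realign the nodes, one hypothetical way is to loop over them, since configure_service acts on one host at a time. A sketch only, with node names and paths as in the rest of this article:

# set the same backup method on every node, then restart its replicator
for NODE in host1 host2 host3 host4
do
    cookbook/configure_service -U --host=$NODE -a \
        --backup-method=xtrabackup --backup-command-prefix=true cookbook
    ssh $NODE "cd $TUNGSTEN_BASE/tungsten/ ; ./cookbook/replicator restart"
done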

Restore

A backup is only good if you can use it to restore your data. Using the same method shown to take a backup, you can restore your data. For this example, let's use mysqldump on all nodes (just because it's quicker), and show the operations for a backup and restore.

First, we take a backup in node 'host3', and then we will restore the data in 'host2'.

$ cookbook/trepctl -host host3 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties

$ cookbook/backups
   backup-agent : (service: cookbook) mysqldump
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 0 files found
# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties

# [node: host4] 0 files found

Now, we have the backup files in host3, but we have an issue in host2, and we need to perform a restore there. Assuming that the database server is unusable (this is usually the case when we must restore), we have the unpleasant situation where the backups are in one node, and we need to use them in another. In a well-organized environment, we would have shared storage for the backup directory, and thus we could just move ahead and perform our restore. In this case, though, we have no such luxury. So we use yet another feature of the cookbook:

$ cookbook/copy_backup
syntax: copy_backup SERVICE SOURCE_NODE DESTINATION_NODE

$ cookbook/copy_backup cookbook host3 host2
# No message = success

$ cookbook/backups
   backup-agent : (service: cookbook) mysqldump
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties

# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties

# [node: host4] 0 files found

The 'copy_backup' command has copied the files from one host to another, and now we are ready to perform a restore in host2.

$ cookbook/trepctl -host host2 restore
Operation failed: Restore operation failed: Operation irrelevant in current state

Hmm. Probably not the friendliest of error messages. What this scoundrel means is that it can't perform a restore when the replicator is online.

$ cookbook/trepctl -host host2 offline
$ cookbook/trepctl -host host2 restore
Restore completed successfully

$ cookbook/trepctl -host host2 services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 17955
appliedLatency  : 0.407
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

The restore operation was successful. We could have used xtrabackup just as well. The only difference is that the operation takes way longer.

Parallel replication

Slave lag is a common occurrence in MySQL replication. Most of the time, the reason for this problem is that while the master updates data using many threads concurrently, the slave applies the replication stream using a single thread. Tungsten has a built-in feature that applies changes in parallel when the updates happen in separate schemas. For database servers that are sharded by database, or for ones that serve multi-tenancy applications, this is an ideal case: the action is likely to happen in several schemas at once, and thus Tungsten can parallelize the changes successfully. Notice, however, that if you are running operations in a single schema, parallel replication won't give you any relief. Also, the operations must be truly independent from each other. If a schema has foreign keys that reference another schema, or if a transaction mixes data from two or more schemas, Tungsten will stop parallelizing and resume working in a single thread until the end of the offending operation, resulting in an overall decrease of performance instead of an increase.

To activate parallel replication, you need to enable two options:

  • --channels=N, where you indicate how many parallel threads you want to establish. You should indicate as many channels as the number of schemas where you are operating. Some benchmarks will help you find the limits: defining too many channels will eventually exhaust the system resources. If the number of schemas is larger than the number of channels, Tungsten will use the channels in a round-robin fashion.
  • --svc-parallelization-type=disk: This option activates a fast queue-creation algorithm that acts directly on the THL files. Contrary to the common perception that in-memory queues are faster, this method is very efficient and less likely to exhaust system resources.

If you want to install all the servers with parallel replication, you can do this:

$ export MORE_OPTIONS='-a --channels=5 --svc-parallelization-type=disk'
$ cookbook/install_master_slave

If you need parallel replication only on one particular slave service, you can enable it there using 'configure_service', the same as we have seen before for the backup method.

In this example, we're going to use the second method:

$ cookbook/configure_service -U -a --host=host4 --channels=5 --svc-parallelization-type=disk cookbook
WARN  >> host4 >> THL schema tungsten_cookbook already exists at tungsten@host4:3306 (WITH PASSWORD)
NOTE >> host4 >> Deployment finished

$ cookbook/replicator restart
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
Starting Tungsten Replicator Service...

Now parallel replication is enabled. But how do we make sure that the service has indeed been enhanced?

The quickest method is to check the Tungsten service schema. Every replicator service creates a database schema named 'tungsten_$SERVICE_NAME', where it stores the current replication status. For example, in our default system, where the only replication service is called 'cookbook', we will find a schema named 'tungsten_cookbook'. The table that we want to inspect is one named 'trep_commit_seqno', where we store the global transaction ID, the schema where the transaction was applied, the data origin, and the time stamps at extraction and apply time. What is relevant in this table is that there will be one record for each channel that we have enabled. Thus, in host2 and host3 there will be only one line, while in host4 we should find 5 lines.
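If you want to peek at the table directly on a single node, a plain query will do. This is just a sketch, using the database credentials defined in USER_VALUES.sh and the columns we will see in a moment:

$ mysql -h host4 -u tungsten -psecret \
    -e 'select seqno, shard_id, applied_latency from tungsten_cookbook.trep_commit_seqno'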

There is one useful recipe to get this result at once:

$ cookbook/query_all_nodes 'select count(*) from tungsten_cookbook.trep_commit_seqno'
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        5 |
+----------+

Right! So we have 5 channels. Before inspecting what is going on in these channels, let's apply some load. You may recall that our load_data script can show you a command that we can customize for our purpose.

$ cookbook/load_data print
/home/tungsten/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret  \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start

We just copy-and-paste this command, adding --instances=5 at the end, and we get 5 messages indicating that an evaluator was started.
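For clarity, this is what the launch looks like, with the extra option appended at the end of the printed command (a sketch: everything else is copied verbatim from the output above):

/home/tungsten/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start --instances=5

Once the evaluators are running, we can check which schemas they created: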

$ cookbook/query_node host4 'show schemas'
+--------------------+
| Database           |
+--------------------+
| information_schema |
| host11             |
| host12             |
| host13             |
| host14             |
| host15             |
| mysql              |
| test               |
| tungsten_cookbook  |
+--------------------+

Since we indicated that the database was to be named 'host1' and we asked for 5 instances, the evaluator has created host11, host12, and so on.

Now that there is some action, we can have a look at our replication. Rather than querying the database directly, asking for the contents of trep_commit_seqno, we use another cookbook recipe:

$ cookbook/tungsten_service all
# node: host1 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324246 | host1     |               1 | host12   | 2013-04-07 23:02:16 | 2013-04-07 23:02:15 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host2 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324383 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host3 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324549 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host4 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324740 | host1     |               0 | host11   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324736 | host1     |               0 | host12   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324739 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324737 | host1     |               0 | host14   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324735 | host1     |               0 | host15   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+

Here you see that host1 has only one channel: it is the master, and it must serialize according to the binary log. Slaves host2 and host3 have only one channel, because we have enabled parallel replication only in host4. And finally we see that in host4 there are 5 channels, each showing a different shard_id (= database schema), with its own transaction ID being applied. This shows that replication is working.

Tungsten Replicator has, however, several tools that help monitor parallel replication:

$ cookbook/trepctl -host host4 status -name stores
Processing status command (stores)...
NAME                      VALUE
----                      -----
activeSeqno             : 475395
doChecksum              : false
flushIntervalMillis     : 0
fsyncOnFlush            : false
logConnectionTimeout    : 28800
logDir                  : /home/tungsten/installs/cookbook/thl/cookbook
logFileRetainMillis     : 604800000
logFileSize             : 100000000
maximumStoredSeqNo      : 475449
minimumStoredSeqNo      : 0
name                    : thl
readOnly                : false
storeClass              : com.continuent.tungsten.replicator.thl.THL
timeoutMillis           : 2147483647
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 457459
headSeqno               : 475415
intervalGuard           : AtomicIntervalGuard (array is empty)
maxDelayInterval        : 60
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 0
serialized              : false
stopRequested           : false
store.0                 : THLParallelReadTask task_id=0 thread_name=store-thl-0 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=93357 discarded=364102 events=0
store.1                 : THLParallelReadTask task_id=1 thread_name=store-thl-1 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=92567 discarded=364892 events=0
store.2                 : THLParallelReadTask task_id=2 thread_name=store-thl-2 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=91197 discarded=366262 events=0
store.3                 : THLParallelReadTask task_id=3 thread_name=store-thl-3 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=90492 discarded=366967 events=0
store.4                 : THLParallelReadTask task_id=4 thread_name=store-thl-4 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=89846 discarded=367613 events=0
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...

This command shows the status of parallel replication in each channel. Notable information in this output:

  • eventCount is the number of transactions being processed;
  • serializationCount: 0 means that all events have been parallelized, and there was no need to serialize any;
  • 'read' ... 'accepted' ... 'discarded' are the operations in the disk queue: each channel parses all the events, and queues only the ones that belong to its shard.

$ cookbook/trepctl -host host3 status -name shards
Processing status command (shards)...
 ...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000006:0000000169567337;0
appliedLastSeqno  : 660707
appliedLatency    : 1.314
eventCount        : 130325
shardId           : host11
stage             : q-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000006:0000000169566006;0
appliedLastSeqno  : 660702
appliedLatency    : 1.312
eventCount        : 129747
shardId           : host12
stage             : q-to-dbms
 ...

This command (only a portion reported here) displays the status of each shard, showing for each one which event, transaction ID and event count were recorded.

There should be much more to mention about the monitoring tools, but for now I want just to mention a last important point. When the replicator goes offline, parallel replication stops, and the replication operations are consolidated into a single thread. This makes sure that replication can later resume using a single thread, or it can be safely handed over to native MySQL replication. This behavior also makes sure that a slave can be safely promoted to master. A switch operation requires that the slave service be offline before being reconfigured to become a master. When the replicator goes offline, the N channels become 1.

$ ./cookbook/trepctl offline
$ cookbook/tungsten_service all
# node: host1 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769652 | host1     |               0 | host12   | 2013-04-07 23:18:07 | 2013-04-07 23:18:07 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host2 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769699 | host1     |               0 | host13   | 2013-04-07 23:18:08 | 2013-04-07 23:18:08 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host3 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769866 | host1     |               0 | host15   | 2013-04-07 23:18:08 | 2013-04-07 23:18:08 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host4 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 767064 | host1     |               0 | host15   | 2013-04-07 23:18:01 | 2013-04-07 23:18:01 |
+--------+-----------+-----------------+----------+---------------------+---------------------+

If we put it back online, we will see the channels expanding again.
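A minimal sketch of that last step, mirroring the offline command above:

$ ./cookbook/trepctl online
$ cookbook/tungsten_service all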



Installing and Administering Tungsten Replicator - Part 1 - basics

Intro

Tungsten Replicator is an open source tool that performs high-performance replication across database servers. It was designed to replace MySQL native replication, and it also supports replication from and to Oracle and other systems. In this article, we will only cover MySQL replication, both simple and multi-master.

Preparing for installation

To follow the material in this article, you will need a recent build of Tungsten Replicator. You can get the latest ones from http://bit.ly/tr20_builds. In this article, we are using build 2.0.8-167.

Before starting any installation, you should make sure that you have satisfied all the prerequisites. Don't underestimate the list. Any missing items will likely result in installation errors.

If you are using Amazon EC2 servers, this page provides a script that makes the prerequisites an almost fully automated procedure.

To install any of the topologies supported by Tungsten, you first need to extract the software, define your nodes, and, if needed, change the default options.

  1. Download the software from http://bit.ly/tr20_builds
  2. Expand the tarball (tar -xzf tungsten-replicator-2.0.8-167.tar.gz)
  3. Change directory to the extracted path (cd tungsten-replicator-2.0.8-167)
  4. Define the VERBOSE user variable (it will show more detail during the operations)
  5. Edit the configuration files COMMON_NODES.sh and USER_VALUES.sh, adjusting the values to match your environment.
$ tar -xzf tungsten-replicator-2.0.8-167.tar.gz
$ cd tungsten-replicator-2.0.8-167
$ export VERBOSE=1
$ export PATH=$PWD/cookbook:$PATH

$ cat cookbook/COMMON_NODES.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

export NODE1=host1
export NODE2=host2
export NODE3=host3
export NODE4=host4

$ cat cookbook/USER_VALUES.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

# User defined values for the cluster to be installed.

# Where to install Tungsten Replicator
export TUNGSTEN_BASE=$HOME/installs/cookbook

# Directory containing the database binary logs
export BINLOG_DIRECTORY=/var/lib/mysql

# Path to the options file
export MY_CNF=/etc/my.cnf

# Database credentials
export DATABASE_USER=tungsten
export DATABASE_PASSWORD=secret
export DATABASE_PORT=3306

# Name of the service to install
export TUNGSTEN_SERVICE=cookbook

Pay attention to the paths. TUNGSTEN_BASE is where the binaries will be installed. You must make sure that:

  • The path is writable by the current user, in all nodes;
  • There is nothing in that path that may clash with the software to be installed. It should be a dedicated directory. Do NOT use your $HOME for this purpose. Use a subdirectory or a path under /usr/local or /opt/
  • The path must have enough storage to hold the Tungsten Transaction History Logs (THL). They will occupy roughly twice as much space as your binary logs (a quick check is sketched below).
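A quick, hypothetical sanity check along these lines, reusing the node names from COMMON_NODES.sh and the default TUNGSTEN_BASE (adapt the path to your own settings):

# check that the installation path is writable and has free space on every node
. cookbook/COMMON_NODES.sh
for NODE in $NODE1 $NODE2 $NODE3 $NODE4
do
    echo "## $NODE"
    ssh $NODE "mkdir -p $HOME/installs/cookbook && df -h $HOME/installs/cookbook"
done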

Validating your nodes

You may think that you have followed the instructions for the prerequisites, but sometimes humans make mistakes. To make sure that your cluster can run Tungsten, you can run the script validate_cluster. This is an operation that performs all the installation checks in all nodes, without actually installing anything.

$ cookbook/validate_cluster
# Performing validation check ...
## 1 (host: host4)
./tools/tungsten-installer \
    --master-slave \
    --master-host=host1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3306 \
    --service-name=cookbook \
    --home-directory=/home/tungsten/installs/cookbook \
    --cluster-hosts=host1,host2,host3,host4 \
    --datasource-mysql-conf=/etc/my.cnf \
    --datasource-log-directory=/var/lib/mysql \
    --rmi-port=10000 \
    --thl-port=2112 \
    --validate-only -a \
    --info \
    --start
INFO  >> Start: Check that the master-host is part of the config
INFO  >> Finish: Check that the master-host is part of the config
#####################################################################
# Tungsten Community Configuration Procedure
#####################################################################
NOTE:  To terminate configuration press ^C followed by ENTER
...
...
...
( LOTS OF LINES FOLLOW)

If there is any error in your prerequisites, you will get an error message, or possibly more than one. If the messages provided by this command are not enough to understand what is going on, you can ask for yet more detail, using:

$ VERBOSE=2 ./cookbook/validate_cluster

DRY-RUN installation

Should you need to install on a set of nodes where SSH connections across nodes are not allowed, you may use the DRYRUN variable.

$ export DRYRUN=1

When this variable is set, the installation commands will not install anything, but only show you all the commands that you should run, in the right sequence.

For example, if you want to validate the cluster without SSH communication between nodes, a DRYRUN command will tell you the list of instructions to run and on which hosts to run them:

$ DRYRUN=1 ./cookbook/validate_cluster
# Performing validation check ...
...

Using the instructions so received, you can copy the software to each node, and run the appropriate command in each one.

The same goes for every installation command. Should you need to install a star topology node by node with custom options, just run:

$ DRYRUN=1 ./cookbook/install_star

When you don't need the DRYRUN command anymore, remove the variable:

$ unset DRYRUN

Installing a master-slave topology

After the validation, you can launch your installation. If the topology is master/slave, the defaults are stored in cookbook/NODES_MASTER_SLAVE.sh

$ cat cookbook/NODES_MASTER_SLAVE.sh
#!/bin/bash
# (C) Copyright 2012,2013 Continuent, Inc - Released under the New BSD License
# Version 1.0.5 - 2013-04-03

CURDIR=`dirname $0`
if [ -f $CURDIR/COMMON_NODES.sh ]
then
    . $CURDIR/COMMON_NODES.sh
else
    export NODE1=
    export NODE2=
    export NODE3=
    export NODE4=
fi

export ALL_NODES=($NODE1 $NODE2 $NODE3 $NODE4)
# indicate which servers will be masters, and which ones will have a slave service
# in case of all-masters topologies, these two arrays will be the same as $ALL_NODES
# These values are used for automated testing

#for master/slave replication
export MASTERS=($NODE1)
export SLAVES=($NODE2 $NODE3 $NODE4)

The only variables that should concern you here are MASTERS and SLAVES. They refer to the nodes defined in COMMON_NODES.sh. If your master is NODE1, there is no need to change anything. If your master is, say, NODE2, then change the variables as:

export MASTERS=($NODE2)
export SLAVES=($NODE1 $NODE3 $NODE4)

Make sure that you have adjusted both master and slave definitions.

$ cookbook/install_master_slave
## 1 (host: host4)
./tools/tungsten-installer \
    --master-slave \
    --master-host=host1 \
    --datasource-user=tungsten \
    --datasource-password=secret \
    --datasource-port=3306 \
    --service-name=cookbook \
    --home-directory=/home/tungsten/installs/cookbook \
    --cluster-hosts=host1,host2,host3,host4 \
    --datasource-mysql-conf=/etc/my.cnf \
    --datasource-log-directory=/var/lib/mysql \
    --rmi-port=10000 \
    --thl-port=2112 \
    --start
... # A few minutes later ...
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [master]  seqno:          1  - latency:   0.631 - ONLINE

# node host2
cookbook  [slave]   seqno:          1  - latency:   0.607 - ONLINE

# node host3
cookbook  [slave]   seqno:          1  - latency:   0.746 - ONLINE

# node host4
cookbook  [slave]   seqno:          1  - latency:   0.640 - ONLINE

Deployment completed
Topology      :'master_slave'
Tungsten path : /home/tungsten/installs/cookbook
Nodes         : (host1 host2 host3 host4)

After installing all the nodes, the cookbook script displays the cluster status. In this list, 'cookbook' is the name of the replication service, as defined in USER_VALUES.sh. You can change it before installing. Any name will do. Next to it you see the role (master or slave), then the 'seqno', which is the global transaction ID of your database events. Finally, the 'latency' is the difference, in seconds, between the time your transaction was recorded in the master binary logs and the time it was applied to the slave.

You can ask for such a status at any time, by calling:

$ cookbook/show_cluster

Simple replication administration

A cluster status doesn't tell you whether replication is actually working. You can check that by running:

$ cookbook/test_cluster
# --------------------------------------------------------------------------------------
# Testing cluster with installed topology 'MASTER_SLAVE'
# --------------------------------------------------------------------------------------
ok - Master host1 has at least 1 master services
# slave: host2
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host2 has at least 1 services
# slave: host3
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host3 has at least 1 services
# slave: host4
ok - Tables from master #1
ok - Views from master #1
ok - Records from master #1
ok - Slave host4 has at least 1 services
1..13

This command creates a table and a view in each master in your topology (in this case, a master/slave topology has only one master), inserts a record using the view, and then checks that each slave has replicated what was inserted. The output changes quite a lot when using a multi-master topology.

Astute readers will recognize that the output produced here complies with the Test Anything Protocol (TAP). If you have the 'prove' tool installed in your server, you may try it:

$ prove cookbook/test_cluster
cookbook/test_cluster...ok
All tests successful.
Files=1, Tests=13,  4 wallclock secs ( 3.17 cusr +  0.26 csys =  3.43 CPU)
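
If you prefer to see the mechanism with your own eyes, a manual spot-check along the same lines is easy to improvise. The table name below is just an example, not something the cookbook creates.

# create a table on the master, then verify that every slave received the data
# (allow a second or two for replication to catch up)
$ mysql -h host1 test -e "create table t_check (i int); insert into t_check values (1)"
$ for host in host2 host3 host4; do mysql -h $host test -e "select * from t_check" ; done
$ mysql -h host1 test -e "drop table t_check"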

Replication tools

The cluster status shown above (cookbook/show_cluster) uses the output of the Tungsten built-in tool trepctl to display a simplified status.

The tool is available inside the installation directory. If you have used the defaults, that is $HOME/installs/cookbook, and the tools are in $HOME/installs/cookbook/tungsten/tungsten-replicator/bin/.

This is not easy to remember, and even if you can remember it correctly, it requires a lot of typing. The cookbook provides an easy shortcut: cookbook/trepctl. For example:

$ cookbook/trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 17
appliedLatency  : 0.773
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

Or, if you want the simplified output:

$ cookbook/trepctl services | cookbook/simple_services
cookbook  [slave]   seqno:         17  - latency:   0.773 - ONLINE

To administer the system properly, you need to know the tools, the paths to the logs, and the configuration files, which are somewhat elusive. Again, the cookbook comes to the rescue:

$ cookbook/paths
     replicator : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/replicator
        trepctl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl
            thl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/thl
            log : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/log/trepsvc.log
           conf : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/conf/
        thl-dir : (service: cookbook) /home/tungsten/installs/cookbook/thl/cookbook
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
   backup-agent : (service: cookbook) mysqldump

This command tells you the path to the three main tools:

  • trepctl Tungsten Replicator Control
  • thl or the Transaction History Log manager
  • replicator, which is the launcher for the replicator daemon.

You also get the path to the most common places you may need to access during your administrative tasks.

Similarly, there are shortcuts to perform common tasks:

  • cookbook/replicator: Shortcut to the 'replicator' command
  • cookbook/trepctl: Shortcut to the 'trepctl' command
  • cookbook/thl: Shortcut to the 'thl' command
  • cookbook/conf: Shows the configuration files using 'less'
  • cookbook/show_conf: Same as 'conf'
  • cookbook/edit_conf: Edits the configuration files using 'vim'
  • cookbook/vimconf: Same as 'vimconf.sh'
  • cookbook/emacsconf: Edits the configuration files using 'emacs'
  • cookbook/log: Shows the replicator log using 'less'
  • cookbook/show_log: Same as 'log'
  • cookbook/vilog: Edits the replicator log using 'vi'
  • cookbook/vimlog: Edits the replicator log using 'vim'
  • cookbook/emacslog: Edits the replicator log using 'emacs'
  • cookbook/heartbeat: Performs a heartbeat in each master
  • cookbook/paths: Shows the path to all important tools and services
  • cookbook/services: Performs 'trepctl services'
  • cookbook/backups: Shows which backups were taken in all nodes

You can get all the above commands, and all the others included in the cookbook, by calling:

$ cookbook/help

Uninstalling Tungsten

The cookbook makes it easy to install a replication cluster, and makes it easy to remove it as well.

If you look at the end of cookbook/USER_VALUES.sh, you will see these variables:

$ tail -n 12 cookbook/USER_VALUES.sh
##############################################################################
# Variables used when removing the cluster
# Each variable defines an action during the cleanup
##############################################################################
[ -z "$STOP_REPLICATORS" ]            && export STOP_REPLICATORS=1
[ -z "$REMOVE_TUNGSTEN_BASE" ]        && export REMOVE_TUNGSTEN_BASE=1
[ -z "$REMOVE_SERVICE_SCHEMA" ]       && export REMOVE_SERVICE_SCHEMA=1
[ -z "$REMOVE_TEST_SCHEMAS" ]         && export REMOVE_TEST_SCHEMAS=1
[ -z "$REMOVE_DATABASE_CONTENTS" ]    && export REMOVE_DATABASE_CONTENTS=0
[ -z "$CLEAN_NODE_DATABASE_SERVER" ]  && export CLEAN_NODE_DATABASE_SERVER=1
##############################################################################

The names are self-explanatory. These variables are used when you call the clear_cluster command. Then, their meaning becomes even clearer:

$ cookbook/clear_cluster
--------------------------------------------------------------------------------------
Clearing up cluster with installed topology 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
!!! WARNING !!!
--------------------------------------------------------------------------------------
'clear-cluster' is a potentially damaging operation.
This command will do all the following:
* Stop the replication software in all servers. [$STOP_REPLICATORS]
* REMOVE ALL THE CONTENTS from /home/tungsten/installs/cookbook/.[$REMOVE_TUNGSTEN_BASE]
* REMOVE the tungsten_<service_name> schemas in all nodes (host1 host2 host3 host4) [$REMOVE_SERVICE_SCHEMA]
* REMOVE the schemas created for testing (test, evaluator) in all nodes (host1 host2 host3 host4)  [$REMOVE_TEST_SCHEMAS]
* Create the test server anew;                [$CLEAN_NODE_DATABASE_SERVER]
* Unset the read_only variable;               [$CLEAN_NODE_DATABASE_SERVER]
* Set the binlog format to MIXED;             [$CLEAN_NODE_DATABASE_SERVER]
* Reset the master (removes all binary logs); [$CLEAN_NODE_DATABASE_SERVER]
If this is what you want, either set the variable I_WANT_TO_UNINSTALL
or answer 'y' to the question below
You may also set the variables in brackets to fine tune the execution.
Alternatively, have a look at cookbook/clear_cluster and customize it to your needs.
--------------------------------------------------------------------------------------
Do you wish to uninstall this cluster? [y/n]

As you can see, for each action there is a corresponding variable. By default, all variables are active, except 'REMOVE_DATABASE_CONTENTS'. Setting or unsetting these variables will determine how much of your installation you want to undo.
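
Since each variable is only assigned when it is empty, you can override the defaults from the environment before calling the command. Here is a sketch (my own choice of values, not a recommendation) that stops replication and removes the service schemas, but leaves the installed binaries in place:

# keep the Tungsten binaries, but stop the replicators and drop the service schemas
$ export REMOVE_TUNGSTEN_BASE=0
$ export I_WANT_TO_UNINSTALL=1    # skips the interactive confirmation (see the warning above)
$ cookbook/clear_cluster
$ unset REMOVE_TUNGSTEN_BASE I_WANT_TO_UNINSTALL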

Getting replication status

Once you have replication up and running, you need to know what's going on at a glance. We have seen in the previous sections that we can call trepctl services to get an overview of the replication process. Using the same tool, we can also get more detailed information:

$ cookbook/trepctl status | nl
 1 Processing status command...
 2 NAME                     VALUE
 3 ----                     -----
 4 appliedLastEventId     : mysql-bin.000006:0000000000003163;0
 5 appliedLastSeqno       : 17
 6 appliedLatency         : 0.773
 7 channels               : 1
 8 clusterName            : default
 9 currentEventId         : NONE
10 currentTimeMillis      : 1365193975129
11 dataServerHost         : host4
12 extensions             :
13 latestEpochNumber      : 0
14 masterConnectUri       : thl://host1:2112/
15 masterListenUri        : thl://host4:2112/
16 maximumStoredSeqNo     : 17
17 minimumStoredSeqNo     : 0
18 offlineRequests        : NONE
19 pendingError           : NONE
20 pendingErrorCode       : NONE
21 pendingErrorEventId    : NONE
22 pendingErrorSeqno      : -1
23 pendingExceptionMessage: NONE
24 pipelineSource         : thl://host1:2112/
25 relativeLatency        : 22729.129
26 resourcePrecedence     : 99
27 rmiPort                : 10000
28 role                   : slave
29 seqnoType              : java.lang.Long
30 serviceName            : cookbook
31 serviceType            : local
32 simpleServiceName      : cookbook
33 siteName               : default
34 sourceId               : host4
35 state                  : ONLINE
36 timeInStateSeconds     : 23640.125
37 uptimeSeconds          : 23640.723
38 version                : Tungsten Replicator 2.0.8 build 136
39 Finished status command...

With the line number as a reference, we can describe quite a bit of useful information:

  • appliedLastEventId: (4) This is the event as found in the source database master. Since we are replicating from a MySQL server (don't forget that Tungsten can replicate from and to several heterogeneous servers) this ID is made of the binary log file name (mysql-bin.000006) and the binary log position (0000000000003163). Most of the time, you don't really need this information, as everything in Tungsten uses the Global Transaction ID (see next item)
  • appliedLastSeqno: (5) This is the Global Transaction Identifier for the current transaction.
  • appliedLatency: (6) This is the time difference, in seconds, between the moment when the transaction was written to the binary log in the master and the moment when it was applied in the slave. Notice that, if the server system times are not synchronized, you may have greater differences than expected. Also, if you keep a slave offline and re-connect it later, this value will increase accordingly.
  • channels (7) is the number of threads used for replication. By default it is 1. When using parallel replication, it increases.
  • dataServerHost: (11) The server for which we are showing the status.
  • masterConnectUri (14) is the address (hostname or IP + port ) of the current master for this service.
  • masterListenUri (15) is the address that will be used by the current server if it becomes a master.
  • pendingErrorSeqno: (22) When any of the error* lines (19 to 21) are in use, this line shows the seqno (Global Transaction ID) of the event that is causing trouble. This piece of information is vital to find out what is holding up replication. (We will see an example later in this article.)
  • role: (28) The role of this service: it can be 'master' or 'slave'. More roles are possible if the replicator is embedded in a more complex system.
  • serviceName: (30) The identification of the replication service. Not very important when using a master/slave topology, but vital when deploying multi-master services.
  • state: (35) It's what the replicator is doing. If "ONLINE," all is well. "OFFLINE:NORMAL" means that the service was stopped manually, while "OFFLINE:ERROR" means that something is wrong. If you see "GOING-ONLINE:SYNCHRONIZING," it means that either there is a connection issue between master and slave, or the master is offline.

This command is the first step whenever you are troubleshooting a problem. If something goes wrong, chances are that 'cookbook/trepctl status' will tell you what is going on. Notice, though, that if you are using a multi-master topology, then you will need to specify a service:

$ cookbook/trepctl -service somename status

It's quite important to understand that trepctl can give you the status of any node in the replication cluster. You don't need to execute the command in another node. All you need to do is indicate to trepctl for which host it should display the status.

$ cookbook/trepctl -host host1 -service somename status

'trepctl' has quite a lot of options, as you may discover if you run 'trepctl help'. We will see some of them in this series of articles.
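
For a quick overview of the whole cluster, you can combine the -host option with a small loop, for instance (host names as defined in COMMON_NODES.sh):

# show role, state, latency, and last applied transaction for every node
$ for host in host1 host2 host3 host4; do echo "## $host"; cookbook/trepctl -host $host status | grep -E 'role |state |appliedLatency|appliedLastSeqno' ; done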

Logs

The second step of troubleshooting, when 'trepctl status' was not enough to nail the problem, is looking at the logs.

Here, the problem you will face is "where the heck do I find the logs?"

As we have seen above in this article, the cookbook can show you the paths:

$ cookbook/paths
     replicator : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/replicator
        trepctl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/trepctl
            thl : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/bin/thl
            log : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/log/trepsvc.log
           conf : /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/conf/
        thl-dir : (service: cookbook) /home/tungsten/installs/cookbook/thl/cookbook
     backup-dir : (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
   backup-agent : (service: cookbook) mysqldump

However, there is a simpler way. You can use one of the shortcuts to access the logs. For example, cookbook/log will show the log using 'less,' the well known file viewer. Should you want to use another tool for this task, there is a wide choice:

  • cookbook/show_log: Same as 'log'
  • cookbook/vilog: Edits the replicator log using 'vi'
  • cookbook/vimlog: Edits the replicator log using 'vim'
  • cookbook/emacslog: Edits the replicator log using 'emacs'

Inside the log, when you are troubleshooting, you should first try to find the same message displayed by 'trepctl status.' Around that point, you will find one or more Java stack traces, which contain information useful for the developers (file names and line numbers) and information useful for you (error messages as reported by the database server, the operating system, or third party tools, which may help identify the problem).
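
For example, a quick way to jump to the latest trouble spot, without opening the whole file, is to grep the log for error entries. The path below is the one reported by cookbook/paths in this setup.

# list the most recent ERROR entries in the replicator log
$ grep -n ERROR /home/tungsten/installs/cookbook/tungsten/tungsten-replicator/log/trepsvc.log | tail -n 5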

Reading events

Most often, when a problem has been identified, you need to know which event is causing it. Usually, a look at the SQL, combined with the error message, may give you enough information to fix the problem.

The replication events (or transactions) are stored in several Transaction History Log (THL) files. These files contain the events, as taken from the binary logs, plus some metadata. Unlike the binary logs, though, the THL file names are totally unimportant. Since transactions are identified by number, you don't need to know their location.

To display a THL event, you use a tool named, most aptly, 'thl.' For example, after we run this query:

mysql --host=host1 test -e "insert into v1 values (2,'inserted by node #1')"

We can check the status with

$ cookbook/trepctl services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 24
appliedLatency  : 0.563
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

And then retrieve the event using the thl tool:

$ cookbook/thl list -seqno 24
SEQ# = 24 / FRAG# = 0 (last frag)
- TIME = 2013-04-05 23:32:18.0
- EPOCH# = 18
- EVENTID = mysql-bin.000006:0000000000004417;0
- SOURCEID = host1
- METADATA = [mysql_server_id=10;dbms_type=mysql;service=cookbook;shard=test]
- TYPE = com.continuent.tungsten.replicator.event.ReplDBMSEvent
- OPTIONS = [##charset = ISO8859_1, autocommit = 1, sql_auto_is_null = 1, foreign_key_checks = 1, unique_checks = 1, sql_mode = '', character_set_client = 8, collation_connection = 8, collation_server = 8]
- SCHEMA = test
- SQL(0) = insert into v1 values (2,'inserted by node #1') /* ___SERVICE___ = [cookbook] */

There is much metadata in this event, most of which is easily recognizable by any seasoned DBA. Some things that may be worth pointing out are:

  • SEQ# The sequence number, or seqno, or Global Transaction ID,
  • EVENTID: We have seen this when we described 'trepctl status';
  • SOURCEID: the server where the event was generated;
  • service: The service where the event was generated. This also tells us that the master role for this service is in host1.
  • shard: it is how Tungsten defines shards for parallel replication and conflict resolution. By default, a shard matches a database schema, although it can be defined otherwise.
  • SQL: this is the statement being executed. When the transaction contains more than one statement, you will see SQL(1), SQL(2), and so on. If the event was row-based, you will see a list of columns and their contents instead of a SQL statement.
  • ___SERVICE___ = [cookbook] This comment is added by the replicator to make the event recognizable even after it goes to the binary log and gets replicated one level further. This is not the only method used to mark events. The service identification can go in other places, such as the "comment" field of a "CREATE TABLE" statement.

Skipping transactions

One of the most common replication problems is a duplicate key violation, which in turn often occurs when someone erroneously writes to a slave instead of a master. When such an error happens, you may find something like this:

$ cookbook/trepctl status
Processing status command...
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
channels               : -1
clusterName            : default
currentEventId         : NONE
currentTimeMillis      : 1365199283287
dataServerHost         : host4
extensions             :
latestEpochNumber      : -1
masterConnectUri       : thl://host1:2112/
masterListenUri        : thl://host4:2112/
maximumStoredSeqNo     : -1
minimumStoredSeqNo     : -1
offlineRequests        : NONE
pendingError           : Event application failed: seqno=25 fragno=0 message=java.sql.SQLException:
Statement failed on slave but succeeded on master
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000006:0000000000004622;0
pendingErrorSeqno      : 25
pendingExceptionMessage: java.sql.SQLException: Statement failed on slave but succeeded on master
                         insert into v1 values (3,'inserted by node #1') /* ___SERVICE___ = [cookbook] */
pipelineSource         : UNKNOWN
relativeLatency        : -1.0
resourcePrecedence     : 99
rmiPort                : 10000
role                   : slave
seqnoType              : java.lang.Long
serviceName            : cookbook
serviceType            : unknown
simpleServiceName      : cookbook
siteName               : default
sourceId               : host4
state                  : OFFLINE:ERROR
timeInStateSeconds     : 8.749
uptimeSeconds          : 28948.881
version                : Tungsten Replicator 2.0.8 build 136

Looking at the logs, we may see something like this:

INFO   | jvm 1    | 2013/04/06 00:01:14 | 2013-04-06 00:01:14,529 [cookbook - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask
Event application failed: seqno=25 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:183)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1321)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.apply(SingleThreadStageTask.java:639)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:468)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:167)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at java.lang.Thread.run(Unknown Source)
INFO   | jvm 1    | 2013/04/06 00:01:14 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2013/04/06 00:01:14 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:140)
INFO   | jvm 1    | 2013/04/06 00:01:14 |       ... 6 more
INFO   | jvm 1    | 2013/04/06 00:01:14 | Caused by: java.sql.SQLIntegrityConstraintViolationException: Duplicate entry '3' for key 'PRIMARY'

After inspecting the tables in all nodes, we find that host4 already contains a record with primary key 3, and that it has the same contents as the record coming from host1. In this case, the easiest way of fixing the error is to tell the replicator to skip this event.

$ cookbook/trepctl online -skip-seqno 25

After this, the replicator goes online, and, provided that there are no other errors after the first one, will continue replicating.
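
A quick check confirms that the slave is back in business. This is just a sketch: the exact seqno and latency values will obviously differ.

# confirm that the replicator is ONLINE again and past the offending transaction
$ cookbook/trepctl status | grep -E 'state |appliedLastSeqno|pendingError '
$ cookbook/show_cluster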

Taking over existing replication

In the first sections of this article, we saw how to install Tungsten Replicator as the primary source of replication. We assumed that the servers had the same contents, and that there was no replication already going on. Here we assume, instead, that replication is already in place, and we show the steps needed to take over from it.

To simulate the initial status, we're going to clear the cluster installed before, install native MySQL replication instead, and take over from there.

There is a recipe to install standard replication, just for this purpose.

$ cookbook/install_standard_mysql_replication
Starting slave on host2 Master File = mysql-bin.000005, Master Position = 106
Starting slave on host3 Master File = mysql-bin.000005, Master Position = 106
Starting slave on host4 Master File = mysql-bin.000005, Master Position = 106
# master  host1
mysql-bin.000005    554
#slave host2
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

#slave host3
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

#slave host4
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 554
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 554
replication test: ok

The installation also provides a simple test that checks whether replication is running, by creating a table named 't1' and retrieving it in the slaves. As you can see, after the test, the slaves are at position 554 of binary log # 000005. If we create another table, we can check that it is replicated and take note again of the binlog position.

$ mysql -h host1 -e 'create table test.test_standard(i int)'

$ for host in host1 host2 host3 host4; do mysql -h $host -e 'show tables from test' ; done
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| test_standard  |
+----------------+
$ for host in  host2 host3 host4; do mysql -h $host -e 'show slave status\G' | grep 'Master_Log_File\|Read_Master_Log_Pos\|Running' ; done
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Master_Log_File: mysql-bin.000005
          Read_Master_Log_Pos: 651
        Relay_Master_Log_File: mysql-bin.000005
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

So, replication is running. We can try our take-over script, which the cookbook provides:

$ cookbook/take-over
./tools/tungsten-installer
    --master-slave
    --master-host=host1
    --datasource-user=tungsten
    --datasource-password=secret
    --datasource-port=3306
    --service-name=cookbook
    --home-directory=/home/tungsten/installs/cookbook
    --cluster-hosts=host1,host2,host3,host4
    --datasource-mysql-conf=/etc/my.cnf
    --datasource-log-directory=/var/lib/mysql
    --rmi-port=10000
    --thl-port=2112 -a
    --auto-enable=false
    --start

$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host1 online -from-event mysql-bin.000005:651
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host2 online
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host3 online
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl -port 10000 -host host4 online
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook  [master]  seqno:          5  - latency:   0.556 - ONLINE

# node host2
cookbook  [slave]   seqno:          5  - latency:   0.663 - ONLINE

# node host3
cookbook  [slave]   seqno:          5  - latency:   0.690 - ONLINE

# node host4
cookbook  [slave]   seqno:          5  - latency:   0.595 - ONLINE

What happens here?

The first notable thing is that we install the replicator with the option --auto-enable set to false. With this option, the replicator starts, but stays OFFLINE. After that, the take-over script stops the replication in all servers, retrieves the latest binlog position, and tells the replicator to go ONLINE using the event ID. This is one of the few cases where we can't use a global transaction ID, because it does not exist yet!

Next, all the slaves go online. There is no need to tell them at which event they should start, because they will simply get the events from the master.
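
To make the mechanics clearer, here is roughly what the take-over script does behind the scenes, expressed as manual commands. This is only a sketch, under the assumption that host1 is the native master and that the binlog position is the one recorded earlier; the actual script handles all of this for you.

# stop native replication on the slaves (repeat for host3 and host4)
$ mysql -h host2 -e 'STOP SLAVE'
# read the current binlog position on the master
$ mysql -h host1 -e 'SHOW MASTER STATUS'
# bring the Tungsten master online from that position, then bring the slaves online
$ cookbook/trepctl -host host1 online -from-event mysql-bin.000005:651
$ cookbook/trepctl -host host2 online     # and host3, host4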



Wednesday, March 27, 2013

Multi-master data conflicts - Part 2: dealing with conflicts

In the first part of this article we examined the types of conflicts and their causes. In this part, we will analyse some of the methods available to deal with conflicts.

Pessimistic locking (or: conflicts won't happen)

Applicability: synchronous clusters with 2pc

We've covered this topic in the previous article, but it's worth repeating. If you use a synchronous cluster, you don't have conflicts. For example, MySQL Cluster ensures consistent data with updates coming from different nodes. However, MySQL Cluster is not a replacement for a MySQL server, and it has severe limitations.


Optimistic locking

Applicability: synchronous clusters without 2pc (Galera)

Conflicting transactions proceed on different nodes with local locking. The last one then rolls back when it discovers that a prior transaction got in first on the same data. For a more detailed analysis of this handling method, see this article by Jay Janssen.


Conflict resolution after the fact

Applicability: EnterpriseDB (none so far for MySQL)

Asynchronous replication makes conflicts harder to handle. A conflict here means that the data has already been applied to the wrong node or to the wrong object, and something must be done to solve the issue.

Typical remedies offered for conflict resolution are:

  • Earliest or Latest Timestamp: This method says that the oldest or the latest record prevails when a conflict happens. This is hardly a reliable resolution. It's the easiest method to implement, and thus it is offered. But it often results in a hidden data inconsistency problem, where we may find data that we don't expect. The current data was applied simply because it was updated later than the correct record. Also, timestamp calculation requires time synchronization across servers, and possibly across timezones, which calls for extra effort to keep the system functioning.
  • Node Priority: There is a hierarchy of nodes, with different ranks. When a conflict occurs, the node with the highest rank prevails. This method requires the data origin to be stored alongside the contents, and to be easily searchable when conflicts occur. It must also take into account offline nodes, and therefore it should keep the conflict resolution metadata until the offline nodes are back in synch.

Methods that could be implemented in a more advanced technology may include:

  • Origin enforcement: data coming from authorized nodes will be preserved. Data from wrong origin will be dropped, and a consolidation event will be generated and sent to the other nodes. This method would be possible in systems (like Tungsten) that keep track of the event origin.
  • Data merge: If possible and desirable, data from two different sources can be preserved, and merged in the destination table. This rule should also originate a new event to fix the data in the other nodes.

Schema renaming

Applicability: fan-in topologies

Fan in with likely conflicts

Image #1 - Fan-in topology with likely conflicts.

A fan-in topology is easy to implement with Tungsten Replicator, but not easy to maintain. By its nature, fan-in is a conflict waiting to happen. Assuming that all the masters have the same structure, they will replicate multiple changes into the same schema, and it is quite likely that some changes will clash. For this reason, the simple solution often adopted is renaming the schema before the data reaches the slave.

Fan in with schema renaming

Image #2 - Fan-in topology with schema renaming.

I know of at least one user who has successfully applied this technique for a cluster made of 70 masters and one slave.

Conflict prevention: Discipline

Applicability: all topologies

A simple way of preventing conflicts, and one that would make life easier for everyone, is discipline. The organization decides which entry points can update which data, and conflicts are not possible, because the data is inserted or modified only in the places where it is supposed to be.

Multi master r w split

Image #3 - Preventing conflicts with discipline in a star topology.

Conflict prevention: Enforced discipline

Applicability: all topologies

If you have worked in any large organization, either public or private, you know that discipline alone is the worst method you can rely on for something as delicate and valuable as your data. The reasons why this paradigm could fail are many: some people dislike discipline, someone makes a mistake, there are too many rules and people don't remember them, or an application bug lets you update what you shouldn't.

Either way, you end up with a system that has conflicts, and nobody knows what happened or how to fix them.

This is the "poor-man's" conflict avoidance system. It is based on simple technology, available in most database servers. If you can install a multi-master topology, using either native MySQL (circular) replication or Tungsten Replicator topologies, you can also apply this method.

The key to the system is to grant different privileges for every master. Looking at image #3, you can enforce discipline by granting different privileges to the application user in every master.

In master #1, where we can update personnel, app_user will have SELECT privileges on all databases, and all privileges on personnel.

In master #2, where we can update sales, app_user will have all privileges on sales and read only access to the other databases, and so on.

The key to making this system work well is to assign the privileges locally, without letting the GRANT statements be replicated. It should work like this:

# master 1
GRANT SELECT on *.* to app_user identified by 'my password';
# This is good for all masters. Let it replicate

# master 1
SET SQL_LOG_BIN=OFF;
GRANT ALL on personnel.* to app_user;   # This won't replicate


# master 2
SET SQL_LOG_BIN=OFF;
GRANT ALL on sales.* to app_user;

# master 3
SET SQL_LOG_BIN=OFF;
GRANT ALL on vehicles.* to app_user;

# master 4
SET SQL_LOG_BIN=OFF;
GRANT ALL on buildings.* to app_user;
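
One caveat worth noting: SET SQL_LOG_BIN=OFF only affects the session where you run it, so the grants above stay local without any further action. If you keep that session open for other work afterwards, remember to switch binary logging back on (or simply close the session):

# on each master, in the same session used for the local grants
SET SQL_LOG_BIN=ON;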

This method works quite well. Since updates for a given schema can be applied only in one master, there is little chance of any mischief happening. Conflicts are not completely removed, though. There are super users and maintenance users who can, consciously or not, introduce errors. For these cases, you may want to look at the next section.

Enforced discipline with certified origin

Applicability: all Tungsten topologies

Discipline based on granted privileges is often robust enough for your needs. However, if you want to keep track of where the data comes from, you should look at a System of Records technology, where each piece of data can be traced back to its origin.

Tungsten Replicator implements this technology with several topologies. The theory of this matter is beautifully explained by Robert Hodges in an article written some time ago. Here I would like to look at the practical stuff.

To implement a System of Records in Tungsten, you decide where you want to update each schema (which is defined as a shard in our lingo), assign that schema to a service, and the replicator will enforce your rules.

Once you have defined the shards, you can set the rules. When an event comes to a slave from an UNKNOWN shard, i.e. a shard that was not among the defined rules, you can:

  • Accept the event; (not recommended, really)
  • Drop the event silently
  • Drop the event with a warning in the logs;
  • Generate an error that will break replication (recommended)

You can choose among the above actions when setting a rule for events that come from UNWANTED shards, i.e. a shard that is not the one designated to update that schema.

Here's an example of a shard definition based on an all-masters schema with three nodes:

Conflict prevention 0

Image #4 - Sample conflict prevention in an all-masters topology

# Options to add during installation
--svc-extractor-filters=shardfilter

# policy for unknown shards
--property=replicator.filter.shardfilter.unknownShardPolicy=error

# policy for unwanted shards
--property=replicator.filter.shardfilter.unwantedShardPolicy=error

# Whether the policy for unwanted shards is activated or not
--property=replicator.filter.shardfilter.enforceHomes=false

# whether we allow whitelists to be created
--property=replicator.filter.shardfilter.allowWhitelisted=false


# Loading the rules set

$ trepctl -host host1 -service charlie shard -insert < shards.map

$ cat shards.map
shard_id          master      critical
employees         alpha       false
buildings         bravo       false
vehicles          charlie     false
test              whitelisted false

The rules are set by service, rather than host name. The schema 'employees' can be updated by the service named 'alpha', which has its master in host #1. Similarly, 'buildings' can be updated by 'bravo', with a master in host #2, and 'vehicles' is updated by 'charlie' master service in host #3. Remember that in Tungsten each replication stream from one master to many slaves is a separate service. This way we can keep track of the events origin. Even if the event is routed through a hub in a star topology, it retains its origin in the metadata.

The last line of the rules says that the schema 'test' is whitelisted, i.e. it can be freely updated by any master. And this means that conflicts can happen there, so be careful if you use this feature!

Conflict prevention right event1

Image #5 - Example of a legitimate event coming through

When an expected event comes through, all is well. Each node checks that the event originated from the authorised master, and the event is applied to the slave service.

Conflict prevention wrong event0

Image #6 - Example of an event originated from an unauthorised node

When the event comes from a node that was not authorised, Tungsten looks at the rules for such case. In our setup, the rule says 'error', and therefore replication will break at the receiving end of the service 'bravo' in host #1 and host #3.

mysql #2> create table employees.nicknames( ... )


# Only server #2 creates the table
# slave service 'bravo' in host1 and host3 get an error
# No table is created in hosts #1 and #3

To detect the error, we can ask for the list of services in host #1 and host #3. What we will see is something like this.

#3 $ trepctl services | simple_services 
alpha    [slave]
seqno:          7  - latency:   0.136 - ONLINE

bravo    [slave]
seqno:         -1  - latency:  -1.000 - OFFLINE:ERROR

charlie  [master]
seqno:         66  - latency:   0.440 - ONLINE

This listing says that replication was stopped with an error in the slave service 'bravo'. To determine what happened exactly, we ask for the status of that service:

#3 $  trepctl -service bravo status
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
(...)
offlineRequests        : NONE
pendingError           : Stage task failed: q-to-dbms
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000002:0000000000001241;0
pendingErrorSeqno      : 7
pendingExceptionMessage: Rejected event from wrong shard: 
seqno=7 shard ID=employees shard master=alpha service=bravo
(...)

This status gives us quite a lot of useful information:

  • The event with Global transaction ID (GTID) # 7 was rejected;
  • The reason for rejection was because it came from the wrong shard;
  • The expected shard master (i.e. the authorized service) was alpha;
  • The event was instead originated from service bravo.

With the above information, we can take action to fix the event. We know that GTID 7 is wrong, so we can skip it in both servers where the error occurred. To clean up the error, we then generate the correct event in the authorized master:

#1 $ trepctl -service bravo online -skip-seqno 7

mysql #1> drop table if exists employees.nicknames;
mysql #1> create table if not exists employees.nicknames ( ... ) ;

#3 $ trepctl -service bravo online -skip-seqno 7

Statement-based vs row-based replication

As a general note about conflict solving, I need to mention that, in most cases, using row-based replication instead of statement-based replication will help identify conflicts, making them easier to clean up.

Even when the conflict involves a deleted row, row-based events will contain enough information to let us identify the critical data needed to recover it.

Be aware that, if you use binlog-row-image=minimal in MySQL 5.6, the binary log entry for a DELETE event will only include the primary key.
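
If you want to take advantage of row-based events, switching a master to row-based logging is a matter of standard MySQL settings. The commands below are plain MySQL administration, not cookbook recipes.

# switch binary logging to row-based format for new sessions (requires the SUPER privilege)
$ mysql -h host1 -e "SET GLOBAL binlog_format='ROW'"
# to make the change permanent, add it to /etc/my.cnf under [mysqld]:
#    binlog_format = ROW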


More about filters

We have seen in at least two examples (schema renaming and conflict prevention) that filters can help you avoid conflicts. This is a powerful feature that should be taken into account when planning a multi-master topology.

MySQL native replication offers very little in the matter of data transformation through filtering. Tungsten Replicator, instead, allows you to define filters at several stages of the replication process: when extracting the data, after transporting it to the slaves, and before applying it. You can write your own filters in JavaScript, and do pretty much anything you want with the data. If you have creative ideas about solving conflicts by manipulating data in transit, there is a good chance that you can implement them using filters. This topic deserves more than a paragraph, and I will probably come back to it soon with a full-fledged article.


Parting thoughts

Multi-master topologies are much-coveted features. However, they often introduce the risk of conflicts.

Dealing with conflicts becomes somewhat easier if you understand how they happen and what kind of problems they generate.

There is no silver bullet solution for conflicts, but recent technology and good organization can help you ease the pain.