Switching roles
To get a taste of the power of Tungsten Replicator, we will show how to switch roles. This is a controlled operation (as opposed to fail-over), where we can decide when to switch and which nodes are involved.
In our topology, host1 is the master, and we have three slaves. We can either ask for a switch and let the script select the first available slave, or tell the script which slave should be promoted. The script will show us the steps needed to perform the operation.
IMPORTANT! Please note that this operation is not risk free. Tungsten Replicator is a simple replication system, not a complete management tool like Continuent Tungsten. With the replicator, you must make sure that the applications have stopped writing to the master before starting the switch, and then point the applications to the new master when the operation is done.
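Since the replicator will not block writes for you, one way to be sure that the applications have stopped writing is to set the old master to read-only just before the switch. This is only a minimal sketch, using plain MySQL commands and the sample credentials of our installation:

$ mysql -h host1 -u tungsten -psecret -e 'SET GLOBAL read_only = ON'    # block new writes from regular users
$ mysql -h host1 -u tungsten -psecret -e 'SHOW PROCESSLIST'             # check that no application threads are still writing

Keep in mind that read_only does not affect accounts with the SUPER privilege, so the replication user itself is not blocked.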
$ cookbook/switch host2
# Determining current roles
host1 master
host2 slave
host3 slave
host4 slave
# Will promote host2 to be the new master server
# Waiting for slaves to catch up and pausing replication
trepctl -host host2 wait -applied 5382
trepctl -host host2 offline
trepctl -host host3 wait -applied 5382
trepctl -host host3 offline
trepctl -host host4 wait -applied 5382
trepctl -host host4 offline
trepctl -host host1 offline
# Reconfiguring server roles and restarting replication
trepctl -host host2 setrole -role master
trepctl -host host2 online
trepctl -host host1 setrole -role slave -uri thl://host2:2112
trepctl -host host1 online
trepctl -host host3 setrole -role slave -uri thl://host2:2112
trepctl -host host3 online
trepctl -host host4 setrole -role slave -uri thl://host2:2112
trepctl -host host4 online
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook [slave] seqno: 5,384 - latency: 2.530 - ONLINE

# node host2
cookbook [master] seqno: 5,384 - latency: 2.446 - ONLINE

# node host3
cookbook [slave] seqno: 5,384 - latency: 2.595 - ONLINE

# node host4
cookbook [slave] seqno: 5,384 - latency: 2.537 - ONLINE
As you can see from the listing above, the script displays the steps of the switch, using trepctl as a centralized tool.
Under load
After the simple installation in Part 1, we saw that we can test the flow of replication using 'cookbook/test_cluster'. That's a very simple set of operations that merely checks if replication is working. If we want to perform more serious tests, we should apply a demanding load to the replication system.
If you don't have applications that can exercise the servers to your liking, you should be pleased to know that Tungsten Replicator ships with a built-in application for data loading and benchmarking. Inside the expanded tarball, there is a directory named bristlecone, containing the software for such testing tools. There is a detailed set of instructions under './bristlecone/doc'. For the impatient, there is a cookbook recipe that starts a reasonable load with a single command:
$ cookbook/load_data start
# Determining current roles
# Evaluator started with pid 28370
# Evaluator details are available at /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.job
# Evaluator output can be monitored at /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log

$ cat /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.job
Task started at   : Sun Apr  7 18:20:00 2013
Task started from : /home/tungsten/tinstall/current
Executable        : /home/tungsten/installs/cookbook/tungsten/bristlecone/bin/evaluator.sh
Process ID        : 28370
Using             : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.xml
Process id        : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.pid
Log               : /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log
Database          : host1
Table prefix      : tbl
Host              : host1
Port              : 3306
User              : tungsten
Test duration     : 3600

$ tail /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log
18:22:18,672 INFO  1365351738672 10/10 5035.0 ops/sec 0 ms/op 28380 rows/select 41 updates 54 deletes 166 inserts
18:22:20,693 INFO  1365351740693 10/10 4890.0 ops/sec 0 ms/op 26746 rows/select 57 updates 37 deletes 144 inserts
18:22:22,697 INFO  1365351742697 10/10 4986.0 ops/sec 0 ms/op 28183 rows/select 59 updates 46 deletes 162 inserts
18:22:24,716 INFO  1365351744716 10/10 5208.0 ops/sec 0 ms/op 29067 rows/select 51 updates 51 deletes 171 inserts
18:22:26,736 INFO  1365351746736 10/10 4856.0 ops/sec 0 ms/op 27695 rows/select 46 updates 68 deletes 141 inserts
18:22:28,739 INFO  1365351748739 10/10 5022.0 ops/sec 0 ms/op 28269 rows/select 51 updates 58 deletes 145 inserts
18:22:30,758 INFO  1365351750758 10/10 4893.0 ops/sec 0 ms/op 28484 rows/select 47 updates 50 deletes 165 inserts
18:22:32,777 INFO  1365351752777 10/10 4501.0 ops/sec 0 ms/op 26481 rows/select 42 updates 52 deletes 130 inserts
18:22:34,781 INFO  1365351754781 10/10 5057.0 ops/sec 0 ms/op 30450 rows/select 58 updates 53 deletes 157 inserts
18:22:36,801 INFO  1365351756801 10/10 5087.0 ops/sec 0 ms/op 30845 rows/select 55 updates 56 deletes 156 inserts
What happens here?
The evaluator process is started using a file named 'evaluator.xml,' which is generated dynamically. The cookbook recipe detects which is the current master in the replication system and directs the operations there (in our case, it's 'host1'). The same task takes note of the process ID, which will be used to stop the evaluator when done, and the output is sent to a file, where you can look at it if needed.
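If you prefer to handle the evaluator by hand rather than through the cookbook (we will see 'load_data stop' in a moment), here is a minimal sketch using the files listed in the job description above:

$ tail -f /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.log        # follow the load in real time
$ kill $(cat /home/tungsten/installs/cookbook/tungsten/load/host1/evaluator.pid)    # stop the evaluator manually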
Looking at evaluator.log, you can see that there are quite a lot of operations going on. Most of them are read queries, as the application was designed to stress a database server as much as possible. Nonetheless, there are quite a lot of update operations as well, as a call to 'show_cluster' can confirm.
$ cookbook/show_cluster
--------------------------------------------------------------------------------------
Topology: 'MASTER_SLAVE'
--------------------------------------------------------------------------------------
# node host1
cookbook [master] seqno: 30,292 - latency: 0.566 - ONLINE

# node host2
cookbook [slave] seqno: 30,277 - latency: 0.531 - ONLINE

# node host3
cookbook [slave] seqno: 30,269 - latency: 0.511 - ONLINE

# node host4
cookbook [slave] seqno: 30,287 - latency: 0.550 - ONLINE
The load will continue for one hour (unless you defined a different duration). Should you want to stop it before then, you can run:
$ cookbook/load_data stop
# Determining current roles
# Stopping Evaluator at pid 28370
One important detail about this load application is that it looks for the masters in your cluster and starts a load on every master. This is useful if you want to test a multi-master topology, such as the ones we will see in another article.
If the default behavior of load_data is not what you expect, you can further customize the load by fine-tuning the application launcher. First, run 'load_data' with the print option:
$ cookbook/load_data print
# Determining current roles
$HOME/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start
Then you can copy and paste the resulting command and run the concurrent_evaluator script with your own modifications.
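For example, here is a sketch of a customized run that keeps the connection parameters printed above but shortens the test and increases the write ratio (the values are arbitrary; pick your own):

$HOME/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=5 \
    --updates=5 \
    --inserts=10 \
    --test-duration=600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start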
There are many options available. The manual is embedded in the application itself:
$ ./bristlecone/bin/concurrent_evaluator.pl --manual
An important option is --instances=N, which launches the evaluator N times concurrently, each time using a different schema. We will use this option to test parallel replication.
Backup
I am not going to stress here how important backups are. I assume (perhaps foolishly) that everyone reading this article knows why. Instead, I want to show how Tungsten Replicator supports backup and restore as integrated methods.
When you install Tungsten, you can add options to select a backup method and fine tune its behavior.
$ ./tools/tungsten-installer --help-master-slave -a | grep backup
--backup-directory       Permanent backup storage directory [$TUNGSTEN_HOME/backups]
                         This directory should be accessible by every replicator to ensure
                         simple operation of backup and restore.
--backup-method          Database backup method (none|mysqldump|xtrabackup|script) [xtrabackup-full]
                         Tungsten integrates with a variety of backup mechanisms. We strongly
                         recommend you configure one of these to help with provisioning servers.
                         Please consult the Tungsten Replicator Guide for more information on
                         backup configuration.
--backup-dump-directory  Backup temporary dump directory [/tmp]
--backup-retention       Number of backups to retain [3]
--backup-script          What is the path to the backup script
--backup-command-prefix  Use sudo when running the backup script? [false]
--backup-online          Does the backup script support backing up a datasource while it is ONLINE [false]
First off, the default directory for backups is under your installation directory ($TUNGSTEN_HOME/backups). If you want to take backups through Tungsten, you must make sure that there is enough storage in that path to hold at least one backup. Tungsten will keep up to three backups in that directory, but you can change this retention with --backup-retention.
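A quick way to check that you have enough room is to compare the free space in the backup path with the size of your data directory. A minimal sketch, assuming the paths of our sample installation and the default MySQL datadir:

$ df -h /home/tungsten/installs/cookbook/backups    # free space where Tungsten stores backups
$ sudo du -sh /var/lib/mysql                        # rough size of the data to be backed up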
Second, the default backup method is 'mysqldump,' not because it is recommended, but because it is widely available. As you probably know, though, if your database is more than a few dozen GB, mysqldump is not an adequate method.
Tungsten Replicator provides support for xtrabackup. If xtrabackup is installed on your servers, you can define it as your default backup method. When installing a new cluster, you can do this:
$ export MORE_OPTIONS='-a --backup-method=xtrabackup --backup-command-prefix=true'
$ cookbook/install_master_slave
If you have just installed and need to reconfigure, you can call 'configure_service' to accomplish the task:
$ cookbook/configure_service -U -a --backup-method=xtrabackup --backup-command-prefix=true cookbook
(Here, 'cookbook' is the service name.) VERY IMPORTANT: configure_service acts on a single host, and by default that is the current host, unless you say otherwise. For example:
$ cookbook/configure_service -U --host=host2 -a --backup-method=xtrabackup --backup-command-prefix=true cookbook
You will have to restart the replicator in node 'host2' for the changes to take effect.
$ ssh host2 "cd $TUNGSTEN_BASE/tungsten/ ; ./cookbook/replicator restart"
Using the backup feature is quite easy: you only need to call 'trepctl', indicate on which host you want to take a backup, and Tungsten will do the rest.
$ cookbook/trepctl -host host3 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties

$ cookbook/trepctl -host host2 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties
Apparently, we have two backups with the same contents, taken from two different nodes. However, since we changed the backup method for host2, we end up with a small mysqldump file for host3, and a rather larger xtrabackup file for host2. Again, the cookbook has a recipe that shows the backups available on all the nodes:
$ ./cookbook/backups
backup-agent :  (service: cookbook) mysqldump
backup-dir   :  (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 2.4G
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 21:52 storage.index
-rw-r--r-- 1 tungsten tungsten 2.4G Apr  7 21:52 store-0000000001-full_xtrabackup_2013-04-07_21-50_59.tar
-rw-r--r-- 1 tungsten tungsten  323 Apr  7 21:52 store-0000000001.properties
drwxr-xr-x 2 tungsten tungsten 4.0K Apr  7 21:52 xtrabackup
# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.3M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 21:50 storage.index
-rw-r--r-- 1 tungsten tungsten 6.3M Apr  7 21:50 store-0000000001-mysqldump_2013-04-07_21-50_28.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 21:50 store-0000000001.properties
# [node: host4] 0 files found
WARNING: This example was here only to show how to change the backup method. It is NOT recommended to mix backup methods across nodes. Unless you have a specific need, and understand the consequences of this choice, you should use the same backup method everywhere.
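If you decide to standardize on xtrabackup, here is a sketch of how you could apply the same change to every node with the tools we have just seen (configure_service followed by a replicator restart); the hosts and paths are the ones of our sample cluster:

for node in host1 host2 host3 host4
do
    # change the backup method for this node's service definition
    cookbook/configure_service -U --host=$node -a \
        --backup-method=xtrabackup --backup-command-prefix=true cookbook
    # restart the replicator on that node so the change takes effect
    ssh $node "cd $TUNGSTEN_BASE/tungsten/ ; ./cookbook/replicator restart"
done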
Restore
A backup is only good if you can use it to restore your data. Using the same method shown for taking a backup, you can restore your data. For this example, let's use mysqldump on all nodes (just because it's quicker), and walk through the operations for a backup and restore.
First, we take a backup in node 'host3', and then we will restore the data in 'host2'.
$ cookbook/trepctl -host host3 backup
Backup completed successfully; URI=storage://file-system/store-0000000001.properties

$ cookbook/backups
backup-agent :  (service: cookbook) mysqldump
backup-dir   :  (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 0 files found
# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties
# [node: host4] 0 files found
Now we have the backup files in host3, but we have an issue in host2, and we need to perform a restore there. Assuming that the database server is unusable (this is usually the case when we must restore), we are in the unpleasant situation where the backups are on one node, and we need to use them on another. In a well organized environment, we would have shared storage for the backup directory, and thus we could just move ahead and perform our restore. In this case, though, we have no such luxury. So we use yet another feature of the cookbook:
$ cookbook/copy_backup
syntax: copy_backup SERVICE SOURCE_NODE DESTINATION_NODE

$ cookbook/copy_backup cookbook host3 host2
# No message = success

$ cookbook/backups
backup-agent :  (service: cookbook) mysqldump
backup-dir   :  (service: cookbook) /home/tungsten/installs/cookbook/backups/cookbook
# [node: host1] 0 files found
# [node: host2] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties
# [node: host3] 3 files found
++ /home/tungsten/installs/cookbook/backups/cookbook
total 6.2M
-rw-r--r-- 1 tungsten tungsten   72 Apr  7 22:05 storage.index
-rw-r--r-- 1 tungsten tungsten 6.1M Apr  7 22:05 store-0000000001-mysqldump_2013-04-07_22-05_43.sql.gz
-rw-r--r-- 1 tungsten tungsten  315 Apr  7 22:05 store-0000000001.properties
# [node: host4] 0 files found
The 'copy_backup' command has copied the files from one host to another, and now we are ready to perform a restore in host2.
$ cookbook/trepctl -host host2 restore
Operation failed: Restore operation failed: Operation irrelevant in current state
Hmm. Probably not the friendliest of error messages. What this scoundrel means is that it can't perform a restore when the replicator is online.
$ cookbook/trepctl -host host2 offline
$ cookbook/trepctl -host host2 restore
Restore completed successfully

$ cookbook/trepctl -host host2 services
Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 17955
appliedLatency  : 0.407
role            : slave
serviceName     : cookbook
serviceType     : local
started         : true
state           : ONLINE
Finished services command...
The restore operation was successful. We could have used xtrabackup just as well; the only difference is that the operation would take much longer.
Parallel replication
Slave lag is a common occurrence in MySQL replication. Most of the time, the reason for this problem is that while the master updates data using many threads concurrently, the slave applies the replication stream using a single thread. Tungsten has a built-in feature that applies changes in parallel, when the updates happen in separate schemas. For database servers that are sharded by database, or that serve multi-tenant applications, this is an ideal case: the activity is likely spread across several schemas at once, and thus Tungsten can parallelize the changes successfully. Notice, however, that if you are running operations against a single schema, parallel replication won't give you any relief. Also, the operations must be truly independent from each other. If a schema has foreign keys that reference another schema, or if a transaction mixes data from two or more schemas, Tungsten will stop parallelizing and work in a single thread until the end of the offending operation, resulting in an overall decrease of performance instead of an increase.
To activate parallel replication, you need to enable two options:
- --channels=N, where you indicate how many parallel threads you want to establish. You should indicate as many channels as the number of schemas you are operating on. Some benchmarks will help you find the limits: defining too many channels will eventually exhaust system resources. If the number of schemas is larger than the number of channels, Tungsten will use the channels in a round-robin fashion.
- --svc-parallelization-type=disk: this option activates a fast queue-creation algorithm that acts directly on the THL files. Contrary to the common perception that in-memory queues would be faster, this method is very efficient and less likely to exhaust system resources.
If you want to install all the servers with parallel replication, you can do this:
$ export MORE_OPTIONS='-a --channels=5 --svc-parallelization-type=disk'
$ cookbook/install_master_slave
If you need parallel replication only on one particular slave, you can enable it there using 'configure_service', just as we did earlier for the backup method.
In this example, we're going to use the second method:
$ cookbook/configure_service -U -a --host=host4 --channels=5 --svc-parallelization-type=disk cookbook
WARN  >> host4 >> THL schema tungsten_cookbook already exists at tungsten@host4:3306 (WITH PASSWORD)
NOTE  >> host4 >> Deployment finished

$ cookbook/replicator restart
Stopping Tungsten Replicator Service...
Stopped Tungsten Replicator Service.
Starting Tungsten Replicator Service...
Now parallel replication is enabled. But how do we make sure that the service has indeed been enhanced?
The quickest method is to check the Tungsten service schema. Every replicator service creates a database schema named 'tungsten_$SERVICE_NAME', where it stores the current replication status. For example, in our default system, where the only replication service is called 'cookbook', we will find a schema named 'tungsten_cookbook'. The table that we want to inspect is one named 'trep_commit_seqno', where we store the global transaction ID, the schema where the transaction was applied, the data origin, and the time stamps at extraction and apply time. What is relevant in this table is that there will be one record for each channel that we have enabled. Thus, in host2 and host3 there will be only one line, while in host4 we should find 5 lines.
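If you want to look at this table by hand rather than through the cookbook recipes, here is a minimal sketch of the query, assuming the same MySQL credentials used for our sample installation:

$ mysql -h host4 -u tungsten -psecret \
    -e 'SELECT seqno, shard_id, applied_latency FROM tungsten_cookbook.trep_commit_seqno'

On host4 this should return five rows, one per channel.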
There is also a useful cookbook recipe that gets this count from all nodes at once:
$ cookbook/query_all_nodes 'select count(*) from tungsten_cookbook.trep_commit_seqno'
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        1 |
+----------+
+----------+
| count(*) |
+----------+
|        5 |
+----------+
Right! So we have 5 channels. Before inspecting what is going on in these channels, let's apply some load. You may recall that our load_data script can show you a command that we can customize for our purpose.
$ cookbook/load_data print
/home/tungsten/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start
We just copy and paste this command, adding --instances=5, and we get 5 messages indicating that an evaluator instance was started.
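For reference, this is a sketch of the resulting command, with the new option placed among the others (the exact placement of the option is an assumption of this sketch):

/home/tungsten/installs/cookbook/tungsten/bristlecone/bin/concurrent_evaluator.pl \
    --deletes=1 \
    --updates=1 \
    --inserts=3 \
    --test-duration=3600 \
    --host=host1 \
    --port=3306 \
    -u tungsten \
    -p secret \
    --continuent-root=/home/tungsten/installs/cookbook \
    -d host1 \
    --instances=5 \
    -s /home/tungsten/installs/cookbook/tungsten/load/host1 start

Once the load is running, let's check which schemas were created: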
$ cookbook/query_node host4 'show schemas'
+--------------------+
| Database           |
+--------------------+
| information_schema |
| host11             |
| host12             |
| host13             |
| host14             |
| host15             |
| mysql              |
| test               |
| tungsten_cookbook  |
+--------------------+
Since we indicated that the database was to be named 'host1' and we asked for 5 instances, the evaluator has created host11, host12, and so on up to host15.
Now that there is some action, we can have a look at our replication. Rather than querying the database directly, asking for the contents of trep_commit_seqno, we use another cookbook recipe:
$ cookbook/tungsten_service all
# node: host1 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324246 | host1     |               1 | host12   | 2013-04-07 23:02:16 | 2013-04-07 23:02:15 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host2 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324383 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host3 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324549 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host4 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 324740 | host1     |               0 | host11   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324736 | host1     |               0 | host12   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324739 | host1     |               0 | host13   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324737 | host1     |               0 | host14   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
| 324735 | host1     |               0 | host15   | 2013-04-07 23:02:16 | 2013-04-07 23:02:16 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
Here you see that host1 has only one channel: it is the master, and it must serialize according to the binary log. Slaves host2 and host3 have only one channel, because we have enabled parallel replication only in host4. And finally we see that in host4 there are 5 channels, each showing a different shard_id (= database schema), with its own transaction ID being applied. This shows that replication is working.
Tungsten Replicator has, however, several tools that help monitor parallel replication:
$ cookbook/trepctl -host host4 status -name stores
Processing status command (stores)...
NAME                      VALUE
----                      -----
activeSeqno             : 475395
doChecksum              : false
flushIntervalMillis     : 0
fsyncOnFlush            : false
logConnectionTimeout    : 28800
logDir                  : /home/tungsten/installs/cookbook/thl/cookbook
logFileRetainMillis     : 604800000
logFileSize             : 100000000
maximumStoredSeqNo      : 475449
minimumStoredSeqNo      : 0
name                    : thl
readOnly                : false
storeClass              : com.continuent.tungsten.replicator.thl.THL
timeoutMillis           : 2147483647
NAME                      VALUE
----                      -----
criticalPartition       : -1
discardCount            : 0
estimatedOfflineInterval: 0.0
eventCount              : 457459
headSeqno               : 475415
intervalGuard           : AtomicIntervalGuard (array is empty)
maxDelayInterval        : 60
maxOfflineInterval      : 5
maxSize                 : 10
name                    : parallel-queue
queues                  : 5
serializationCount      : 0
serialized              : false
stopRequested           : false
store.0                 : THLParallelReadTask task_id=0 thread_name=store-thl-0 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=93357 discarded=364102 events=0
store.1                 : THLParallelReadTask task_id=1 thread_name=store-thl-1 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=92567 discarded=364892 events=0
store.2                 : THLParallelReadTask task_id=2 thread_name=store-thl-2 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=91197 discarded=366262 events=0
store.3                 : THLParallelReadTask task_id=3 thread_name=store-thl-3 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=90492 discarded=366967 events=0
store.4                 : THLParallelReadTask task_id=4 thread_name=store-thl-4 hi_seqno=475415 lo_seqno=17957 read=457459 accepted=89846 discarded=367613 events=0
storeClass              : com.continuent.tungsten.replicator.thl.THLParallelQueue
syncInterval            : 10000
Finished status command (stores)...
This command shows the status of parallel replication in each channel. Notable information in this output:
- eventCount is the number of transactions being processed.
- serializationCount:0 means that all events have been parallelized, and there was no need to serialize any.
- 'read' ... 'accepted' ... 'discarded' are the operations in the disk queue. Each channel parses all the events, and queues only the ones that belong to its shard.
$ cookbook/trepctl -host host3 status -name shards
Processing status command (shards)...
...
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000006:0000000169567337;0
appliedLastSeqno  : 660707
appliedLatency    : 1.314
eventCount        : 130325
shardId           : host11
stage             : q-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: mysql-bin.000006:0000000169566006;0
appliedLastSeqno  : 660702
appliedLatency    : 1.312
eventCount        : 129747
shardId           : host12
stage             : q-to-dbms
...
This command (only a portion is reported here) displays the status of each shard, showing for each one the last event ID, transaction ID, and event count recorded.
There is much more to say about the monitoring tools, but for now I just want to make one last important point. When the replicator goes offline, parallel replication stops, and the replication operations are consolidated into a single thread. This ensures that replication can later resume with a single thread, or be safely handed over to native MySQL replication. This behavior also ensures that a slave can be safely promoted to master: a switch operation requires the slave service to be offline before being reconfigured as a master. When the replicator goes offline, the N channels become 1.
$ ./cookbook/trepctl offline
$ cookbook/tungsten_service all
# node: host1 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769652 | host1     |               0 | host12   | 2013-04-07 23:18:07 | 2013-04-07 23:18:07 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host2 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769699 | host1     |               0 | host13   | 2013-04-07 23:18:08 | 2013-04-07 23:18:08 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host3 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 769866 | host1     |               0 | host15   | 2013-04-07 23:18:08 | 2013-04-07 23:18:08 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
# node: host4 - service: cookbook
+--------+-----------+-----------------+----------+---------------------+---------------------+
| seqno  | source_id | applied_latency | shard_id | update_timestamp    | extract_timestamp   |
+--------+-----------+-----------------+----------+---------------------+---------------------+
| 767064 | host1     |               0 | host15   | 2013-04-07 23:18:01 | 2013-04-07 23:18:01 |
+--------+-----------+-----------------+----------+---------------------+---------------------+
If we put it back online, we will see the channels expand again.
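For example, to bring host4 back and verify, we can reuse the recipes seen above:

$ cookbook/trepctl -host host4 online
$ cookbook/query_all_nodes 'select count(*) from tungsten_cookbook.trep_commit_seqno'

host4 should report 5 rows again, while the other nodes stay at 1.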
Further info:
- Project home: http://tungsten-replicator.org
- Discussion group: (a Google Group discussion on Tungsten Replicator)