
Friday, November 04, 2011

Replication stars

Working with replication, you come across many topologies: some sound and established, some less so, and some still in the realm of hopeless wishes. I have been working with replication for almost 10 years now, and my wish list has grown quite long in that time. In the last 12 months, though, while working at Continuent, some of the topologies I wanted to work with have moved from the cloud of wishful thinking to the firm land of things that happen. My quest for star replication starts with the most common topology: one master, many slaves.

Fig 1. Master/Slave topology


Legend

It looks like a star, with the rays extending from the master to the slaves. This is the basis of most replication setups in use today, and it holds few surprises. Setting aside the problems related to failing over and switching between nodes, which I will examine in another post, let's move to another star.

Fig 2. Fan-in slave, or multiple sources

Multiple source replication, also known as a fan-in topology, has several masters replicating to the same slave. For years, this was forbidden territory for me. But Tungsten Replicator allows you to create multiple source topologies easily. It is still uni-directional, though. I am also interested in topologies where I have more than one master, and where I can retrieve data from multiple points.

Fig 3. All-to-all three nodes


Fig 4. All-to-all four nodes

Tungsten Multi-Master Installation solves this problem. It allows me to create topologies where every node replicates to every other node. With three nodes, the scheme looks like a straightforward solution. When we add one node, though, the amount of network traffic grows quite a lot. The double-sided arrows mean that there is a replication service at each end of the line, and two open data channels. When we move from three nodes to four, we double the replication services and the channels needed to sustain the scheme.

For several months, I was content with this. I thought: it is heavy, but it works, and it is far more than what you can do with native replication, especially when you consider that you get a practical way of preventing conflicts using Shard Filters. But that was not enough. Something kept gnawing at me, and from time to time I experimented with Tungsten Replicator's huge flexibility to create new topologies. But the star kept eluding me. Until … until, guess what? A customer asked for it. The problem suddenly ceased to be a personal whim, and it became a business opportunity. Instead of looking at the issue in the idle way I often think about technology, I went at it with practical determination.

What failed when I was experimenting in my free time was that either the pieces did not glue together the way I wanted, or I ended up with an endless loop. Tungsten Replicator has a set of components that are conceptually simple. You deploy a pipeline between two points, open the tap, and data starts flowing in one direction. Even with multi-master replication, the principle is the same: you deploy many pipes, and each one has one purpose only.
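Before looking at the star, it is worth putting numbers on that growth. In an all-to-all setup, every pair of nodes needs a bi-directional link, that is two replication services and two open channels, so the totals add up quickly. The snippet below is just my own back-of-the-envelope illustration, not part of any Tungsten tooling:

# Services needed by an all-to-all topology of N nodes:
# every pair of nodes is linked in both directions, one service per direction.
for N in 3 4 5
do
    echo "all-to-all with $N nodes: $(( N * (N - 1) )) replication services"
done
# all-to-all with 3 nodes: 6 replication services
# all-to-all with 4 nodes: 12 replication services
# all-to-all with 5 nodes: 20 replication services

Moving from three nodes to four indeed doubles the count, from 6 services to 12.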

Fig 5. All-masters star topology

In the star topology, however, you need to open more taps, but not too many, because you need to prevent the data from looping around. The recipe, as it turned out, is to create a set of bi-directional replication systems, where the slave services on the central node get changes only from their specific master, while the slave services on the peripheral nodes accept changes coming from any master. It was as simple as that.

There are, of course, benefits and drawbacks in a star topology compared to an all-replicate-to-all design. The star topology creates a single point of failure: if the central node fails, replication stops, and the central node needs to be replaced. The all-to-all design, instead, has no such weakness. Its abundance of connections makes sure that, if a node fails, the system continues working without any intervention. There is no need for fail-over.

Fig 6. Extending an all-to-all topology


Fig 7. Extending a star topology

However, the star topology has a huge benefit in node management. If you need to add a new node, it costs two services and two connections, while the same operation in all-to-all replication costs 8 services and 8 connections (the arithmetic is sketched at the end of this post).

With the implementation of this topology, a new challenge has arisen. While conflict prevention by sharding is still possible, this is not the kind of scenario where you want to apply it. We have another conflict prevention mechanism in mind, and this new topology is a good occasion to make it happen. YMMV. I like having the additional choice. There are cases where an all-replicate-to-all topology is still the best option, and there are cases where a star topology is more advisable.
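For the record, here is the back-of-the-envelope arithmetic for adding one node to an existing topology of N nodes (again my own illustration, with each bi-directional link counted as two services and two channels):

# Cost of adding one node to a topology that already has N nodes.
N=4
echo "star:       2 new services (one link to the hub)"
echo "all-to-all: $(( 2 * N )) new services (one link to each existing node)"
# star:       2 new services (one link to the hub)
# all-to-all: 8 new services (one link to each existing node)

With the four-node deployments in the figures above, that is the 2 versus 8 difference mentioned earlier.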

Thursday, August 11, 2011

Usability improvements in Tungsten Replicator 2.0.4

If you love a software product, you should try to improve it, and not be afraid of criticizing it. This principle has guided me with MySQL (where I have submitted many usability bugs, and discussed interfaces with developers for years), and it proves true for Tungsten Replicator as well. When I started working at Continuent, while I was impressed by the technology, I found the installation procedure and the product logs quite discouraging. I would almost say disturbing. Fortunately, my colleagues agreed with my usability focus, and we can already enjoy some tangible improvements.

I have already mentioned the new installation procedure, which requires just one command to install a full master/slave cluster. Here I would like to show how you can use the new installer to deploy a multiple source (fan-in) replication topology. The first step is to install one master in each node. I can run the commands from node #4, which is the one that will eventually receive the updates from the remote masters, and where I need to install the slave services:
TUNGSTEN_BASE=$HOME/newinst
SERVICES=(alpha bravo charlie delta)
REPLICATOR=$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/replicator

# Install one master replicator per host, each with its own service name.
for N in 1 2 3 4
do
    INDEX=$(($N-1))

    ./tools/tungsten-installer \
        --master-slave \
        --master-host=qa.r$N.continuent.com \
        --datasource-user=tungsten \
        --datasource-password=secret \
        --service-name=${SERVICES[$INDEX]} \
        --home-directory=$TUNGSTEN_BASE \
        --cluster-hosts=qa.r$N.continuent.com \
        --start-and-report
done
The loop above installs a master (remotely or locally) on each of the four servers. Next, I need to create the slave services. To do that, I use the updated configure-service tool in the tools directory.
TUNGSTEN_TOOLS=$TUNGSTEN_BASE/tungsten/tools

# Options shared by the three remote slave services installed on Delta.
COMMON_OPTIONS='-C -q
    --local-service-name=delta
    --role=slave
    --service-type=remote
    --allow-bidi-unsafe=true
    --datasource=qa_r4_continuent_com'

$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r1.continuent.com  alpha 
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r2.continuent.com  bravo
$TUNGSTEN_TOOLS/configure-service $COMMON_OPTIONS --master-host=qa.r3.continuent.com  charlie 

$REPLICATOR restart
$TUNGSTEN_BASE/tungsten/tungsten-replicator/bin/trepctl services
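
With the three slave services in place and the replicator restarted, a quick sanity check could look like the sketch below: it creates a distinctly named table on each remote master and then looks for all of them on the fan-in slave. This is only an illustration of mine; the host names and credentials follow the installer options above, and the test schema and table names are placeholders.

# Hypothetical check: one marker table per master, all of them visible on Delta.
for N in 1 2 3
do
    mysql -h qa.r$N.continuent.com -u tungsten -psecret \
        -e "CREATE TABLE test.from_r$N (id INT)"
done
# After a few seconds, all three tables should show up on the fan-in slave.
mysql -h qa.r4.continuent.com -u tungsten -psecret \
    -e "SHOW TABLES FROM test LIKE 'from_r%'"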
The configure-service commands create the slave services locally on Delta. After restarting the replicator, a check along the lines sketched above (something different created in each master, and all of it visible on the single slave) confirms that the fan-in replication works.

The latest usability improvement is the simplification of the replicator logs. Until a few days ago, if the replicator hit an error, you would get a long list of not exactly helpful stuff. For example, if I create a table in a slave, and then create the same table in the master, I break replication. The extended log would produce something like this:
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,216 [tsandbox - q-to-dbms-0] ERROR pipeline.SingleThreadStageTask Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,217 [tsandbox - Event dispatcher thread] ERROR management.OpenReplicatorManager Received error notification, shutting down services: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2011/08/11 18:10:52 | com.continuent.tungsten.replicator.applier.ApplierException: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:183)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.applier.JdbcApplier.apply(JdbcApplier.java:1233)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.applier.ApplierWrapper.apply(ApplierWrapper.java:101)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.runTask(SingleThreadStageTask.java:498)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.pipeline.SingleThreadStageTask.run(SingleThreadStageTask.java:155)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at java.lang.Thread.run(Unknown Source)
INFO   | jvm 1    | 2011/08/11 18:10:52 | Caused by: java.sql.SQLException: Statement failed on slave but succeeded on master
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:139)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       ... 5 more
INFO   | jvm 1    | 2011/08/11 18:10:52 | Caused by: java.sql.SQLSyntaxErrorException: Table 't1' already exists
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at org.drizzle.jdbc.internal.SQLExceptionMapper.get(SQLExceptionMapper.java:78)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:930)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at com.continuent.tungsten.replicator.applier.MySQLDrizzleApplier.applyStatementData(MySQLDrizzleApplier.java:125)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       ... 5 more
INFO   | jvm 1    | 2011/08/11 18:10:52 | Caused by: org.drizzle.jdbc.internal.common.QueryException: Table 't1' already exists
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeQuery(MySQLProtocol.java:500)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at org.drizzle.jdbc.internal.mysql.MySQLProtocol.executeBatch(MySQLProtocol.java:546)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       at org.drizzle.jdbc.DrizzleStatement.executeBatch(DrizzleStatement.java:917)
INFO   | jvm 1    | 2011/08/11 18:10:52 |       ... 6 more
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,218 [tsandbox - Event dispatcher thread] WARN  management.OpenReplicatorManager Performing emergency service shutdown
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - Event dispatcher thread] INFO  pipeline.Pipeline Shutting down pipeline: slave
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Terminating processing for stage task thread
INFO   | jvm 1    | 2011/08/11 18:10:52 | 2011-08-11 18:10:52,219 [tsandbox - q-to-dbms-0] INFO  pipeline.SingleThreadStageTask Last successfully processed event prior to termination: seqno=0 eventid=mysql-bin.000002:0000000000000426;20
Did you see the reason for the error? No? Neither did I. I would need to open the THL, look for event #1, and determine what it was. Instead, the new user.log looks like this:
2011-08-11 18:10:52,216 ERROR Received error notification: Event application failed: seqno=1 fragno=0 message=java.sql.SQLException: Statement failed on slave but succeeded on master
Caused by : java.sql.SQLException: Statement failed on slave but succeeded on master
Caused by : Statement failed on slave but succeeded on master
Caused by : Table 't1' already exists
Caused by : Table 't1' already exists
2011-08-11 18:10:54,721 INFO  State changed ONLINE -> OFFLINE:ERROR
2011-08-11 18:10:54,721 WARN  Received irrelevant event for current state: state=OFFLINE:ERROR event=OfflineNotification
That's much better. It is not perfect yet, but it will be soon. Right now, it tells me what is wrong without forcing me to go hunting for it amid hundreds of stack trace lines. Give it a try, using the latest replicator build.

Sunday, October 05, 2008

30 terabyte per night


LSST

The Large Synoptic Survey Telescope is making news once more. Its projected data inflow of 30 terabytes per night has caught the imagination of Slashdot readers. Why is this interesting? Because it was not news to me.

You may recall that I was enthusiastic about the Astronomy, Petabytes, and MySQL session at the MySQL Users Conference 2008, and with reason. The engineers at Stanford have a plan for storing petabytes of data in a cluster of MySQL databases.
The technical session was complemented by an exciting keynote on The Science and Fiction of Petascale Analytics, which closed the conference with a bang.
Hey! That's innovation! Does anyone have a massive data storage story to tell at the next Users Conference?