Tuesday, January 07, 2014

Multiple masters: attraction to the stars

In the last 10 years I have worked a lot with replication systems, and I have developed a keen interest in the topic of multiple masters in a single cluster. My interest has two distinct origins:

  • On one hand, I have interacted countless times with users who want to use a replication system as a drop-in replacement for a single server. In many cases, especially when users are dealing with applications that are not very flexible or modular, this means that the replication system must have several points of data entry, and such points must work independently and in symbiosis with the rest of the nodes.
  • On the other hand, I am a technology lover (look it up in the dictionary: it is spelled geek), and as such my curiosity is stirred whenever I discover a new way of implementing multi-master systems.

The double nature of this professional curiosity sometimes makes me forget that the ultimate goal of technology is to improve users’ lives. I may fall in love with a cute design or a clever implementation of an idea, but that cleverness must eventually meet usability, or else it loses its appeal. There are areas where the distinction between usefulness and cleverness is clear-cut. And there are others where we really don’t know where we stand, because there are so many variables involved.

One such case is the star topology, where many master nodes are connected to each other through a hub. You can consider it a bi-directional master/slave: if you take a master/slave topology and make every slave able to replicate back to the master, then you almost have a star. To make it complete, the master must also be able to broadcast the changes received from the outer nodes, so that every node gets the changes from every other node. Compared to other popular topologies, such as point-to-point all-masters and circular replication, the star topology has the distinct advantages of requiring fewer connections than a point-to-point all-masters setup and of making it very easy to add a new node.

Figure #1: Star topology
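To put rough numbers on the connection-count argument, here is a small Python sketch. It assumes that every one-way master-to-slave stream counts as one replication channel and that the hub is one of the cluster's nodes; the function names are hypothetical and do not belong to any replication product.

```python
def star_channels(nodes: int) -> int:
    """Channels in a star: each spoke replicates to the hub and from the hub.
    The hub is counted as one of `nodes`."""
    if nodes < 2:
        return 0
    spokes = nodes - 1
    return 2 * spokes


def all_masters_channels(nodes: int) -> int:
    """Point-to-point all-masters: every node replicates from every other node."""
    return nodes * (nodes - 1)


def channels_added_by_new_node(count, nodes: int) -> int:
    """Extra channels needed when growing a cluster from `nodes` to `nodes + 1`."""
    return count(nodes + 1) - count(nodes)


if __name__ == "__main__":
    for n in (3, 4, 6, 10):
        print(f"{n} nodes: star={star_channels(n)}, all-masters={all_masters_channels(n)}")
    print(f"adding a 5th node: star +{channels_added_by_new_node(star_channels, 4)}, "
          f"all-masters +{channels_added_by_new_node(all_masters_channels, 4)}")
```

Under these assumptions, growing a star from 4 to 5 nodes adds 2 channels, all of them touching the hub, while a point-to-point all-masters cluster needs 8 more, spread across every existing node.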

However, anyone can immediately see one disadvantage of the star topology: the hub is the cornerstone of the cluster. It’s a single point of failure (SPOF). If the hub fails, there is no replication anywhere. Period. Therefore, when you are considering a multi-master topology, you have to weigh the advantages and disadvantages of the star, and the SPOF is usually the most important factor.

Depending on which technology you choose, though, there is also another important element to consider: in a star topology, data must be replicated twice. It’s mostly the same thing that happens in circular replication. If you have nodes A, B, C, and D, and you write data in A, the data is replicated three times before it reaches D (A->B, B->C, and C->D). A star topology is similar. In a system where A, B, and D are terminal nodes, and C is the hub, data needs to travel twice before it reaches D (A->C, C->D).

Figure #2: Circular replication
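The hop counts above can be checked with a small Python sketch that models each topology as a directed graph, where an edge X -> Y means that Y replicates from X, and uses breadth-first search to count how many hops a write made in A needs before it reaches D. The `hops` helper and the topology dictionaries are illustrative assumptions, not part of any tool.

```python
from collections import deque


def hops(edges, source, target):
    """Number of replication hops from source to target (breadth-first search).
    `edges` maps each node to the nodes that replicate *from* it."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # target is unreachable


# Circular replication: A -> B -> C -> D -> A
circular = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["A"]}

# Star with C as the hub: spokes A, B, D replicate to and from C
star = {"A": ["C"], "B": ["C"], "D": ["C"], "C": ["A", "B", "D"]}

# Point-to-point all-masters: every node replicates directly from every other node
nodes = ["A", "B", "C", "D"]
all_masters = {n: [m for m in nodes if m != n] for n in nodes}

for name, topo in (("circular", circular), ("star", star), ("all-masters", all_masters)):
    print(f"{name}: A -> D takes {hops(topo, 'A', 'D')} hop(s)")
```

Every intermediate hop is a server that must apply the event and write it to its own binary log before passing it on, which is where the extra cost of the star and the ring comes from.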

This double transfer is bad for two reasons: it affects performance, and it opens the door to unexpected transformations of data. Let’s explore this concept a bit. When we replicate data from a master to a slave, there is little risk of mischief: the data goes from the source to a reproducer. If we use row-based replication, there is little risk of getting the wrong data in the slave. If we make the slave replicate to a further slave, we need to apply the data, generate a further binary log in the slave host, and replicate data from that second binary log. We can deal with that, but at the price of taking into account more details, like where the data came from, when to stop replicating in a loop, whether the data was created with a given configuration set, and so on. In short, if your slave server has been configured differently from the master, chances are that the data down the line may be different. In a star topology, this means that data from each spoke may be replicated correctly in the hub, yet arrive different in the other spokes.
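The bookkeeping mentioned above (remembering where an event came from, and stopping it when it returns to its origin) can be illustrated with a toy Python model of a star. This is a sketch under stated assumptions, not how any particular product works: each event carries the id of its originating server, a server skips events carrying its own id (similar in spirit to the server_id check in MySQL replication), and the hypothetical `relog` setting stands in for any configuration difference that changes what the hub writes to its own binary log.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Event:
    origin: int  # server_id of the master where the write was first executed
    row: str     # row image carried by the event


@dataclass
class Node:
    server_id: int
    # Hypothetical setting: how this server rewrites a row when it re-logs an
    # applied event in its own binary log (None = re-log the row unchanged).
    relog: Optional[Callable[[str], str]] = None
    rows: List[str] = field(default_factory=list)  # rows applied here, in order

    def apply(self, event: Event) -> Optional[Event]:
        """Apply an incoming event; return the event this server re-logs,
        or None if the event originated here (loop prevention)."""
        if event.origin == self.server_id:
            return None                      # our own change came back: skip it
        self.rows.append(event.row)          # the row itself is applied correctly
        row = self.relog(event.row) if self.relog else event.row
        return Event(event.origin, row)      # the origin survives re-logging


def star_write(hub: Node, spokes: List[Node], writer: Node, row: str) -> None:
    """Execute a write on `writer` and propagate it through the hub."""
    writer.rows.append(row)                  # local execution on the originating master
    event: Optional[Event] = Event(writer.server_id, row)
    if writer is not hub:
        event = hub.apply(event)             # first hop: spoke -> hub, re-logged by the hub
    for spoke in spokes:
        spoke.apply(event)                   # second hop: hub -> every spoke


if __name__ == "__main__":
    hub = Node(1, relog=str.upper)           # hub configured differently from the spokes
    a, b, d = Node(2), Node(3), Node(4)
    star_write(hub, [a, b, d], a, "hello")
    print(hub.rows, a.rows, b.rows, d.rows)  # ['hello'] ['hello'] ['HELLO'] ['HELLO']
```

In the example, the hub applies the row correctly and the originating spoke A ignores its own change when the hub sends it back, but B and D receive the hub's re-logged version: the spoke-to-spoke divergence that a single master-to-slave hop does not expose.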

Compare this with a point-to-point all-masters topology. Here there are no SPOFs. You pay for this privilege by having to set up a higher number of connections between nodes (every node must connect to every other node), but there is no second-hand replication: before reaching any slave, the data has been applied only once, in the originating master.

Figure #3: Point-to-point all-masters topology

Where do I want to go with all the above points? I have reached the conclusion that, much as users like star topologies because of their simplicity, I often find myself recommending the more complex but more solid point-to-point all-masters setup. Admittedly, the risk of data corruption is minimal. The real spoiler in most scenarios is performance. When users realize that the same load will flow effortlessly in a point-to-point scenario but cause slave lag in a star topology, the choice is easy to make. If you use row-based replication, which in a complex topology is often a necessity, the lag can grow to the point where it becomes unbearable.

As I said in the beginning, it all depends on the use case: if the data load is not too big, a star topology will run just as well as point-to-point, and if the data flow is well designed, the risk of bad data transformation becomes negligible. Yet the full extent of the weaknesses of star topologies must be taken into account when designing a new system. Sometimes, investing some effort into deploying a point-to-point all-masters topology pays off in the medium to long term. Of course, you can prove that only if you deploy a star and try it out with the same load. If you deploy it in a staging environment, no harm is done. If you deploy it in production, you may regret it. In the end, it all boils down to my mantra: don’t trust the theory, but test, test, test.
