Thursday, March 31, 2011

MySQL replication for demanding users

I have been working with MySQL replication for quite a while. I have dealt with simple replication setups and I have experimented with complex ones. Five years ago I wrote an article about advanced MySQL replication, which was mostly a dream on what you could do with imagination and skill, but the matter from that article is still not even remotely ready for production. Yet, since that article, I have been approached by dozens of people who wanted to know how to make the multiple master dream become reality. To all of them, I had to say, "sorry, this is just a proof of concept.Come back in a few years, it may become possible". It still isn't.
Despite its latest great technological advance, MySQL native replication is is very poor of topologies. What you can do with MySQL native replication is master-to-slave (which also includes relayed slaves), master-to-master, and circular replication.
replication topologies
Of these, circular replication is the closest thing to multiple masters that you can get with MySQL native replication, without the addition of third party services.
Circular replication is tricky to set up, although not unreasonably so. It works. With some patience and precision, you can build a cluster of a few nodes in circular replication. With luck, you can get them to work properly, without loops and with the data flowing to all the servers. Your luck runs out the moment one of the servers fails, or replication breaks down for whatsoever reason. Then you see that circular replication is actually more complicated than what it looks on the surface, and it is also quite brittle. That doesn't mean that circular replication is not used in production. It is. I have known several people who use it successfully, although nobody is really happy about it.
In addition to its fragility, circular replication is slow. If you insert data into master A, it has to travel across three nodes before reaching master D.
Another topology that seems to be very popular is the multiple source scheme. It is the opposite of master/slave. Instead of having one master that sends data to many slaves, it is many masters that send data to one slave. Despite its popularity, this topology is yet unimplemented with MySQL native replication. The best you can do to simulate the desired outcome is to do round-robin replication with cron.
With this background, it is no surprise that I was thrilled at the idea of working for a company that has made these dreams become reality. Tungsten replicator allows users to have real multiple masters topologies, and even the much coveted multiple source topology is now within the users grasp.
Compared to MySQL replication, the drawback of using Tungsten is that you need to deal with bigger complexity. It's only natural. With so many more features, there come more pieces to take care of.
An interesting point about multiple masters is the matter of conflict resolution. Asynchronous replication convenience and robustness are countered by lack of means to deal with conflicts. This difficulty has been used many times as the reason for not implementing multiple source replication in MySQL. I have my own ideas on this issue. I am aware of the risks, but if I were allowed to do multiple master replication, I would be glad to take charge of the risks. Updating different databases, or different tables in separate masters is one way of defining a conflict-free scenario where multiple masters or multiple sources could be used safely. If only we could ...
My colleague Robert Hodges has posted some interesting aspects in his blog. The bottom line is that we focus on empowering users with advanced replication features. Conflict resolution can wait. I am sure many users would love to have the problem of how to avoid conflicts if the more demanding problem of how to replicate from many places to one cluster could be solved. The good news is that some sort of conflict detection (and possibly resolution) are possible even now, without slowing down the operations and without complicating our lives unnecessarily. For example, a simple conflict that could be avoided using Tungsten filters is the one resulting in a master that is updating tables that it was not supposed to do. In a scenario where multiple source replication works on the assumption that each master updates a given subset of the data, we can easily detect and eventually reject offending updates. It is not much, but in many practical cases it would be the difference between having robust multiple source replication and doing data load and consolidation manually.

Anyway, back to the present day with very much real multi-master replication available for everyone. To alleviate the fear of the unknown, we are organizing webinars on a regular basis, where we cover the theoretical points and give practical demos of how to use the new features.
If you are a demanding user, this upcoming webinar is for you: MySQL Multi-Master and Multi-Source Replication With Tungsten. Tomorrow, March 31st, 2011, at 10am PDT.

3 comments:

Slacker said...

Um... you posted this article almost 3 hours after that webinar started. Is a recording going to be available online?

datacharmer said...

The article was posted one day before. It was aggregated today, because Planet MySQL was not working properly.
The webinar recording will be available here

Eero Teerikorpi said...

Multi-master replication webinar recording can be found at https://www1.gotomeeting.com/register/973343952

Vote on Planet MySQL