Data sometimes seem to have a life and will of their own, and they refuse to behave as we wish.
Then you need a firm hand to tame the wild data and turn them into quiet, obedient pets.
MySQL-Sandbox 3.1.11 introduces a new utility, different from anything I have put before in the MySQL Sandbox toolkit.
make_sandbox_from_url downloads a tiny MySQL tarball from a repository and installs it straight away.
As of today, the following packages are available:

Major release   Versions   Package size          Expanded size    Original size
                           (what you download)   (storage used)   (not included)
5.0             5.0.96     20M                   44M              371M
5.1             5.1.72     23M                   59M              485M
5.5             5.5.50     15M                   49M              690M
5.6             5.6.31     18M                   61M              1.1G
5.7             5.7.13     33M                   108M             2.5G
The sizes of the tarballs mentioned in the table above are much smaller than the original packages. The binaries have been stripped of debug info, compressed whenever possible, and purged of all binaries that are not needed for sandbox operations. This means that:
You can download the needed tarball very fast;
The storage needed for the binaries is reduced immensely.
Here is an example of the script in action. We download and install MySQL 5.0.96 in one go:
$ make_sandbox_from_url 5.0 -- --no_show
wget -O 5.0.96.tar.gz 'http://github.com/datacharmer/mysql-docker-minimal/blob/master/dbdata/5.0.96.tar.gz?raw=true'
URL transformed to HTTPS due to an HSTS policy
--2016-07-10 17:59:33-- https://github.com/datacharmer/mysql-docker-minimal/blob/master/dbdata/5.0.96.tar.gz?raw=true
Resolving github.com (github.com)... 192.30.253.112
Connecting to github.com (github.com)|192.30.253.112|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://github.com/datacharmer/mysql-docker-minimal/raw/master/dbdata/5.0.96.tar.gz [following]
--2016-07-10 17:59:33-- https://github.com/datacharmer/mysql-docker-minimal/raw/master/dbdata/5.0.96.tar.gz
Reusing existing connection to github.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/5.0.96.tar.gz [following]
--2016-07-10 17:59:34-- https://raw.githubusercontent.com/datacharmer/mysql-docker-minimal/master/dbdata/5.0.96.tar.gz
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.12.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.12.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 20052235 (19M) [application/octet-stream]
Saving to: ‘5.0.96.tar.gz’
5.0.96.tar.gz 100%[=========================================>] 19.12M 15.2MB/s in 1.3s
2016-07-10 17:59:37 (15.2 MB/s) - ‘5.0.96.tar.gz’ saved [20052235/20052235]
The MySQL Sandbox, version 3.1.11
(C) 2006-2016 Giuseppe Maxia
# Starting server
. sandbox server started
# Loading grants
Your sandbox server was installed in $HOME/sandboxes/msb_5_0_96
If you call the same command twice, you will get a message saying that you can now use make_sandbox x.x.xx to install your sandbox.
The script does what I should probably have done by default from the beginning: it expands the tarball in $SANDBOX_BINARY (by default $HOME/opt/mysql), from where the binaries are easy to reuse with minimal typing.
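In practice, once the first download has expanded the binaries there, later sandboxes can be created from the local copy. A quick sketch of the workflow:

```shell
# First run: downloads the reduced tarball and expands it
# under $SANDBOX_BINARY (default: $HOME/opt/mysql)
make_sandbox_from_url 5.0 -- --no_show

# Later runs: the binaries are already in place, so the
# regular make_sandbox command can reuse them directly
make_sandbox 5.0.96 -- --no_show
```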
As of today, the binaries are Linux ONLY. I made this repository to use with Docker (I will write about it soon), and that means using Linux. This is still part of an experiment that so far is working well. The project can either evolve in smarter directions or merge with clever containers. It's too early to say. For now, enjoy the fastest setup that MySQL Sandbox can offer!
It had been in the making for a long time. Google announced that Google Code would be closing, and since then the Continuent team has been hard at work to handle the transition. You can guess it: this operation would have been quicker if it had been done by a small company like we were one year ago, but being part of a large corporation introduces some constraints that have affected our schedule.
However, our wish has always been, and still is, to keep Tungsten Replicator as an open source product, with full functionalities and with the full benefits that the open source development model offers.
Today, Tungsten Replicator is available on GitHub as vmware/tungsten-replicator, and it is wearing new clothes. It is not GPL anymore. In an effort to facilitate contributions, its license was changed to Apache 2.0.
Feature-wise, there is little difference from the previous release of 4.0. Mainly, we have cleaned up the code and moved out the pieces that no longer fit:
Bristlecone was removed from the package. It is used only for testing, and it will be released separately. There is no need to duplicate it into every Tungsten tarball.
The cookbook recipes have been retired. These scripts were created when the installer was still in its infancy and we had little documentation, so it was convenient to have wrappers for the common installation operations. Using the manual, it is pretty easy to install master/slave, fan-in, and multi-master topologies. The biggest reason for removing the cookbook, though, is that it was only useful for MySQL replication. If you need heterogeneous deployments, the cookbook was an obstacle rather than a help.
Some files were shuffled within the deployment tree. The ./tungsten-replicator/scripts directory was merged with ./tungsten-replicator/bin, the applier templates were moved from samples to a dedicated path, and we also did some other similar cleanup.
Although it has changed location and license, this is not a "release." If you compile the code, it will come up as 4.1, but it is still a work in progress. As in the previous repository, we tag the code with the next version and work on it until it is deemed ready for release. The latest production release (4.0.1) is still available from the old directory.
The code is available on GitHub, which makes collaboration much simpler than the previous repository. Take advantage of it: fork it, and help make the best replication tool even better!
The past few months have been busy. I have moved from Italy to Thailand, and the move has been my first priority, keeping me from attending FOSDEM and interacting with social media.
Now I am starting to catch my breath and look around for new events to attend. But before I get into this, let's make a few things clear:
I am still working for Continuent. Actually, it’s because of my company flexibility that I could move to a different country (a different continent, 6 time zones away) without much trouble. Thanks, Continuent! (BTW: Continuent is hiring! )
I am still involved with MySQL activities, events, and community matters. I just happen to be in a different time zone, where direct talk with people in Europe and the US needs to happen on a different schedule.
But in the meantime, Colin encouraged me to submit talk proposals to FOSSAsia, and both my submissions were accepted.
So, at the end of February I will be talking about some of my favorite topics:
Easy MySQL multi master replication with Tungsten
Data in the cloud: mastering the ephemeral
The exact schedule will be announced shortly. I am eager to attend an open source event in Asia. It's been a long time since I went to a similar event in Malaysia, which was very pleasant.
Tungsten Replicator is a powerful replication engine that, in addition to providing the same features as MySQL Replication, can also create several topologies, such as:
all-masters: every node in the deployment is a master, and all nodes are connected point-to-point, so that there is no single point of failure (SPOF);
fan-in: several masters replicate into a single slave;
star: an all-masters topology where one node acts as a hub, which simplifies the deployment at the price of creating a SPOF.
The real weakness of these topologies is that they don't come together easily. Installation requires several commands, and running them unassisted is a daunting task. Some time ago, we introduced a set of scripts (the Tungsten Cookbook) that allow you to install multi-master topologies with a single command. Of course, the single command is just a shell script that creates and runs all the commands needed for the deployment. The real downer is the installation time. For an all-masters topology with 4 nodes, you need 17 operations, which take a total of about 8 minutes. Until today, the operations were complex, and quite slow.
Meet The TPM
Notice: these examples require a recent nightly build of Tungsten Replicator (e.g. 2.1.1-120), which you can download from http://bit.ly/tr_21_builds
But technology advances. The current tungsten-installer, the tool that installs Tungsten Replicator instances, has evolved into a tool that has long been used to install our flagship product, Continuent Tungsten (formerly known as 'Tungsten Enterprise'). The tpm (Tungsten Package Manager) has outgrown its name, as it does far more than manage packages: it provides a first-class installation experience. Among other things, it runs hundreds of validation checks to make sure that the operating system, the network, and the database servers are fit for the installation. Not only that, but it installs all components on all servers in parallel.
So users of our commercial solution have been enjoying this more advanced installation method for quite a long time, and tpm itself has kept improving, becoming able to install single Tungsten Replicator instances in addition to the more complex HA clusters. Looking at the tool a few weeks ago, we realized that tpm is so advanced that it could easily support Tungsten Replicator topologies with minimal additions. And now we have it!
The latest nightly builds of Tungsten Replicator include the ability to install multi-master topologies using tpm. Now not only can you perform these installation tasks using the cookbook recipes, but the commands are so easy that you can actually run them without help from shell scripts.
Let’s start with the plain master/slave installation (Listing 1). The command looks similar to the one using tungsten-installer. The syntax has been simplified a bit: we say members instead of cluster-hosts, master instead of master-host, replication-user and replication-password instead of datasource-user and datasource-password. Looking at this command alone, it does not seem worth the effort to adopt a new syntax just to save a few keystrokes.
However, the real bargain starts appearing when we compare the installation time. Even for this fairly simple installation, which ran in less than 2 minutes with tungsten-installer, we get a significant gain. The installation now runs in about 30 seconds.
Image 1 - Master/slave deployment
Where we see the most important advantages, though, is when we want to run multiple-masters deployments. Remember the all-masters installation command, lasting 8 minutes, which I mentioned a few paragraphs above? Using tpm, it now runs in 45 seconds, and it is one command only. Let’s have a look.
It’s worth observing this new compact command line by line:
./tools/tpm install four_musketeers: This command calls tpm in ‘install’ mode, for the entity ‘four_musketeers’. This entity is a data service, which users of other Tungsten products and readers of Robert Hodges’ blog will recognize as a more precise definition of what we commonly refer to as ‘a cluster.’ Anyway, this data service appears in the installation and, so far, does not have much to say within the replicator usage. Just know that you can name this entity as you wish, and it does not affect much of the following tasks.
--topology=all-masters: Some of the inner workings of the installer depend on this directive, which tells tpm what kind of topology to expect. If you remember what we needed to do with tungsten-installer + configure-service, you will have some idea of what this directive tells tpm to do and what you are spared now.
--home-directory=/opt/continuent/replicator: Nothing fancy here. This is the place where we want to install Tungsten.
--replication-user=tungsten: The database user that will take care of the replication.
--replication-password=secret: The password for the above user.
--masters=host1,host2,host3,host4: The list of nodes where a master is deployed. In an all-masters topology, there is no need to list the slaves: by definition, every host will have a slave service for each of the remaining masters.
--master-services=alpha,bravo,charlie,delta: The list of service names that we will use for our topology. We can use any names we want, including the host names or the names of your favorite superheroes.
--start: With this, the replicator will start running immediately after the deployment.
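Put together, the options above form the single installation command, reassembled here from the flags just described:

```shell
# All-masters deployment in one command: four masters, four services
./tools/tpm install four_musketeers \
  --topology=all-masters \
  --home-directory=/opt/continuent/replicator \
  --replication-user=tungsten \
  --replication-password=secret \
  --masters=host1,host2,host3,host4 \
  --master-services=alpha,bravo,charlie,delta \
  --start
```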
This command produces, in 45 seconds, the same deployment that you get with tungsten-installer in about 8 minutes.
Image 2 - all-masters deployment
The command is so simple that you could use it without assistance. However, if you like the idea of Tungsten Cookbook assembling your commands and running them, giving you access to several commodity utilities in the process, you can do it right now. Besides, if you need to customize your installation with ports, custom paths and management tools, you will appreciate the help provided by Tungsten Cookbook.
Listing 3: invoking tpm installation for all-masters using a cookbook recipe.
When you define USE_TPM, the installation recipe will use tpm instead of tungsten-installer. Regardless of the verbosity that you have chosen, you will realize that you are using tpm, because the installation is over very soon.
The above command (either the one done manually or the built-in recipe) will produce a data service with four nodes, all of which are masters, and you can visualize them as:
Listing 4: The cluster overview after an all-masters installation.
More topologies: fan-in
Here is the command that installs three masters in host1, host2, and host3, all fanning in to host4, which will have only 3 slave services, and no master.
You will notice that it’s quite similar to the installation of all-masters. The most notable difference is that, in addition to the list of masters, there is also a list of slaves.
--masters=host1,host2,host3 \
--slaves=host4 \
Listing 6: How a fan-in topology is defined.
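For reference, the flags above slot into the same command shape as the all-masters example. A sketch (the data service and service names here are illustrative):

```shell
# Fan-in deployment: three masters, all replicating into host4
./tools/tpm install triple_sources \
  --topology=fan-in \
  --home-directory=/opt/continuent/replicator \
  --replication-user=tungsten \
  --replication-password=secret \
  --masters=host1,host2,host3 \
  --slaves=host4 \
  --master-services=alpha,bravo,charlie \
  --start
```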
We have three masters, and one slave listed. We could modify the installation command this way, and we would have two fan-in slaves getting data from two masters.
--masters=host1,host2 \
--slaves=host3,host4 \
#
# The same as:
#
--masters=host1,host2 \
--members=host1,host2,host3,host4 \
Listing 7: Reducing the number of masters increases the slaves in a fan-in topology.
Now we will have two masters in host1 and host2, and two fan-in slaves in host3 and host4.
Image 4 - Fan-in deployment with two slaves
If we remove another master from the list, we will end up with a simple master/slave topology.
And a star
The most difficult topology is the star, where all nodes are masters and a node acts as a hub between each endpoint and the others.
Now the only complication about this topology is that it requires two more parameters than all-masters or fan-in: we need to define which node is the hub, and how to name the hub service. But this topology has the same features as the one that you could get by running 11 commands with tungsten-installer + configure-service.
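A sketch of what the star installation looks like; the hub-related flag names are illustrative, modeled on the pattern of the other topologies:

```shell
# Star deployment: all nodes are masters, host4 acts as the hub
./tools/tpm install star_service \
  --topology=star \
  --home-directory=/opt/continuent/replicator \
  --replication-user=tungsten \
  --replication-password=secret \
  --masters=host1,host2,host3 \
  --hub=host4 \
  --hub-service=delta \
  --master-services=alpha,bravo,charlie \
  --start
```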
More TPM: building complex clusters
The one-command installation is just one of tpm’s many features. Its real power resides in its ability to compose more complex topologies. The ones shown above are complex, but since they are common, there are one-command recipes that simplify their deployment. There are cases, though, when we want to go beyond these well-known topologies and compose our own cluster. For example, we may want an all-masters topology with two additional simple slaves attached to two of the masters. To compose a custom topology, we can use tpm in stages: we configure the options that are common to the whole deployment, and then we shape each component of the cluster.
In Listing 9, we have 5 tpm commands, all of which constitute a composite deployment order. In segment #1, we tell tpm the options that apply to the whole deployment, so we won’t have to repeat them. In segment #2, we define the same 4-masters topology that we did in Listing 2. Segments #3 and #4 each create a slave service, on hosts host5 and host6, with the respective masters being in host3 and host4. The final segment #5 tells tpm to take all the information created with the previous commands, and finally run the installation.

You may be wondering how tpm keeps track of all the commands, and recognizes that they belong to the same deployment. After every command, tpm adds information to a file named deploy.cfg, containing a JSON record of the configuration we are building. Since we may have previous attempts at deploying from the same place, we add the option --reset to our first command, thus making sure that we start a new topology, rather than adding to a previous one (which indeed we do when we want to update an existing data service).
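The staged approach described above can be sketched as follows. The command names follow tpm's staged configuration style; the exact flags in segments #3 and #4 are illustrative, not a verbatim copy of Listing 9:

```shell
# Segment 1: options common to the whole deployment;
# --reset starts a fresh configuration in deploy.cfg
./tools/tpm configure defaults --reset \
  --home-directory=/opt/continuent/replicator \
  --replication-user=tungsten \
  --replication-password=secret \
  --start

# Segment 2: the four-masters core, as in Listing 2
./tools/tpm configure four_musketeers \
  --topology=all-masters \
  --masters=host1,host2,host3,host4 \
  --master-services=alpha,bravo,charlie,delta

# Segments 3 and 4: one simple slave each, attached to host3 and host4
./tools/tpm configure charlie --hosts=host3,host5 --slaves=host5
./tools/tpm configure delta  --hosts=host4,host6 --slaves=host6

# Segment 5: run the installation with the accumulated configuration
./tools/tpm install
```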
The result is what you get in the following image:
Image 6 - all-masters deployment with additional slaves
A word of caution about the above topology. The slaves in host5 and host6 will only get the changes originated in their respective masters. Therefore, host5 will only get changes that originated in host3, while host6 will only get changes from host4. If a change comes from host1 or host2, it will be propagated to host1 through host4, because each host has a dedicated communication link to each of the other masters, but the data does not pass through to the simple slaves.
The case is different when we add slave nodes to a star topology, as in the following example.
In a star topology, the hub is a pass-through master. Everything that is applied to this node is saved to binary logs, and put back in circulation. In this extended topology, the slave service in host5 is attached to a spoke of the star. Thus, it will get only changes that were created in its master. Instead, the node in host6, which is attached to the hub master, will get all the changes coming from any node.
Extending clusters
So far, the biggest challenge when working with multi-master topologies has been extending an existing cluster. Starting with two nodes and then expanding to three is quite a challenging task. (Figure 8)
Using tpm, though, the task becomes quite easy. Let's revisit the all-masters installation command, similar to what we saw at the start of this article.
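As described below, the update is almost a repetition of the install command, with the new components appended. A sketch of adding a fifth master to the four-node deployment (the fifth host and service name are illustrative):

```shell
# Extend the all-masters data service from four to five nodes;
# 'update' repeats the install flags with the new host and service added
./tools/tpm update four_musketeers \
  --topology=all-masters \
  --home-directory=/opt/continuent/replicator \
  --replication-user=tungsten \
  --replication-password=secret \
  --masters=host1,host2,host3,host4,host5 \
  --master-services=alpha,bravo,charlie,delta,echo \
  --start
```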
That's all it takes. The update command is almost a repetition of the install command, with the additional components. The same command also restarts the replicators, to bring the new configuration online.
Image 8 - Extending an all-masters topology
More is coming
The tpm is such a complex tool that exploring it all in one session may be daunting. In addition to installing, you can update the data service, and thanks to its precise syntax, you can deploy the change exactly in the spot where you want it, without moving from the staging directory. We will look at it with more examples soon.
Second, I would say that I am quite surprised at how much we have done in this release. The previous release (2.0.7) was in February, which is just a few months ago, and yet it feels like ages when I see the list of improvements, new features, and bug fixes in the Release Notes. I did not realize it until I ran my last batch of checks to test the upgrade from the previous release, which I hadn’t run for quite a long time. It’s like when you see a son growing in front of your eyes day by day, and you don’t realize he’s grown a full foot until a distant relative comes to visit. The same happened to me here. I looked at the ./cookbook directory in 2.0.7 and saw just a handful of commands (most of them now deprecated), and then at 2.1.0, which has about 30 new commands, all nicely categorized and advertised in the embedded documentation. If you are starting today with Tungsten Replicator 2.1.0, you can run
./cookbook/readme
and
./cookbook/help
Upgrade
If you were using Tungsten Replicator before, you need to know how to upgrade. If, by any unfortunate chance, you were not using the Cookbook recipes to run the installation, the method for installing is the following:
If your node has more than one service, restart the replicator
If you are using the cookbook, you can run an upgrade using
./cookbook/upgrade
This command will ask for your current topology and then show all the commands that you should run to perform the upgrade, including adapting the cookbook scripts to use the new deployment.
So, What’s New:
The list of goodies is long. All the gory details are in the Release Notes. Here I would like to mention the ones that have impressed me most.
Oracle Extractor Is Open Source
Up to the previous release, you could extract from MySQL and apply to Oracle, all using open source tools. If you wanted to extract from Oracle, you needed a commercial license. Now the whole replication layer is completely open source. You can replicate from and to Oracle using Tungsten Replicator 2.1.0 under the terms of the GPL v2. However, you will still have to buy database licenses from Oracle!
Installation and Administration
There is a long list of utilities released inside the ./cookbook directory, which will help you install and maintain the cluster with a few strokes. See References #2 and #3 below. The thing that you should try right away is:
This will tell you if your servers are ready for deployment, without actually deploying anything.
Documentation!
We have hired a stellar professional writer (my former colleague at MySQL AB, well-known book author MC Brown), and the result is that our well-intentioned but rather unfocused documentation is now shaping up nicely. Among all the things that got explained, Tungsten Replicator has its own getting started section.
Metadata!
Tungsten replication tools now give information using JSON. Here’s a list of commands to try:
trepctl status -json
trepctl services -json -full
trepctl properties | less
thl list -headers -high 100 [-json]
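Since the output is JSON, it can be piped into any JSON-aware tool. For instance, assuming the jq utility is installed (field names may vary between versions, so treat them as illustrative):

```shell
# Pretty-print the full replicator status document
trepctl status -json | jq .

# Extract a single property from the status output
trepctl status -json | jq '.state'
```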
My colleague Linas Virbalas has made the team (and several customers) happy when he created two new tools:
ddlscan, a utility to help analyze and migrate database schemas;
the rename filter, a supercharged filter that can rename almost any object in a relational database, from schemas down to columns.
Linas also coded the above-mentioned JSON-based improvements.
MongoDB Installation
It was improved and tested better. It’s a pleasure to see how data from a relational database flows into a rival NoSQL repository as if it belonged there! See reference #4 below.
More to Come
What’s listed here is what we have tested and documented. But software development is not a linear process. There is much more boiling in the cauldron, ready to be mixed into the soup of release 2.1.1.
We’re working hard at making filters better. You will soon see the long-awaited documentation for them, and a simplified interface.
Another thing that I have tested, and that worked surprisingly well, is the creation of Change Data Capture for MySQL. This is a feature that is usually asked for by Oracle users, but I tried it for MySQL, and it allowed me to create shadow tables with the audit trace of their changes. I will write about that as soon as we smooth a few rough edges.
Scripting! This is going to be huge. Much of it is already available in the source, but not fully documented or integrated yet. What you will see soon in the open is a series of Ruby libraries (the same ones used by the very sophisticated Tungsten installation tools) that are exposed for general usage by testers and tool creators. While the main focus of this library is the commercial tools, there is a significant portion of work that needs to end up in the replicator, and as a result its usability will increase.
What else? I may have forgotten something important amid all the excitement. If so, I will amend it in my next articles. Happy hacking!
The Percona Live MySQL Conference and Expo 2013 is almost one month away. It's time to start planning, set expectations, and decide what to attend. This post gives a roundup of some of the sessions that I recommend and look forward to attending.
First, the unexpected!
After much talk and disbelief, here they come! Oracle engineers will participate in the Percona Live conference. This is wonderful! Their participation was requested by the organizers, by the attendees, and by community advocates, who all told Oracle management how important it is to be at this conference. Finally, they have agreed to come along, and here they are, with one keynote and three general sessions.
[Wed 3:30pm] There is an old tradition of Lightning Talks, which will happen during the community reception. I am the host of the lightning talks, but among them there is also one of mine: MySQL Replication mythology.
My company's talks
Continuent is very active at many conferences, and at this one we are participating massively. I know I look partial in this matter, but I am really proud of the products that we create and maintain at my company. That's why I highly recommend these talks.
[Tue 11:30am] Getting started with Tungsten Replicator. This is alternative replication with lots of advanced features. If plain replication is unsatisfactory, that's one place to look.
[Wed 1:00pm] State of the Art for MySQL Multi-Master Replication. Yet another talk by Robert. This one is a semi-philosophical talk, explaining what you can and can't do with multiple master technologies.
[Wed 2:00pm] Surviving an Amazon Outage. Neil Armitage explains the theory (a little) and practice (a lot) of high availability.
MySQL is a standard, and widely popular. Yet, it has shortcomings and weak points, which allow for alternative solutions to flourish. There are many sessions that offer alternatives to the vanilla software.
[Tue 1:20pm] MariaDB Cassandra Interoperability. MariaDB is a magnetic fork of MySQL. It's magnetic in the sense that it attracts most of the features and enhancements that nobody else wanted to accept. While some of its features may look like a whim (and some of them have been discontinued already), there are some that look more interesting than others. This integration with Cassandra deserves some exploration.
[Tue 3:50pm] MySQL Cluster - When to use it and when not to. The classic MySQL Cluster. Some believe that it's a drop-in replacement for a single server. It's not. It's a powerful solution, but it is not fit for all.
[Wed 11:10am] Fine Tuning Percona XtraBackup to your workload. This tool has become a de-facto standard. It is available everywhere, easy to use, and powerful. A great tale of an alternative tool that became the standard.
On Thursday, I will travel to Boston, MA, to attend the Northeast LinuxFest, which also includes an edition of the Open Database Camp. The events will be at one of my favorite places on earth: the Massachusetts Institute of Technology, a.k.a. MIT. Every time I speak at an event there, I feel at home, and I look forward to being there once more.
The Open Database Camp is organized, as usual, with the formula of an un-conference, where the schedule is finalized on the spot.
There are a few ideas for sessions. I have proposed two of the topics I am most familiar with:
In addition to seeing MIT again, I will also be pleased to meet colleagues and friends from all over the place. If you happen to be nearby, let's get together!
After a long pause in the speaking game, I am back.
I haven't been on stage since April, and it is now time to resume my public duties.
I will speak at MySQL Connect in San Francisco, just at the start of Oracle Open World, with a talk on MySQL High Availability: Power and Usability. It is about the cool technology that is keeping me busy here at Continuent, which can make life really easy for DBAs. This talk will be a demo fest. If you are attending MySQL Connect, you should see it!
A happy return for me. On October 27th I will talk about open source databases and the pleasures of command line operations at Linux Day in Cagliari, my hometown. Since I speak more in California than in my own backyard, I am happy that this year I managed to get a spot here.
The company will have a team meeting in November (Barcelona, here we come!), and from there I will fly to Bulgaria, where I am speaking at the Bulgarian Oracle User Group conference. There I will have two talks: one about MySQL for business, and the other on "MySQL High Availability for the masses".
A few days later, again on the road, in London, for Percona Live, with a talk on MySQL High Availability: Power, Magic, and Usability. It is again about our core products, with some high technology fun involved. I will show how our tools can test the software, spot the mistakes, fix the cluster, and even build a step-by-step demo.
See you around. Look for me carefully, though. I may look different from how I have been depicted so far.
Is Oracle really consciously and willingly killing MySQL?
I don't think so.
Is Oracle damaging MySQL by taking the wrong steps? Probably so.
This is my personal opinion, and AFAIK there is no official statement from Oracle on this matter, but I think I can summarize the Oracle standpoint as follows:
There is a strong and reasonable concern about security. Oracle's promise to its customers is that security breaches will be treated with discretion, and no information will be released that could help potential attackers;
There is also an equally strong but unreasonable concern that exposing bugs and code commits to public scrutiny will help MySQL competitors;
To address the security concern, Oracle wants to hide every aspect of bug fixing that may reveal security-related information:
bug reports that mention how the breach happens;
comments to commits that explain what has been done to fix the issue;
test cases that show the problem being solved.
From the security standpoint, the above steps have been implemented, and they look effective. Unfortunately, they have the side effects that:
the bugs database is censored, and does not tell users why they should upgrade;
the public trees under the revision control system are mutilated; in fact, it looks like Oracle has simply stopped updating them;
contributions to MySQL, which weren't easy before, are now made much harder;
trust in Oracle's good faith as MySQL's steward is declining.
The inevitable side effect is that the moves that have reduced the security risk have also partially addressed Oracle's concern about exposing its innovation to the competition, thus making MySQL de facto less open. Was it intentional? I don't know. What I know is that these actions, which make MySQL less friendly for its direct competitors, are in fact having the opposite effect: traditional open source users will have more reasons to look at alternatives, and those competitors will look more appealing now that Oracle has stiffened its approach to open source.
The main point of this whole incident is that Oracle values its current customers more than its potential ones. While MySQL AB focused its business on the customers that the open source model would attract to its services, Oracle wants first and foremost to make its current customers happy, and it doesn't consider the future ones, coming from open source adoption, worthy of its attention. In short, Oracle doesn't get the open source business model.
OTOH, Oracle is doing a good job on the innovation front. A huge effort is going into new features and improvements in MySQL 5.6, showing that Oracle believes in the business behind MySQL and wants to make it grow. This is an undeniable benefit for MySQL and its users. However, there is less openness than before: the source comes out less often, and not in a shape that is suitable for contributions. Still, the code is open, and there is growth both in Oracle (which is taking ideas and code from MySQL forks) and in the MySQL forks, which merge Oracle's changes into their releases. Even though the game is not played according to open source purists' rules, Oracle is still a main player.
What can we, the MySQL Community, do?
We need to reinforce the idea that the open source model still works for MySQL. The business side is the only one that Oracle gets. Unfortunately, the classical Oracle sales model does not look favorably on a system where you get customers by distributing a free product and trying to please non-customers, in the hope that some of them will eventually buy your services.
My point is that Oracle is unintentionally harming MySQL and its own image. If Oracle cares about MySQL, it should take action now to amend the fracture, before it becomes too deep.
I don't have a solution to this issue, but I thought that spelling out the problem would perhaps help to find one.
Working with replication, you come across many topologies: some of them sound and established, some less so, and some still in the realm of hopeless wishes. I have been working with replication for almost 10 years now, and my wish list has grown quite big in that time. In the last 12 months, though, while working at Continuent, some of the topologies that I wanted to work with have moved from the cloud of wishful thinking to the firm land of things that happen. My quest for star replication starts with the most common topology: one master, many slaves.
Fig 1. Master/Slave topology
Legend
It looks like a star, with the rays extending from the master to the slaves. This is the basis of most replication setups almost everywhere nowadays, and it holds few surprises. Setting aside the problems related to failing over and switching between nodes, which I will examine in another post, let's move to another star.
Fig 2. Fan-in slave, or multiple sources
Multiple source replication, also known as the fan-in topology, has several masters replicating to the same slave. For years, this was forbidden territory for me. But Tungsten Replicator allows you to create multiple source topologies easily. This is uni-directional, though. I am also interested in topologies where I have more than one master, and where I can retrieve data from multiple points.
Fig 3. all-to-all three nodes
Fig 4. All-to-all four nodes
Tungsten Multi-Master Installation solves this problem. It allows me to create topologies where every node replicates to every other node. Looking at the three-node scheme, it appears to be a straightforward solution. When we add one node, though, we see that the amount of network traffic grows quite a lot. The double-sided arrows mean that there is a replication service at each end of the line, and two open data channels. When we move from three nodes to four, we double the replication services and the channels needed to sustain the scheme.

For several months, I was content with this. I thought: it is heavy, but it works, and it's way more than what you can do with native replication, especially if you consider that you have a practical way of preventing conflicts using Shard Filters. But that was not enough. Something kept gnawing at me, and from time to time I experimented with Tungsten Replicator's huge flexibility to create new topologies. But the star kept eluding me.

Until … until, guess what? A customer asked for it. The problem suddenly ceased to be a personal whim, and it became a business opportunity. Instead of looking at the issue in the idle way I often think about technology, I went at it with practical determination. What failed when I was experimenting in my free time was that either the pieces did not glue together the way I wanted, or I got an endless loop.

Tungsten Replicator has a set of components that are conceptually simple. You deploy a pipeline between two points, open the tap, and data starts flowing in one direction. Even with multiple-master replication, the principle is the same. You deploy many pipes, and each one has one purpose only.
Fig 5. All-masters star topology
In the star topology, however, you need to open more taps, but not too many, as you need to avoid the data looping around. The recipe, as it turned out, is to create a set of bi-directional replication systems, where the central node's slave services are enabled to get changes only from one specific master each, while the slave services on the peripheral nodes accept changes from any master. It was as simple as that.

There are, of course, benefits and drawbacks to a star topology, compared to an all-replicate-to-all design. In the star topology, we create a single point of failure. If the central node fails, replication stops, and the central node needs to be replaced. The all-to-all design, instead, has no such weakness. Its abundance of connections makes sure that, if a node fails, the system continues working without any intervention. There is no need for fail-over.
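The loop-prevention recipe can be illustrated with a toy simulation (plain Python, not Tungsten code; the node names are hypothetical). The hub applies an event only via the per-spoke service that matches its origin, then fans it out; spokes accept any origin but never relay, so nothing loops:

```python
def replicate(origin, nodes=("hub", "a", "b", "c")):
    """Return the set of nodes that end up applying an event born at `origin`."""
    applied = {origin}
    spokes = [n for n in nodes if n != "hub"]
    if origin != "hub":
        # the hub's dedicated slave service for this spoke accepts the event
        applied.add("hub")
    # the hub fans the event out; every other spoke accepts it
    for spoke in spokes:
        if spoke != origin:
            applied.add(spoke)
    # spokes never relay back to the hub, so the event cannot loop
    return applied

# an event from any node reaches all nodes exactly once
assert replicate("a") == {"a", "hub", "b", "c"}
assert replicate("hub") == {"hub", "a", "b", "c"}
```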
Fig 6. extending an all-to-all topology
Fig 7. Extending a star topology
However, there is a huge benefit in node management. If you need to add a new node, it costs two services and two connections, while the same operation in the all-to-all topology costs 8 services and 8 connections. With the implementation of this topology, a new challenge has arisen. While conflict prevention by sharding is still possible, this is not the kind of scenario where you want to apply it. We have another conflict prevention mechanism in mind, and this new topology is a good occasion to make it happen. YMMV. I like the additional choice. There are cases where an all-replicate-to-all topology is still the best option, and there are cases where a star topology is more advisable.
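The service and connection arithmetic behind this comparison can be sketched in a few lines (a back-of-the-envelope model of my own, not part of Tungsten): every directed pair of nodes in all-to-all needs its own service, while a star only needs two services per spoke.

```python
def all_to_all_services(n):
    # one replication service per ordered pair of nodes
    return n * (n - 1)

def star_services(n):
    # hub plus (n - 1) spokes: two services (one per direction) per spoke
    return 2 * (n - 1)

# three to four nodes doubles the services: 6 -> 12
assert all_to_all_services(3) == 6 and all_to_all_services(4) == 12
# adding a fifth node: 8 new services all-to-all, only 2 in a star
assert all_to_all_services(5) - all_to_all_services(4) == 8
assert star_services(5) - star_services(4) == 2
```

The same counts hold for the open data channels, which is why the all-to-all scheme gets heavy so quickly.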
I will be a speaker at Percona Live - London 2011, and I am looking forward to the event, which is packed with great content: a whopping 40 sessions of MySQL content, plus 3 keynotes and 14 tutorials. It's enough to keep every MySQL enthusiast busy.
Continuent speakers will be particularly busy, as between me and Robert Hodges, we will be on stage four times on Tuesday, October 25th.
This event feels good from the beginning. There are plenty of participants, many names from all over the MySQL community, covering large and small companies, experienced speakers, well known names in the MySQL engineering arena, and a wealth of topics that will make me feel sorry for not being able to attend them all. It's the usual dilemma that attendees face at this kind of conference. Not so much at Oracle Open World 2011, where there weren't that many MySQL sessions to choose from, although it was great for networking.
Our talks
Robert will open the proceedings with Teaching an Old Dog New Tricks: Tungsten Enterprise Clusters for MySQL, a talk about Tungsten Enterprise, my company's commercial product, which is a professional management tool for demanding companies.
Robert returns in the afternoon with one of the most amazing features of our open source product, Tungsten Replicator: MySQL Parallel Replication in 5 Minutes or Less. This is a feature for large replication systems where the slave can't cope with large data streams, due to the single-threaded MySQL slave. This talk will show how easy it is to plug Tungsten Replicator into a lagging slave, run parallel replication until the lag has been zeroed, and then hand control back to native replication.
Then it will be my turn, with a general presentation about Tungsten Replicator, the open source product. I like the idea of calling it MySQL Replication outside the box: multiple masters, fan-in, parallel apply. The reasoning is that MySQL replication, although wildly successful in the web economy of the last decade, is constrained by several limits, which Tungsten, acting outside those boundaries, removes. This will be a quick intro to Tungsten and its new user-friendly installation, with a few demos.
Finally, a classic presentation with some new content, on MySQL Sandbox: a framework for productive laziness. The news is that MySQL Sandbox now supports Percona and MariaDB builds. Again, some demos will be shown, with old and new features mixed together.
Oracle Open World 2011 is approaching. MySQL is very well represented.
Sheeri has put together a simple table of all the MySQL sessions at OOW, which is handier than the Oracle schedule.
I will be speaking in three sessions on Sunday, October 2nd.
Sunday, 9am MySQL: Don't Be a Rookie Forever—Be in Command (Line)
I have given this talk before, as a tutorial at the UC in 2010 and at FrOSCon one month ago. It is one of the most rewarding sessions ever: the attendees were very interested. This will be a short version of the tutorial.
Sunday, 10:15am MySQL: Jailbreaking MySQL Replication.
This is related to my job at Continuent. It will be a showcase of Tungsten Replicator, with quick examples of how to build replication clusters with multiple masters, multiple sources, chained clusters, and parallel replication.
There are 47 MySQL sessions in total. You can see them in the Technocation summary or get the Oracle Focus on MySQL PDF.
There are huge expo halls at OOW. Among them, there is also MySQL. The MySQL Community booth, manned by volunteers, is at Moscone West, Level 2 Lobby. Other MySQL booths are listed in the Technocation summary.
On the social side, Oracle ACEs will have a dinner on Sunday evening, and MySQL Oracle ACEs will have another gathering on Monday evening.
On Tuesday, October 4th, there is a MySQL Community reception. It's free. You don't need an OOW pass to attend, but registration is required.
Percona has announced Percona Live MySQL Conference and Expo 2012. Kudos for their vision and entrepreneurship. I have seen comments praising their commitment to the community and their willingness to fill a void. I have to dot a few i's and cross some t's on this matter.
That was not the only game in town.
By the end of June, there were strong clues that O'Reilly was not going to organize a conference. The question of who could fill the void started to pop up. The MySQL Council started exploring the options for a community-driven conference to replace the missing one. The general plan was along the lines of "let's see who is in, and possibly run a conference without the big organizer. If nobody steps up, the IOUG can offer a venue in Las Vegas for an independent MySQL conference." The plan required general consensus among the major players, and therefore we started asking around about availability and participation. Percona did not answer our requests. They delayed the meeting, and in the meantime we continued preparing for plan B, a conference in Vegas. Then some of us received a message from Percona, pre-announcing a conference in Santa Clara. No offer to gather broad participation from other entities. No sign of wanting a neutral event, i.e. an event not tied to a single company.
Some background
That was puzzling, because I recall vividly how Baron Schwartz and Peter Zaitzev advocated strongly in favor of an independent conference, not so long ago:
The conference is organized and owned by MySQL, not the users. It isn’t a community event. It isn’t about you and me first and foremost. It’s about a company trying to successfully build a business, and other companies paying to be sponsors and show their products in the expo hall.
Baron Schwartz, April 23, 2008.
I would like to see the conference which is focused on the product users interests rather than business interests of any particular company (or personal interests of small group of people), I would like it to be affordable so more people can attend and I’d like to see it open so everyone is invited to contribute and process is as open as possible.
Peter Zaitzev, April 23, 2008.
A call to disclosure
I understand the business motivation to organize a conference with your company name in the title, while at the same time leveraging the wide MySQL community. However, if I have to judge by the organization of previous Percona Live events, I don't see any of the benefits that were advocated three years ago. I see a business conference inspired by the same principles that Percona was criticizing in 2008. What is it, then? If it is supposed to be a community conference, let's call it "MySQL Conference" and ask for broad participation. There are plenty of people in the community who are willing to help and make the event a success, not only for the benefit of Percona, but for the global benefit of everyone in the ecosystem, including Oracle, the IOUG, and every company with a business related to MySQL. If it is not a community conference, let's state it clearly, so that people can set their expectations accordingly.
Unintended consequences
Someone may think it's a good thing to have a MySQL conference without Oracle participation but I am sure most will agree that it is not desirable. Much as I admire Percona's technical merits, if I go to the conference I want to hear from a wide range of participants. Specifically, I would like to know what's in the pipeline, and I want to hear that from the engineers in the MySQL team, i.e. from Oracle. I doubt that Oracle would send engineers and VPs to talk to a conference that is named after a competitor, and that may be true for other entities, which I (and many others) would like to hear from.
In short
Is this the conference of Baron's and Peter's earlier dreams or is it the fulfillment of their current business strategy?
Please, let the community know.
Disclaimer
The opinions in this post are my own. My employer does not censor my writings and gives me full freedom of expression, but my opinion does not necessarily match that of my company.
I have been working with MySQL replication for quite a while. I have dealt with simple replication setups and I have experimented with complex ones. Five years ago I wrote an article about advanced MySQL replication, which was mostly a dream of what you could do with imagination and skill, but the matter of that article is still not even remotely ready for production. Yet, since that article, I have been approached by dozens of people who wanted to know how to make the multiple master dream become reality. To all of them, I had to say: "Sorry, this is just a proof of concept. Come back in a few years; it may become possible." It still isn't.
Despite its latest great technological advances, MySQL native replication is very poor in topologies. What you can do with MySQL native replication is master-to-slave (which also includes relayed slaves), master-to-master, and circular replication.
Of these, circular replication is the closest thing to multiple masters that you can get with MySQL native replication, without the addition of third party services.
Circular replication is tricky to set up, although not unreasonably so. It works. With some patience and precision, you can build a cluster of a few nodes in circular replication. With luck, you can get them to work properly, without loops and with the data flowing to all the servers. Your luck runs out the moment one of the servers fails, or replication breaks down for whatever reason. Then you see that circular replication is actually more complicated than it looks on the surface, and it is also quite brittle. That doesn't mean that circular replication is not used in production. It is. I have known several people who use it successfully, although nobody is really happy about it.
In addition to its fragility, circular replication is slow. If you insert data into master A, it has to travel across three nodes before reaching master D.
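For reference, here is a minimal sketch of the moving parts in a circular setup (hostnames, credentials, and log positions below are illustrative, not from a real deployment). Every node must re-log replicated events so they can travel around the ring, and the auto-increment settings keep the nodes from generating colliding keys:

```sql
-- my.cnf on each node (example values for a four-node ring):
--   server-id                = 1        -- unique on every node
--   log-bin                  = mysql-bin
--   log-slave-updates                   -- re-log replicated events for the next node
--   auto-increment-increment = 4        -- number of nodes in the ring
--   auto-increment-offset    = 1        -- 1..4, unique per node
--
-- Then each node is pointed at its predecessor in the ring.
-- For example, on node B, whose predecessor is node A:
CHANGE MASTER TO
  MASTER_HOST='node_a',
  MASTER_USER='repl',
  MASTER_PASSWORD='secret',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=4;
START SLAVE;
```

Note that breaking any single link stops the flow for every node downstream, which is exactly the brittleness described above.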
Another topology that seems to be very popular is the multiple source scheme. It is the opposite of master/slave: instead of one master sending data to many slaves, many masters send data to one slave. Despite its popularity, this topology is not yet implemented in MySQL native replication. The best you can do to simulate the desired outcome is round-robin replication driven by cron.
With this background, it is no surprise that I was thrilled at the idea of working for a company that has made these dreams become reality. Tungsten Replicator allows users to have real multiple master topologies, and even the much coveted multiple source topology is now within the users' grasp.
Compared to MySQL replication, the drawback of using Tungsten is that you need to deal with greater complexity. It's only natural: with so many more features, there come more pieces to take care of.
An interesting point about multiple masters is the matter of conflict resolution. The convenience and robustness of asynchronous replication are countered by the lack of means to deal with conflicts. This difficulty has been used many times as the reason for not implementing multiple source replication in MySQL. I have my own ideas on this issue. I am aware of the risks, but if I were allowed to do multiple master replication, I would gladly take charge of those risks. Updating different databases, or different tables, in separate masters is one way of defining a conflict-free scenario where multiple masters or multiple sources could be used safely. If only we could ...
My colleague Robert Hodges has posted some interesting thoughts on this in his blog. The bottom line is that we focus on empowering users with advanced replication features. Conflict resolution can wait. I am sure many users would love to have the problem of how to avoid conflicts, if the more demanding problem of how to replicate from many places to one cluster could be solved. The good news is that some sort of conflict detection (and possibly resolution) is possible even now, without slowing down operations and without complicating our lives unnecessarily. For example, a simple conflict that could be avoided using Tungsten filters is the one resulting from a master updating tables that it was not supposed to touch. In a scenario where multiple source replication works on the assumption that each master updates a given subset of the data, we can easily detect and reject offending updates. It is not much, but in many practical cases it would be the difference between having robust multiple source replication and doing data load and consolidation manually.
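The detection logic is conceptually tiny. Here is a hypothetical sketch (plain Python, not Tungsten filter code; the schema-to-master mapping is invented for illustration) of how shard-based rejection of offending updates could look:

```python
# Each master "owns" a subset of the schemas; an update arriving from
# any other master for an owned schema is an offending update.
SHARD_OWNERS = {"sales": "master1", "inventory": "master2"}  # hypothetical mapping

def is_allowed(schema, source_master):
    """Accept an update only if it comes from the master that owns its schema."""
    owner = SHARD_OWNERS.get(schema)
    # unknown schemas are rejected outright; owned schemas only from their owner
    return owner == source_master

assert is_allowed("sales", "master1")        # the owner may update its shard
assert not is_allowed("sales", "master2")    # anyone else is rejected
```

A filter applying this check on the slave side would be enough to keep each master inside its own shard, which is the conflict-free scenario described above.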
Anyway, back to the present day with very much real multi-master replication available for everyone. To alleviate the fear of the unknown, we are organizing webinars on a regular basis, where we cover the theoretical points and give practical demos of how to use the new features.
If you are a demanding user, this upcoming webinar is for you: MySQL Multi-Master and Multi-Source Replication With Tungsten. Tomorrow, March 31st, 2011, at 10am PDT.
The Open Database Camp 2011 is shaping up nicely.
The logistics are being defined, and local and international volunteers are showing up to help. (Thanks, folks!)
If you want to start booking, there is a list of hotels in the Accommodation page.
And don't forget to sign up in the Attendees list.
Local travel information will be released as soon as we finish cranking up the plan.
Open Database camp is free, but we still have expenses to get the job done.
We need both official sponsors and personal donations. No minimum amount required. You can donate painlessly online through the nonprofit organization Technocation. (Thanks!)
Please see the Sponsors page for more info.
On Sunday I will be in Stuttgart with the double purpose of attending the annual European PostgreSQL conference and the technical meeting of my company, which will be held after the normal proceedings of PGDay-EU.
For the first time in several years I am attending a conference where I am not a speaker. In my previous job I did not have much opportunity to attend PostgreSQL meetings, and I welcome this opportunity. The schedule is quite interesting, and I have made my personal picks:
15:20 - 16:10 Concurrency & PostgreSQL.
Undoubtedly a topic that will become quite useful when dealing with PG replication issues that I will be facing in my job.
Back to the conference circuit after some rest.
On December 1st I will be speaking under my new affiliation at Continuent in the MySQL track at the UKOUG conference. My topic is MySQL - Features for the enterprise and it will basically cover the main features of MySQL 5.5.
This conference is the largest Oracle related event in Europe, and it is organized by users for other users. This year for the first time the conference hosts a MySQL dedicated track.
It is a sort of epidemic. Most of the important Oracle events now have some MySQL content, or even a full track. I see this as a great opportunity for both classic MySQL and Oracle users to meet each other and find common business ground for the future.
The event is organized by yours truly and Felix Schupp, and we are open to cooperation from other volunteers. Specifically, we need help to beat the drum. Even if you can't participate, we will appreciate your help in making the Call for Participation known.

OpenSQLCamp 2010 will use FrOSCon's Pentabarf conference coordination system to collect talk submissions and to organize and schedule the talks. Please create an account there, if you don't have one already. Once you have activated your account via the email address you provided, please log into the system and create a new event. Make sure to select the track OpenSQLCamp for your submission!

IMPORTANT! FrOSCon uses CAcert certificates. If your browser does not recognize them, you need to import the CAcert Root Certificate before using the CfP pages.

The deadline for submitting your proposal is Sunday, July 11th, 2010 (12:00pm PST).
The next OpenSQLCamp will be held in Portland, Oregon, USA. It is being organized by Eric Day, well known to the open source community for his active and productive participation to several projects (especially Drizzle and Gearman).
The event is public and free. Therefore, it needs public sponsoring. I don't know yet if I can attend, but I have already donated something to the organizers, and I am officially a sponsor. You can be one too. Simply go to the sponsors page and donate a minimum of $100 as an individual or $250 as an organization.

And of course, if you plan to participate, register yourself and perhaps propose a session. OpenSQLCamp is a fun, egalitarian event. If you have something to say, write a proposal, and the other participants will tell you whether they want to hear it. Either way, you will learn something.
Systems analyst and database designer with 20+ years of IT experience. Deals with data analysis and migration, performance optimization, general wizardry.