Friday, December 03, 2010

Who's afraid of MySQL forks?

There is much talk about MySQL forks and how they are going to replace MySQL, or take over MySQL's user base, or become more powerful/profitable/popular/you-name-it than MySQL itself.
Let's clear the air on this topic. There is more to forks than meets the eye, especially if you think about a few obvious facts.
What's a fork? According to Wikipedia:
a project fork happens when developers take a legal copy of source code from one software package and start independent development on it, creating a distinct piece of software.
By this definition, when someone who doesn't work at the MySQL project distributes a package that is based on MySQL code but differs from the original, it's a fork.
Why am I approaching the issue from this angle? Because, apart from Windows users, who mostly download MySQL from the official site, the majority of users get MySQL through a Linux distribution or some other project. And most of the time such packages are different from the ones built by the MySQL team. There is nothing wrong with that. The differences are sometimes minimal packaging changes done to adapt MySQL to the specific distribution. Sometimes they are cherry-picked patches applied to an old version that needs to be maintained, so that the package is unlike any other MySQL version that you may find in the wild. Even if the version is the same, depending on the distribution and the age of the server, the code beneath could be wildly different from the official versions.
Thus, it turns out that many users, possibly the majority, are using a MySQL fork, albeit a very minor one.
But when people talk about forks, they often refer to three main projects:
  • The Percona distribution. This is a collection of a few distinct patches in the server, coupled with a fork of the InnoDB plugin, named XtraDB, and an independent tool for backup (XtraBackup). This fork has a solid business background. Every patch has been developed to meet user requests, and the engineers at Percona maintain them appropriately.
  • Then we have the MariaDB fork, which is a series of changes to the MySQL core, motivated by the desire of the developers to build a rich set of feature enhancements while being backward compatible with the main distribution. The business model is thus a fast track of new features and bug fixes to customers.
  • And then there is Drizzle, which has even less business traction than MariaDB, but a very well defined goal of creating a lightweight database by re-engineering a bare-bones, stripped-down version of MySQL that is now very distant from its origins.
What I said in the above descriptions is just the synopsis of what these three forks are. In recent mythology, it is fabled that, if MySQL ceases to exist (because it goes bankrupt, or Oracle kills it, or a major accident happens to the project, whatever) users can replace MySQL with one fork, and live happily ever after.
Not so fast. There is something that few people take into account when listening to this too-often-repeated tale.
What most observers miss is that the forks' original code (with the exception of Drizzle) is very marginal. The bulk of the distribution is still the code produced by the MySQL team, which is merged at every minor release, and integrated with the patches produced by Percona and MariaDB. So, while technically they are forks of MySQL, they can't live independently from the official MySQL distribution. Neither Percona nor MariaDB has the manpower to maintain the server by handling the huge number of bugs that the MySQL team is fixing every month.
There is also a matter of skill set. Percona has talented InnoDB experts, while MariaDB has mostly core server experts (and some are among the top ones, I may add). They could complement each other, although it seems that cooperation between the two projects is not as good as it used to be. (Could be my personal impression.)
The bottom line, though, is whether both projects would be able to survive should the main project become unavailable. I am not suggesting that Oracle wants to make MySQL scarce. On the contrary, all the information at my disposal suggests that Oracle will keep MySQL publicly available for a long time.
This state of affairs seems to indicate that Drizzle is, instead, a true fork that does not depend on MySQL's health. To some extent, this is true. However, the main storage engine in Drizzle is InnoDB. Therefore, at least today, Drizzle is as dependent on Oracle as Percona and MariaDB.
What would happen tomorrow, if the disaster depicted by doomsday advocates comes true and MySQL actually disappears? I honestly don't know, but I would love to have a public commitment from the major players about what they are prepared to do in terms of maintaining that huge chunk of code that today they take from Oracle releases on a monthly basis.
This is all food for thought for MySQL users.

About adoption of the forks today, I have seen five types of arguments in favor of a MySQL fork:
  1. I need the feature provided by Percona or MariaDB, or I need a quick bug fix that I can't get from the slow roadmap at Oracle. I trust that this handful of people are able to maintain that little code that differs from MySQL and matters to me. So I don't care if they don't have 100 developers on the task.
  2. Given Oracle's track record in other Open Source projects, I don't trust them to deliver MySQL according to FOSS principles, so let's go for true Open Source alternatives.
  3. Most MySQL developers have now left Oracle, and so the forks have a better chance of being of higher quality.
  4. Cool! MariaDB/Percona has a bunch more features than MySQL. It must be better. Let's use it.
  5. I like new technology. Let's plunge into it!
Argument #1 is a solid, business-backed reason for adopting some software. The risk is often well calculated, especially if the evaluation can be backed by performance and functional tests.
Argument #2 is frivolous, as it mixes subjective feelings into business matters. And so is argument #4. Yet, these two types of advocacy are quite popular and spread much faster than the more reasonable approach seen at #1.
Argument #3 is debatable. MySQL developers at Oracle easily outnumber those at all the forks. The idea that the departure of a few core developers can alter the system in such a way that the whole project crumbles has already been refuted by facts: MySQL 5.5 is an excellent release, with enthusiastic appreciation from power users. While I agree that top MySQL talents work at the forks, I consider the MySQL team to be still in excellent shape.
Argument #5 is reasonable, if it is followed by cool judgment and backed by facts. I am one who is always ready to try new solutions, and love experimenting with cool technology. But adoption is different from proof of concept. I am happy to see that Drizzle can replace MySQL in some applications, but would I trust it in its present beta stage? Certainly not. So, I am happy to test, but I trust my valuable data to more stable solutions.

What does this mean for you, the end user? My personal advice is: don't adopt blindly because of some enthusiastic advertising. Test the product thoroughly, and if it fits your needs, by all means, go for it. But if you don't have a specific reason, I recommend staying with the official branch, because, despite the change in affiliation, there is still a very experienced team behind it.

14 comments:

Stewart Smith said...

In Drizzle, we also have PBXT in the tree as a transactional engine... and are always open to merging others.

As for dependence on Oracle for InnoDB... while we're certainly merging improvements from upstream, we're not completely helpless when it comes to fixing and improving InnoDB.

hingo said...

Excellent status report. I've been thinking of a similar spirited post myself - I'll do it over the weekend so we can compare.

Morgan Tocker said...

I agree with almost everything you said, and I especially like the way you described Linux vendors as forks as well. In the case of Percona Server, my last check showed that 20K lines are changed out of 2-3 million. Not many.

Regarding your "five types of arguments in favor of a MySQL fork", I am not sure if this is granular enough. The 'features' really fall in two categories:

1) Performance
2) Diagnostics/Usability enhancements.

Many people get excited about (1), but the real reason they should be switching is (2). Most people are already not getting the performance they are entitled to, and the reason is that a default MySQL release is not instrumented enough.

I'm talking about things like "Waiting on query cache mutex" in the processlist, or being able to see statistics on which indexes are used/unused.

( Shameless plug: I'm holding a webinar on this next week - http://www.percona.com/webinars/2010-12-08-introduction-to-percona-server-xtradb-xtrabackup/ )

Unknown said...

What most observers also miss is that while the code may be freely available under GPL, the documentation is not, and could go away tomorrow.

Philip said...

I agree with what you say as well - though in my post I said there was only one actual fork (drizzle). But I buy your "pluto" argument too.

As an Oracle guy (and classic dolphin), I also take this as a tacit acceptance of Oracle's stewardship of MySQL. Oracle's been a good home to InnoDB for years, and MySQL Engineering is still awesome and growing.

Anonymous said...

So we have a couple of species of dolphin and a tuna.

As the MySQL Fan Boy, I'm wondering why we are still talking about this. I'm not worried where MySQL is going. I use the official (Oracle) version at work, MariaDB at home and I'd like to consult with Percona but can't afford it.

Code progress is being made.

I'm not so sure one team/party/distribution is the right way. Fresh ideas and people willing to code them is the Open Source way.

I also like corporations distributing and supporting the code. I just wish they would keep out of the coding. A perfect world might be where Oracle takes code from and gives money to the other projects. Not to take them over or even make them dependent. Just enough to say thanks.

Wouldn't it be nice if Oracle threw a party every two years and flew in, at their expense, all the people who donated the top 50% of code to any MySQL distribution?

Stewart Smith said...

As for outside code contributions, it's pretty clear that Drizzle is the widest collaboration of individuals and companies. i.e. we're a more spread out development organisation - which means that we're going to be rather resilient.

Kedar said...

"The Data Charmer" Thanks for charming the topic; really a good read.

@Morgan: I completely agree with the "(2)Diagnostics/Usability enhancements." argument.
[ Shameless "plugged": I'm attending :)]

Baron said...

Those who wish that third parties would "keep out of the coding" would do well to ponder how fast progress on InnoDB was moving before third-parties got frustrated with the snail's pace, and started releasing improvements that made the official InnoDB look less attractive.

Progress does not happen in a vacuum.

Brian Aker said...

Hi!

While Drizzle does use InnoDB, we are pretty much at the end of what we will be taking from Oracle. Between HailDB and the fact that we aren't finding much progress being done with InnoDB, we will only be making use of a few more of their patches before breaking away from it entirely.

I am not really sure what you mean by a business case. We do have companies that support Drizzle, and while we are certainly not as big as MySQL, our downloads and usage continue to grow (despite us being in Beta).

Drizzle is not the one-shop model that exists for MySQL/others. We much prefer the Linux route, and encourage multiple vendors. I believe you would agree that long ago the "we need a single business entity" model was proven to be false.

Cheers,
-Brian

Mark Callaghan said...

I have a hard time keeping up with the many good changes that are new in InnoDB. The features from 5.5 include multiple buffer pool instances to reduce mutex contention on the buffer pool mutex, multiple rollback segments to reduce contention on the rollback segment mutex and allow for more than 1023 concurrent transactions, an option to move purge to a separate thread to let it keep up with IO intensive loads, changes to reduce contention on the transaction log (log sys) mutex and use of a separate mutex for the flush list to reduce contention on the buffer pool mutex.

Stewart Smith said...

All the changes up to 5.5.6 are in Drizzle right now. A merge request is up there now for 5.5.7, so Drizzle will be completely up to date with the MySQL 5.5 improvements.

It will be interesting to see the next batch of improvements to InnoDB too.

Stewart Smith said...

It's interesting to note that the last change to the 5.5 repository on launchpad was four weeks ago.

Anonymous said...

A comparison of the different MySQL forks:
http://investigacionit.com.ar/2012/02/forks-de-mysql/