Thursday, March 31, 2011

MySQL replication for demanding users

I have been working with MySQL replication for quite a while. I have dealt with simple replication setups and I have experimented with complex ones. Five years ago I wrote an article about advanced MySQL replication, which was mostly a dream of what you could do with imagination and skill, but the material from that article is still not even remotely ready for production. Yet, since that article, I have been approached by dozens of people who wanted to know how to make the multiple master dream become reality. To all of them, I had to say, "sorry, this is just a proof of concept. Come back in a few years, it may become possible". It still isn't.
Despite its latest technological advances, MySQL native replication is still very poor in topologies. What you can do with MySQL native replication is master-to-slave (which also includes relayed slaves), master-to-master, and circular replication.
Figure: replication topologies
Of these, circular replication is the closest thing to multiple masters that you can get with MySQL native replication, without the addition of third party services.
Circular replication is tricky to set up, although not unreasonably so. It works. With some patience and precision, you can build a cluster of a few nodes in circular replication. With luck, you can get them to work properly, without loops and with the data flowing to all the servers. Your luck runs out the moment one of the servers fails, or replication breaks down for whatever reason. Then you see that circular replication is actually more complicated than it looks on the surface, and it is also quite brittle. That doesn't mean that circular replication is not used in production. It is. I have known several people who use it successfully, although nobody is really happy about it.
In addition to its fragility, circular replication is slow. If you insert data into master A, it has to travel across three nodes before reaching master D.
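For reference, this is roughly how one link of a four-node ring (A → B → C → D → A) is wired with native replication; host names and credentials here are made up for illustration:

-- Each node needs a distinct server-id, plus log-bin and log-slave-updates in my.cnf.
-- On node B, point the slave thread at node A (repeat the pattern on C, D, and A
-- to close the ring):
CHANGE MASTER TO
    MASTER_HOST     = 'hostA',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_LOG_FILE = 'mysql-bin.000001',
    MASTER_LOG_POS  = 4;
START SLAVE;
-- If any node in the ring fails, the chain stays broken until you manually
-- re-point its downstream slave at another master.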
Another topology that seems to be very popular is the multiple source scheme. It is the opposite of master/slave: instead of having one master that sends data to many slaves, you have many masters that send data to one slave. Despite its popularity, this topology is still not implemented in MySQL native replication. The best you can do to simulate the desired outcome is round-robin replication driven by cron.
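The round-robin trick amounts to having the slave replicate from one master at a time, switching at regular intervals. A cron job would cycle through statements like these (hosts and coordinates are hypothetical, and you need to record where you stopped on each master before switching):

-- Detach from the current master, after saving its coordinates
-- (SHOW SLAVE STATUS gives Relay_Master_Log_File / Exec_Master_Log_Pos).
STOP SLAVE;
-- Re-point the slave at the next master in the rotation,
-- resuming from the coordinates saved during its previous turn.
CHANGE MASTER TO
    MASTER_HOST     = 'store2.example.com',
    MASTER_USER     = 'repl',
    MASTER_PASSWORD = 'repl_password',
    MASTER_LOG_FILE = 'mysql-bin.000042',
    MASTER_LOG_POS  = 107;
START SLAVE;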
With this background, it is no surprise that I was thrilled at the idea of working for a company that has made these dreams become reality. Tungsten Replicator allows users to have real multiple-master topologies, and even the much coveted multiple source topology is now within the users' grasp.
Compared to MySQL replication, the drawback of using Tungsten is that you need to deal with greater complexity. It's only natural: with so many more features, there are more pieces to take care of.
An interesting point about multiple masters is the matter of conflict resolution. The convenience and robustness of asynchronous replication are countered by the lack of means to deal with conflicts. This difficulty has been used many times as the reason for not implementing multiple source replication in MySQL. I have my own ideas on this issue. I am aware of the risks, but if I were allowed to do multiple master replication, I would be glad to take charge of those risks. Updating different databases, or different tables, in separate masters is one way of defining a conflict-free scenario where multiple masters or multiple sources could be used safely. If only we could ...
My colleague Robert Hodges has covered some interesting aspects of this issue in his blog. The bottom line is that we focus on empowering users with advanced replication features. Conflict resolution can wait. I am sure many users would love to have the problem of how to avoid conflicts if the more demanding problem of how to replicate from many places to one cluster could be solved. The good news is that some sort of conflict detection (and possibly resolution) is possible even now, without slowing down the operations and without complicating our lives unnecessarily. For example, a simple conflict that could be avoided using Tungsten filters is the one resulting from a master updating tables that it was not supposed to touch. In a scenario where multiple source replication works on the assumption that each master updates a given subset of the data, we can easily detect and, if needed, reject offending updates. It is not much, but in many practical cases it would be the difference between having robust multiple source replication and doing data load and consolidation manually.

Anyway, back to the present day, with very real multi-master replication available to everyone. To alleviate the fear of the unknown, we are organizing webinars on a regular basis, where we cover the theoretical points and give practical demos of how to use the new features.
If you are a demanding user, this upcoming webinar is for you: MySQL Multi-Master and Multi-Source Replication With Tungsten. Tomorrow, March 31st, 2011, at 10am PDT.

Thursday, March 17, 2011

How fast is parallel replication? See it live today

I talked about parallel replication last month. Since then, there has been considerable interest in this feature. As far as I know, Tungsten's is the only implementation of this much coveted feature, so I can only compare it with MySQL native replication.
The most compelling question is "how fast is it?"
That's a tricky one. The answer is the same one that I give when someone asks me "how fast is MySQL". I always say: it depends.
Running replication in a single thread is sometimes slower than the operations in the master. Many users complain that the single thread can't keep up with the master, and the slave lags behind. True. There is, however, a hidden benefit of single threaded replication: it requires fewer resources. There is no contention for writing on disk, no need to worry about several users blocking a table. You still need to contend with the users that want to read the tables, but the lone writer has an easy job, albeit a heavy one.
When we introduce parallel replication, the easy job fades away, and we are faced with the problem: how do I allow several writers to do the work of one? It's a nice problem to have. MySQL native replication does not allow parallel apply, but with Tungsten you can start tackling the issue of allowing several parallel threads to update the system at once. Therefore, this is the same problem that you have on a server where several users are allowed to write at once. If the server has sufficient resources, the operations will be fast. If it doesn't, the operations will lag behind.
Another aspect of the question is "what kind of queries?" If your database is well established and set in stone, and you mostly run UPDATEs, the replication performance will depend on how well your server is tuned for concurrent writes. If you run ALTER TABLE statements on a daily basis, your queries will queue up after that ALTER TABLE no matter what. And if you have only INSERT and DELETE queries, parallel replication will probably depend on how fast your server is.
Ultimately, I can tell you that I have seen or experienced directly a wide range of repeatable results. I know cases where parallel replication is three times as fast as native replication. These cases usually involve huge amounts of binary logs, like when your slave needs to be taken offline for a few hours or even days and then it tries to catch up. Other cases that can be reproduced with a minimal amount of sample data show parallel replication as being 30% to 50% faster. And then there are cases where your server is so poor on resources, or the load is so unevenly distributed, that parallel replication is only as fast as native replication. I would say that these cases are easily cured by beefing up the server.
If you want to see a demo of how this replication works, you can join this webinar:
Zoom, Zoom, Zoom! MySQL Parallel Replication With Tungsten Replicator 2.0.
You can tell from the title that we are quite excited about the product that we are building.

Tuesday, March 15, 2011

Quick benchmarking trick

I have been doing quite a lot of benchmarking recently.
I needed to find a safe way of measuring the time spent by the database doing a long task, like catching up on a huge backlog of accumulated replication updates. The problem with measuring this event is that I can record when it starts, but I can't easily detect when it finishes. My initial approach was to monitor the database and count the table rows to see when the task was done, but I ended up affecting the task performance with my additional queries. So I thought of another method.
Since I had control over what was sent from the master to the slave, I used the following method.
The initial time is calculated as the minimum creation time of the tables in the databases that I know are created during the exercise. Let's say that I had 5 databases named from db1 to db5:
set @START = (select min(create_time) from information_schema.tables where table_schema like "db%");
Then, to make sure that I catch the exact moment that the task is finished, I added to the master a command for each database:
create table db1.last_table (i int);
create table db2.last_table (i int);
create table db3.last_table (i int);
create table db4.last_table (i int);
create table db5.last_table (i int);
To know if the task is done, I query the database as follows:
select count(*) from information_schema.tables where table_schema like "db%" and table_name="last_table";
If the count is less than 5 (the number of databases that were in my binary logs), I wait more.
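In practice, the waiting can be scripted. Here is a minimal sketch of that polling loop as a stored procedure (the procedure name and the sleep interval are my own choices):

DELIMITER $$
drop procedure if exists wait_for_last_tables $$
create procedure wait_for_last_tables(IN expected INT)
BEGIN
    DECLARE found INT DEFAULT 0;
    REPEAT
        SET found = (SELECT COUNT(*)
                       FROM information_schema.tables
                      WHERE table_schema LIKE 'db%'
                        AND table_name = 'last_table');
        IF found < expected THEN
            DO SLEEP(10);  -- not there yet: check again in 10 seconds
        END IF;
    UNTIL found >= expected END REPEAT;
END $$
DELIMITER ;

call wait_for_last_tables(5);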
Finally, when the count matches the expected one, I get the end time:

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table");
Now I have two values, @START and @END:
select timediff(@END,@START) as elapsed;
+----------+
| elapsed  |
+----------+
| 00:09:44 |
+----------+
It does not matter if I query the database immediately, or hours after coming back from my errands. Using the table creation times makes sure that I get clean start and finish times.
I put all the above in a script, and I can check the elapsed time without fear of mistakes.

Sunday, March 13, 2011

A cool terminal tip for Mac users

If you use a Mac, and you are dealing with many similar tasks at once, like examining many database servers in different terminals, you may like this one.
I have been using iTerm 2 for a while, and my handling of parallel tasks has improved a lot. (No, I am not talking about Parallel replication, although I have applied this trick while testing that technology as well.)
iTerm2 has some cool features, and probably the most striking one is split panes. That alone would be a good reason for giving iTerm2 a try. But the one that I use the most, often in combination with Split Panes, is called Send Input to all tabs.
Here is how it works.
Let's say I need to use 4 servers at once, and perform a non-repeating operation in all of them.
So I open a separate window and I split the screen into 5 panes. I connect to each server in the first four panes, and I open a vim instance in the fifth.
With that done, I enable the magic option.


A word of caution. This option sends the input to all the open tabs in your current window. If you don't want this to happen, do as I do, and open a separate window. Then make sure that all tabs, and any split panes, are supposed to receive your input. The application asks you for confirmation.


After that, whatever I type in one pane will be mirrored in all the panes. So I will see the commands running on my four servers, and being logged in a text file in the fifth one. With just a single command, I have all servers under control at once.

Replication and sandbox talks on the road - San Francisco, Santa Clara, Orlando

In a few weeks, I will be on the road, for an intense set of presentations in the USA.

San Francisco, April 7th

I will start the tour at the San Francisco MySQL User Group. On April 7, at 6pm, I will talk about Advanced MySQL replication for the masses. This talk will explore topics such as bi-directional replication, multiple sources, parallel replication, and seamless failover, with the help of Tungsten Replicator.


O'Reilly MySQL Conference & Expo 2011

Santa Clara, April 11, 12

The MySQL conference starts on Monday, April 11th, with several three-hour tutorials.

Robert Hodges and Edward Archibald will also be presenting at the MySQL Conference.

On Tuesday evening I will fly to Orlando, to attend (part of) Collaborate11.


Orlando, April 13

Tuesday, March 08, 2011

The MySQL Council addresses the public bug database issue

When I announced the MySQL Council, I said that one of its roles is to be a bridge between the MySQL community and Oracle.
The time has come to put this role to good use. Recently, there have been some concerns about the MySQL bugs database, which are well summarized in Mark Callaghan's post Where have the bugs gone?.
The gist of the concerns is that there has been a change in how bugs are handled, although we don't know what was changed and how. In short, there has been a total lack of communication. The MySQL Council has addressed the concerns about the public bug database in a recent meeting, and has taken several steps, like approaching Oracle directly and releasing a summary of the concerns on its site.

The MySQL Council members have been discussing the decision by Oracle to reduce the importance of the public MySQL bug database as a source of input and direction for product updates. The Council would also like to work with Oracle to promote communication around the status of database access to the broader community, so members will understand what to expect moving forward.

Without communication around the use of and changes to the public bug database, concerns have been raised in the community about duplicate bug tracking, bug numbers in commits not being visible to the public, difficulty in offering patches to the MySQL server, and a general decrease in transparency in the evolution and remediation of the MySQL server and associated products.

The IOUG (Independent Oracle User Group) is supporting the MySQL Council in its efforts to raise questions and seek direction from Oracle. The MySQL Council will be meeting with Oracle stakeholders to discuss options for keeping appropriate portions of the database active, as well as communicating status and future actions to the broader community.
We don't know the outcome yet. But we'll surely post an update as soon as we hear it.

Monday, March 07, 2011

Implementing table quotas in MySQL

I have just seen Limiting table disk quota in MySQL by Shlomi Noach, and I could not resist.
You can actually implement a disk quota using an updatable view with the CHECK OPTION.
Instead of giving the user access to the table, you give access to the view (at least for inserting, see the caveat at the end), and you will get a genuine MySQL error when the limit is reached.

drop table if exists logs;
create table logs (t mediumtext) engine=innodb;

drop function if exists exceeded_logs_quota ;
create function exceeded_logs_quota() 
returns boolean
deterministic
return (
    select CASE 
           WHEN (DATA_LENGTH + INDEX_LENGTH) > (25*1024) 
           THEN TRUE ELSE FALSE 
           END
    FROM 
        information_schema.tables 
    WHERE 
        table_schema=schema() 
        and table_name='logs'
    );

create or replace view logsview as 
    SELECT * FROM logs 
    WHERE NOT exceeded_logs_quota()
    WITH CHECK OPTION;

Here's a test run:
mysql [localhost] {msandbox} (test) > insert into logsview values ('a');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     0 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values (repeat('a', (25 * 1024) - 1));
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values ('b');
ERROR 1369 (HY000): CHECK OPTION failed 'test.logsview'

You will need to tweak the limit to adapt to InnoDB's habit of allocating pages rather than bytes, but if you measure the limit in MB, the method should work fine.
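For instance, a variant of the function above with the threshold expressed in MB could look like this (16 MB is just a sample value):

drop function if exists exceeded_logs_quota ;
create function exceeded_logs_quota()
returns boolean
deterministic
return (
    -- same logic as above, with a 16 MB threshold
    select (DATA_LENGTH + INDEX_LENGTH) > (16 * 1024 * 1024)
    FROM
        information_schema.tables
    WHERE
        table_schema=schema()
        and table_name='logs'
    );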

CAVEAT: You should give your users separate privileges: SELECT on logs, and INSERT on logsview (a grant sketch follows the session below). The view will only return records while exceeded_logs_quota() returns false.
mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > select count(*) from logsview;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

mysql [localhost] {msandbox} (test) > select count(*) from logs;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.01 sec)
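A minimal sketch of that privilege split, assuming a hypothetical application account that already exists:

-- The application user can read the base table, but can only write through the view.
GRANT SELECT ON test.logs     TO 'app_user'@'localhost';
GRANT INSERT ON test.logsview TO 'app_user'@'localhost';
-- Depending on your setup, the account may also need EXECUTE on
-- exceeded_logs_quota(), unless the view's definer already has it.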

Sunday, March 06, 2011

Advanced replication for the masses - Part III - Replication topologies

After part I: the basics, and part II: parallel apply, we now deal with a more mundane topic: how to deploy replication services in a way that fits your business, covering everything from the basic master/slave to the most advanced multiple source scheme.

Master/slave

The most common topology is master/slave. One master, many slaves. This topology is equivalent to MySQL native replication. The differences are in the additional features. Tungsten supports seamless failover and parallel replication in all topologies.

Figure 1. Tungsten master/slave replication
Unlike MySQL, and unlike previous versions of Tungsten, the implementation of this topology uses a dedicated service for the master, and deploys a corresponding service on each slave. In MySQL, and in old Tungsten, there is just a general purpose pipeline that connects master to slave. In Tungsten Replicator, instead, you need to define a service explicitly. While this requirement looks like overkill for such a simple topology, it will become clear that it helps in defining advanced topologies.

Relayed replication

The first time that I saw this topology with Tungsten, it was by mistake. I wanted to connect host Delta with host Alpha, but through a combination of bad cut-and-paste and mindlessness, I connected the slave Delta to a seemingly non-existent master on host Bravo (figure 2).

Figure 2. Tungsten master/slave replication with relay
The funny thing is that I did not realize the error, because the test that I wrote to certify that what I inserted in the master was then usable in all the slaves worked without a problem. Thus I learned another feature of Tungsten: every slave is also a relay slave, without the need for any additional setup. It does not matter if the slave is using binary logs, or if it has enabled the log-slave-updates option. Those requirements are for MySQL native replication. Tungsten slave replicator services can detect a request from another slave, and act as a relay if necessary.

Bi-directional replication

Here is where people used to MySQL replication start to be surprised. To set up a master-to-master topology, we deploy two services on each host. The first host defines a local master service named alpha, and a remote slave service named bravo. The second host does the opposite: a local master service named bravo, and a remote slave service named alpha. (Figure 3)

Figure 3. Tungsten bi-directional replication
Whatever is updated on host alpha is processed and sent to remote slaves by the master replicator service alpha. Whatever is updated in host bravo is processed by the master replicator service bravo and dispatched to its audience.
This system works like MySQL master-to-master replication. There is no conflict resolution handling. The prerequisite, for this and the following topologies, is that you know what you are doing and leverage the replication system within its limitations.

Multiple site replication

This is an extension of bi-directional replication. Each master has one or more slaves. Figure 4 shows a typical implementation. You have a main site in one location, and want to have a backup system in another location. If disaster strikes in your main location, you are ready to switch over to the alternate site with minimum delay.

Figure 4. Tungsten multiple site replication
The reliability and robustness of this scheme depend on your applications. Nothing prevents you from writing to both masters at once. And if you keep the tasks logically separated (e.g. master alpha writes to database products, while master bravo writes to database employees), nothing bad will happen. But if your applications update the same records, you can suffer either a duplicate key error or the silent loss of changes when a master writes on top of an already updated record.
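As a concrete illustration of the kind of conflict you want to avoid (table and values are made up):

-- On master alpha:
INSERT INTO products.items (id, name) VALUES (101, 'widget');

-- At about the same time, on master bravo, before alpha's change has arrived:
INSERT INTO products.items (id, name) VALUES (101, 'gadget');

-- When each statement reaches the other master, the applier hits a duplicate
-- key error (or, with UPDATEs, one change silently overwrites the other),
-- and you are left reconciling the two sites by hand.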
This topology is frequently used in combination with a virtual IP provider, a system that shields both clusters from the application's view, and lets it connect through an IP that is associated with the active master. There are many such systems, including Linux HA and Tungsten Enterprise. But that is beyond the scope of this post.

Multiple source replication

This topology is extremely popular, at least judging from the many times that it has been requested of me. Strangely popular, in fact, because it does not exist, at least as far as MySQL native replication is concerned. One of the limits of MySQL native replication is that every slave can have only one master.
This topology has been explained to me in many ways. One of the most common is this. You have a company that has a headquarters in a town, say New York. That company has stores in many cities, and the management would like to get the data from each store to the headquarters, in real time. This is the opposite of the master/slave topology, and no matter how creative you become, you can't get MySQL to do it.
Using Tungsten, you can implement this topology fairly easily. (Figure 5)

Figure 5. Tungsten multiple source replication
Each server deploys a master replicator service. The host in headquarters deploys one slave service for each of the remote masters.
Of course, there is no conflict resolution handling. If your remote masters don't behave within the limits that you set for them, you will get in trouble, and replication may stop. However, if you know what you're doing and set up the system properly, at least you can achieve the goal of getting this reverse replication scheme working smoothly.

Hands on

To help the early adopters, I have put together a set of scripts that easily deploy any of the topologies mentioned in this post with just a few actions.
What you need is 4 hosts (four copies of the virtual machine mentioned in part I will do), and the Tungsten deployer package that you will get from the Tungsten downloads page.
Once you have the four servers, unpack the deployer scripts in a chosen directory on all of them, making sure that the directory is the same on all four. Inside the package, there is a README file with a detailed list of instructions. So detailed, in fact, that I won't copy it here because it would make this post too long.

I will just show a sample run:
$ ./set_multi_source.sh 
QA1
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.243
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA2
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.264
role            : master
serviceName     : bravo
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA3
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 0.812
role            : master
serviceName     : charlie
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA4
installation OK

Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 29.521
role            : slave
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 20.123
role            : slave
serviceName     : bravo
serviceType     : remote
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 12.726
role            : slave
serviceName     : charlie
serviceType     : remote
started         : true
state           : ONLINE
Finished services command...

$ ./test_flow_multi_source.sh 
inserting 100 into each of the three masters. Please wait
Retrieving data from the masters
QA1
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
+----+-----+------+

QA2
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t2 | 100 | 5050 |
+----+-----+------+

QA3
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t3 | 100 | 5050 |
+----+-----+------+

Retrieving data from the slave
QA4
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
| t2 | 100 | 5050 |
| t3 | 100 | 5050 |
+----+-----+------+

Happy hacking!

Saturday, March 05, 2011

A hidden options file trick

I was listening today to the OurSQL Episode 36: It's Not Our (De)fault! Part 1. As usual, Sheeri and Sarah are very informative and entertaining while explaining the innards of MySQL and their best practices.
Since it is a DBA-oriented show, there was an omission in this podcast: there was no mention of the custom groups that you can have in your my.cnf. This is mostly useful for developers. If your application requires some specific settings, instead of using a separate configuration file, you can use a different group, and then instruct your client applications to read that group.
By default, all client applications read the "[client]" group.
But you can tell your client to read a group that you can call whatever you like.
For example, with this configuration file,
[client]
user=common_user
password=common_password

[logrotation]
user=log_rotation_daemon
password=specific_password

You can have a Perl script that takes care of your particular log rotation needs. Instead of the normal credentials, it will use the ones listed in the [logrotation] group.
use strict;
use warnings;
use DBI;

my $dsn =   "DBI:mysql:test;"
            . "mysql_read_default_group=logrotation;"
            . "mysql_read_default_file=$ENV{HOME}/./my.cnf";
my $dbh = DBI->connect($dsn);
Notice that, for this option to work, the [logrotation] group must come after the [client] group, or the directives in the [client] group will override the ones in [logrotation]. That's why, in the options file, you find the directives for [mysqldump] at the bottom of the file.

So far, so good. This was a trick for developers, and probably many developers know it already. But there is another, related trick, that can be used by non-developers as well.
If you knew about these customized groups, you may have realized that you can't use them directly with the standard mysql command line client. Or, to put it better, there is no clearly documented way of doing so. There is, in fact, a cute trick that you can use.
Let's say that, from time to time, you want to use a different prompt, but you don't want to edit your $HOME/.my.cnf to change it. You just want your prompt to be there in the option file, and be able to recall it when the situation calls for it.
The mysql internal help does not say anything about groups. However, a careful search of the manual gives this cryptic entry:
  • --defaults-group-suffix=suffix, -g suffix

    In addition to the groups named on the command line, read groups that have the given suffix.
When I found it, I stared at this puzzling statement for a while. I could not understand which groups were being named on the command line.
Eventually, I figured out why there is a group-suffix and not simply a group. It means that if you add a suffix to a default group name, and you tell mysql to look for this suffix, then you will be able to use the appropriate group on demand.
For example, this options file will not work.
# wrong
[pinocchio]
prompt='I have a long nose  =======> '

[master]
prompt='master [\h] {\u} (\d) > '

[slave]
prompt='slave [\h] {\u} (\d) > '
But this one will work:
[mysqlpinocchio]
prompt='I have a long nose  =======> '

[mysqlmaster]
prompt='master [\h] {\u} (\d) > '

[mysqlslave]
prompt='slave [\h] {\u} (\d) > '

Here is a test run:

$ mysql --defaults-group-suffix=pinocchio
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.1.54 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

I have a long nose  =======> 

The meaning of the suffix part is that mysql will read the default groups (which are [client] and [mysql]), and it will also read any groups that are named "mysqlSUFFIX" or "clientSUFFIX". I have named the group "mysqlpinocchio" and therefore it has been used. It would have worked the same if I had called it "clientpinocchio".

Wednesday, March 02, 2011

Beware of the log

The MySQL general log is one of my favorite features for a quick debug. I especially like the ability of starting and stopping it on demand, which was introduced in MySQL 5.1.
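For the record, switching it on and off at runtime is a matter of two global variables (here I also send the output to a table, which is easier to inspect and truncate than a file):

-- send the general log to the mysql.general_log table instead of a file
SET GLOBAL log_output = 'TABLE';
SET GLOBAL general_log = 'ON';

-- ... run the statements you want to observe ...

SET GLOBAL general_log = 'OFF';
SELECT event_time, argument FROM mysql.general_log ORDER BY event_time;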
However, using the general log has its drawbacks.
Today I was debugging a nasty bug that results from two statements that should be applied sequentially, but that were instead concurrent. This kind of problem is hard to cope with, as it is intermittent. Sometimes all goes well, and you get the expected result. And then, sometimes the statements fly in different directions and I stare at the screen, trying to understand where they strayed.
After some trial and error, I decided to enable the general log just before the offending statements, and to turn it off immediately after. Guess what? With the general log on, the test never failed. What was an intermittently failing test became an always succeeding test.
What happened is that the general log delayed the query execution just enough for the following statement to arrive when it was expected.
In the end, the bug had to be unveiled using white box techniques.
Moral of the story: using the general log alters the behavior of the server. If you use it, be prepared to deal with its side effects.