Thursday, April 28, 2011

Replication : different points of view

The following quotes are the first sentences in the replication chapter of two similar books. Both are admin cookbooks. One is for PostgreSQL, one for MySQL.

Replication isn't magic, though it can be pretty cool. It's even cooler when it works, and that's what this chapter is all about.
Replication requires understanding, effort, and patience. There are a significant number of points to get right. My emphasis here is on providing simple approaches to get you started, and some clear best practices on operational robustness
PostgreSQL 9 Admin Cookbook

Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.
MySQL Admin Cookbook


The PostgreSQL quote warns of a dangerous, difficult, and unexplored path, while the MySQL one is the almost bored remark of someone whose biggest problem is listing all the good things you can do. I guess that being exposed to a given technology for a longer time changes one's perception.

Wednesday, April 27, 2011

Open Database camp 2011 - Travel logistics, and don't forget the party

The Open Database Camp 2011 is near. In 9 days, the welcome party starts, and then the conference itself gets going.
If you are arriving earlier than Friday, May 6th, you can either use public transportation or book a seat with a volunteer on the car pooling page. Please help the organizers: post your arrival and departure dates and times, so that we can assist you even outside the official conference days.
About the conference itself, as everyone should know, it's an un-conference, where the talks are decided on the spot. But you can book ideas and topics in the sessions page.
Since we will have many participants from Italy, there will be dedicated sessions in Italian in addition to the ones in English, which is the official language of the conference.

Sunday, April 17, 2011

Replication metadata in MySQL 5.6.2

The default storage engine is InnoDB, or is it not?

When MySQL 5.5 went GA, the biggest piece of news was that the default storage engine is now InnoDB. Good news, and hope for a better future, as InnoDB is the most reliable storage engine available for MySQL.

Therefore the expectation is that MySQL 5.6 follows in its footsteps, and that we should see less and less of MyISAM in the database.
The privilege tables, however, are still MyISAM. I was not really expecting to see them disappear so quickly, as I have seen how much work it has been for Drizzle to get rid of them, and even they had to keep MyISAM alive for temporary tables.
I was surprised, however, to see that the new tables for replication metadata, the ones that replace the files master.info and relay_log.info, are MyISAM by default.
The manual says:
In order for replication to be crash-safe, the slave_master_info and slave_relay_log_info tables must each use a transactional storage engine. By default, both of these tables use MyISAM; this means that, prior to starting replication, you must change both of these tables to use a transaction storage engine if you wish for replication to be crash-safe. You can do this by means of the appropriate ALTER TABLE ... ENGINE=... statements. You should not attempt to change the storage engine used by either of these tables while replication is actually running.

The funny thing is that the manual does not mention InnoDB explicitly, as if there were many transactional engines shipping with official MySQL.

Tables instead of files.

Anyway, I went ahead and tried the new metadata features. In short, the new version allows you to store the data that once was in master.info and relay_log.info in two tables with similar names, located in the mysql schema.

First of all, I changed the storage engine, as suggested by the docs. Actually, the docs are still a bit scarce about this feature. The best instructions are the ones found in Mats Kindahl's blog.
# in the slave
ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;

# in the slave configuration file
relay-log-info-repository=TABLE
master-info-repository=TABLE
After this operation, I initialized the two slaves: one with the new table-based info, and the other still with the old file-based info, for comparison.

# slave 1 (with table info)
show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 12027
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 114
               Relay_Log_File: mysql_sandbox12028-relay-bin.000002
                Relay_Log_Pos: 267
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]

select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
        Master_log_name: 
         Master_log_pos: 4
                   Host: 
              User_name: 
          User_password: 
                   Port: 3306
          Connect_retry: 60
            Enabled_ssl: 0
                 Ssl_ca: 
             Ssl_capath: 
               Ssl_cert: 
             Ssl_cipher: 
                Ssl_key: 
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
                   Bind: 
     Ignored_server_ids: 0
                   Uuid: 
            Retry_count: 86400


# Slave 2
cat node2/data/master.info 
20
mysql-bin.000001
114
127.0.0.1
rsandbox
rsandbox
12027
60
0





0
1800.000

0
6cb60e24-68e7-11e0-9eec-6c626da07446
86400
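The raw file above is just a list of values, one per line. As a reading aid, here is a small helper of my own (not a MySQL tool) that labels its main fields; the field order follows the 20-line format shown above, where line 1 is the line count, followed by log name, position, host, user, password, port, and connect retry.

```shell
# Label the main fields of a 5.6 master.info file (my own sketch).
parse_master_info() {
    awk 'NR==2 {print "Master_log_name: " $0}
         NR==3 {print "Master_log_pos: " $0}
         NR==4 {print "Host: " $0}
         NR==5 {print "User_name: " $0}
         NR==7 {print "Port: " $0}' "$1"
}

# Example, using the same values as node2/data/master.info:
cat > /tmp/master.info.sample <<'EOF'
20
mysql-bin.000001
114
127.0.0.1
rsandbox
rsandbox
12027
60
EOF
parse_master_info /tmp/master.info.sample
```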

Hmmm. Not good. Definitely not good.
Now, according to Mats' article, the slave_master_info table is updated every time a slave starts. But this is not the case.
Apparently, you need to restart replication at least once more to get an update.
# slave 1
stop slave; 
start slave;
select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
        Master_log_name: mysql-bin.000001
         Master_log_pos: 114
                   Host: 127.0.0.1
              User_name: rsandbox
          User_password: rsandbox
                   Port: 12027
          Connect_retry: 60
            Enabled_ssl: 0
                 Ssl_ca: 
             Ssl_capath: 
               Ssl_cert: 
             Ssl_cipher: 
                Ssl_key: 
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
                   Bind: 
     Ignored_server_ids: 0
                   Uuid: 6cb60e24-68e7-11e0-9eec-6c626da07446
            Retry_count: 86400

This lack of updates is the default by design. The reasoning is that updating the table at every transaction would slow down replication beyond acceptable levels. It must be noted, however, that the table is updated far less often than the file.

You can force the slave_master_info and slave_relay_log_info tables to be updated at every transaction by setting sync_master_info and sync_relay_log_info. Indeed, with this addition, the tables are updated at every transaction.
Therefore the choice is between crash unsafe and fast (with the *.info files) and crash safe and very slow (with the tables).
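Putting the pieces together, a slave configured for the crash-safe (and slow) option would contain something like this sketch, where setting the sync options to 1 means an update at every transaction:

```
# in the slave configuration file
master-info-repository=TABLE
relay-log-info-repository=TABLE
sync-master-info=1
sync-relay-log-info=1
```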

Usability issues


Let's mention the good news first. This addition is very welcome, because it allows monitoring tools to be implemented directly in SQL. The main difficulty here is that, until MySQL 5.5, the only metadata available is the output of SHOW SLAVE STATUS, which has no related Information_Schema or Performance_Schema table. Thus, getting the status values into SQL variables is not feasible without external tools. This metadata is no replacement for SHOW SLAVE STATUS (that worklog is still struggling with a slow implementation), but there is enough overlap that a simple monitoring tool could be created with SQL, stored routines, and the event scheduler.

Now, for the bad news:
This implementation leaves me baffled for several reasons.
The lack of updates by default is the biggest concern. There is no option for automatic updates every second, similar to sync_binlog. It's all or nothing.

The choice of implementation is not pleasant either. Users would expect the table-based recording to mimic the behavior of the file-based recording, i.e. when replication is started, the table is created, and after a RESET SLAVE the table is removed. But this does not happen. The table is truncated, and if you remove it, it won't be re-created when you restart replication.

What's worse, this table can't be dumped with locks. MySQL complains if you attempt to do that.
./s1 -e 'stop slave'

mysqldump mysql slave_master_info
-- MySQL dump 10.13  Distrib 5.6.2-m5, for linux2.6 (x86_64)
--
-- Host: localhost    Database: mysql
-- ------------------------------------------------------
-- Server version 5.6.2-m5-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
mysqldump: Got error: 1721: You can't use locks with rpl info tables. when doing LOCK TABLES
This makes the operation of provisioning a slave from a backup more difficult. I would expect that, having stopped the slave, I could back up the table, possibly together with the rest of the database. Maybe MySQL has a hidden clever way of exporting this data, but if it exists, so far it has escaped me.

Update: You can use the following command to dump this table.
mysqldump --master-data mysql slave_master_info
However, a simple
mysqldump --master-data mysql
does not include the *_info tables. (Bug#60902)

Another problem is maintenance. If I want to clean up the InnoDB tablespace, the usual recipe is to dump everything, stop the server, remove the ib* files, restart the server, and then reload the data.
That has worked very well so far, because there were no InnoDB tables in the mysql database. Now, however, if we attempt to perform the above operation, we get an error when InnoDB comes online, because it won't find an internal reference to the InnoDB tables, whose .frm files (and possibly .ibd files) are still lying around under the mysql folder.
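As a command outline, the usual recipe looks like this (paths are examples of my own; the restart step is exactly where 5.6.2 now complains):

```
# traditional InnoDB tablespace cleanup (command outline; paths are examples)
mysqldump --all-databases > all_data.sql    # dump everything
mysqladmin shutdown                         # stop the server
rm $DATADIR/ibdata* $DATADIR/ib_logfile*    # remove the ib* files
mysqld_safe &                               # restart: in 5.6.2, InnoDB now
                                            # complains about the orphaned
                                            # replication info tables
mysql < all_data.sql                        # reload the data
```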

Incidentally, I can note that Tungsten Replicator uses a similar approach (replication metadata is stored in a table, which is updated at every commit), and yet it does not suffer from any of the drawbacks mentioned here. The replication metadata tables are stored in a regular schema, which can be dumped just fine to provision a new slave. The additional commits are not a problem, since Tungsten uses the technique of block commit, where it commits together all the transactions that can be safely grouped.
The safety of the slave thus depends on the value of --innodb-flush-log-at-trx-commit, not on additional trade-off decisions.

More to come.

This article covers just a portion of the replication improvements in 5.6. There is much more to see and test.
Specifically, I want to test the performance impact of the metadata tables, and also the performance of the multi-threaded slave prototype against regular replication and Tungsten. I will get around to it shortly. Stay tuned.

Pewter for Tungsten - Thanks, MySQL community!

After the opening keynote at the MySQL Conference, there was the usual ceremony of the MySQL community awards. Since Oracle declined to continue the MySQL AB tradition of awarding contributions from the community, the community itself has taken over.
I was pleasantly surprised to find my name among the recipients, and even more pleased about the prize awarded to my company's product, the Tungsten Replicator.
The surprise comes because we have been making noise about this product for only a few months, after we finalized our plans to split the company products between open source and enterprise. Apparently, it has been the right kind of noise, and the community has been able to see that Tungsten is a tangible contribution to the MySQL ecosystem.
Now that the MySQL Conference's frenzy is over, we can continue making Tungsten even better than it is today. As probably everyone knows, we have found a generous sponsor (which should remain unnamed for now, but to whom we are very grateful) that is paying for implementing and strengthening the features that will make Tungsten 2.0 production ready in the near future.
If users want to contribute their feedback, they can download the binaries (and the full source code) from the new Tungsten home http://tungsten-replicator.org, and report issues on the same site. See Settling in at code.google.com for more information.
More technical info will follow soon.
For now, I just wanted to say Thank you, MySQL community!

Friday, April 15, 2011

Have you missed the MySQL Conference? Come to OpenDbCamp!

The MySQL Conference is over. There have been many new developments, and those who missed it will probably want a summary of the excitement, possibly from the people who have contributed to shaping the news.
The Open Database Camp will give users an opportunity to catch up, especially open source users in Europe.
Come and share the fun. There will be talks on MySQL, PostgreSQL, several NoSQL products, and a bunch of other cool stuff.

Thursday, April 07, 2011

Refactored again: poor man's MySQL replicator monitor

I saw that both Haidong Ji and Geert VanderKelen have proposed a Python monitor for MySQL replication, calling it the "poor man's version".
See Poor man’s MySQL replication monitoring and Geert's Refactored: Poor man’s MySQL replication monitoring.
Having Python on your server doesn't really qualify as "poor man". In many cases it's a luxury, and thus here's my shot at the problem, using a Bash shell script.
Unlike its Python-based competition, this version also checks that the slave is replicating from the intended master, and that it is not lagging behind.
#!/bin/bash

USERNAME=msandbox
PASSWORD=msandbox
EXPECTED_MASTER_HOST=127.0.0.1
EXPECTED_MASTER_PORT=27371

SLAVE_HOST=127.0.0.1
SLAVE_PORT=27372

MYSQL="mysql -u $USERNAME -p$PASSWORD "
MASTER="$MYSQL -h $EXPECTED_MASTER_HOST -P $EXPECTED_MASTER_PORT"
SLAVE="$MYSQL -h $SLAVE_HOST -P $SLAVE_PORT"

$MASTER -e 'SHOW MASTER STATUS\G' > mstatus
$SLAVE -e 'SHOW SLAVE STATUS\G' > sstatus

function extract_value {
    FILENAME=$1
    VAR=$2
    grep -w "$VAR" "$FILENAME" | awk '{print $2}'
}

Master_Binlog=$(extract_value mstatus File )
Master_Position=$(extract_value mstatus Position )

Master_Host=$(extract_value sstatus Master_Host)
Master_Port=$(extract_value sstatus Master_Port)
Master_Log_File=$(extract_value sstatus Master_Log_File)
Read_Master_Log_Pos=$(extract_value sstatus Read_Master_Log_Pos)
Slave_IO_Running=$(extract_value sstatus Slave_IO_Running)
Slave_SQL_Running=$(extract_value sstatus Slave_SQL_Running)

ERROR_COUNT=0
if [ "$Master_Host" != "$EXPECTED_MASTER_HOST" ]
then
    ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Master_Port" != "$EXPECTED_MASTER_PORT" ]
then
    ERRORS[$ERROR_COUNT]="the slave is not replicating from the port that it is supposed to"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Master_Binlog" != "$Master_Log_File" ]
then
    ERRORS[$ERROR_COUNT]="master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

POS_DIFFERENCE=$(echo ${Master_Position}-$Read_Master_Log_Pos|bc)

if [ $POS_DIFFERENCE -gt 1000 ]
then
    ERRORS[$ERROR_COUNT]="The slave is lagging behind by $POS_DIFFERENCE"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Slave_IO_Running" == "No" ]
then
    ERRORS[$ERROR_COUNT]="Replication is stopped"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Slave_SQL_Running" == "No" ]
then
    ERRORS[$ERROR_COUNT]="Replication (SQL) is stopped"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ $ERROR_COUNT -gt 0 ]
then
    EMAIL=myname@gmail.com
    SUBJECT="ERRORS in replication"
    BODY=''
    CNT=0
    while [ "$CNT" != "$ERROR_COUNT" ]
    do
        BODY="$BODY ${ERRORS[$CNT]}"
        CNT=$(($CNT+1))
    done
    echo $SUBJECT
    echo $BODY
    echo $BODY | mail -s "$SUBJECT" $EMAIL
else
    echo "Replication OK"
    printf "file: %s at %'d\n" $Master_Log_File  $Read_Master_Log_Pos
fi
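To turn this check into an unattended monitor, the script can run from cron. The path and interval below are just examples of mine, not requirements:

```
# crontab entry: check replication every 5 minutes
*/5 * * * * /usr/local/bin/check_replication.sh >> /var/log/replication_check.log 2>&1
```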

Monday, April 04, 2011

Make your voice heard. Tell Oracle and the MySQL Council what bugs you

The MySQL Council has not been idle. We have addressed the bugs database concerns, and we are continuing our dialog.
To do a better job, we would like to hear more from the community. Unlike other established user groups, MySQL does not have a worldwide organization for its users. The council exists on a voluntary basis, and we are seeking support from the rest of you. Please make your voice heard. There are three main channels for this:
  • A MySQL Council survey
  • A set of questions that will be answered during the keynote at the MySQL Conference
  • Talk to a council member

Survey

The MySQL Council survey is an attempt to collect, understand, and eventually prioritize what the MySQL community feels, and to act on its suggestions. It is a very short survey. We don't want your opinion on everything from barbecue sauce recipes to brain surgery. We only need a few lines about what bothers you the most in MySQL and how we can help. And of course, if you want to help, we really want to hear about that.

Questions for the keynote

Let's ask Oracle directly. What are the biggest issues that you have? You can submit your questions for the keynote, and Tomas Ulin will answer them on stage at the MySQL Conference.
Here is your shot at practicing people power: think of the most pressing questions that you would like to see Tomas Ulin address, and submit them as soon as possible.

Talk to a council member

Council members are all well known people in the community. Most of us will be at the MySQL Conference or at Collaborate 11, or both.
If you want to ask me questions about the council, or MySQL, or community matters, I will do my best to answer them, or to seek an answer if I don't know it myself.

I will be at the San Francisco MySQL User Group on April 7th.
Then I will be at the MySQL conference Monday and Tuesday. Oh, and there is the Community Dinner West on Monday evening!
On Wednesday, I will be at Collaborate 11.
The other council members will be around as well. If you don't feel like filling in the survey and prefer person-to-person communication, come see us, and let's talk!

Open Database camp 2011 - Opportunities for sponsors, culture, and more

The Open Database Camp 2011 is barely one month away.

Sponsorship

Many thanks to all the sponsors! We very much appreciate your support.
Speaking of which, here is some important information for sponsors: the venue owner, Sardegna Ricerche, has given us the availability of an ample hall for sponsors, where they can showcase their products and services.
Each sponsor will have a desk and a double panel measuring 195 x 75 cm (6.3 x 2.4 feet).

Culture, fun, and more

The Science park is something unique that geeks may want to visit. It is one of the biggest research centers in Europe, and the owners have graciously organized a guided tour before and after the conference.
Near the conference venue there is Nora, an archaeological site that alone is worth the trip. You can see it during the welcome party on Friday (if you show up before sunset, that is), or you can visit it on your own after the conference.

To give you an idea of what awaits you, here is a promotional video of Sardegna Ricerche.


More

The event should attract many local open source enthusiasts, with varying degrees of knowledge about open databases. To meet their curiosity, there will be a parallel beginners track, with introductory sessions on open databases. SQL and NoSQL fans, get ready to evangelize your beloved products. There will be many people eager to listen!

Thursday, March 31, 2011

MySQL replication for demanding users

I have been working with MySQL replication for quite a while. I have dealt with simple replication setups and I have experimented with complex ones. Five years ago I wrote an article about advanced MySQL replication, which was mostly a dream about what you could do with imagination and skill, but the matter of that article is still not even remotely ready for production. Yet, since that article, I have been approached by dozens of people who wanted to know how to make the multiple-master dream become reality. To all of them, I had to say: "Sorry, this is just a proof of concept. Come back in a few years, it may become possible." It still isn't.
Despite its latest great technological advances, MySQL native replication is very poor in topologies. What you can do with MySQL native replication is master-to-slave (which also includes relayed slaves), master-to-master, and circular replication.
replication topologies
Of these, circular replication is the closest thing to multiple masters that you can get with MySQL native replication, without the addition of third party services.
Circular replication is tricky to set up, although not unreasonably so. It works. With some patience and precision, you can build a cluster of a few nodes in circular replication. With luck, you can get them to work properly, without loops and with the data flowing to all the servers. Your luck runs out the moment one of the servers fails, or replication breaks down for whatever reason. Then you see that circular replication is actually more complicated than it looks on the surface, and that it is also quite brittle. That doesn't mean that circular replication is not used in production. It is. I have known several people who use it successfully, although nobody is really happy about it.
In addition to its fragility, circular replication is slow. If you insert data into master A, it has to travel across three nodes before reaching master D.
Another topology that seems to be very popular is the multiple-source scheme. It is the opposite of master/slave: instead of one master sending data to many slaves, many masters send data to one slave. Despite its popularity, this topology is still unimplemented in MySQL native replication. The best you can do to simulate the desired outcome is round-robin replication with cron.
With this background, it is no surprise that I was thrilled at the idea of working for a company that has made these dreams become reality. Tungsten Replicator allows users to have real multiple-master topologies, and even the much coveted multiple-source topology is now within the users' grasp.
Compared to MySQL replication, the drawback of using Tungsten is that you need to deal with bigger complexity. It's only natural. With so many more features, there come more pieces to take care of.
An interesting point about multiple masters is the matter of conflict resolution. The convenience and robustness of asynchronous replication are countered by a lack of means to deal with conflicts. This difficulty has been used many times as the reason for not implementing multiple-source replication in MySQL. I have my own ideas on this issue. I am aware of the risks, but if I were allowed to do multiple-master replication, I would gladly take charge of the risks. Updating different databases, or different tables, in separate masters is one way of defining a conflict-free scenario where multiple masters or multiple sources could be used safely. If only we could ...
My colleague Robert Hodges has posted some interesting aspects in his blog. The bottom line is that we focus on empowering users with advanced replication features. Conflict resolution can wait. I am sure many users would love to have the problem of how to avoid conflicts if the more demanding problem of how to replicate from many places to one cluster could be solved. The good news is that some sort of conflict detection (and possibly resolution) are possible even now, without slowing down the operations and without complicating our lives unnecessarily. For example, a simple conflict that could be avoided using Tungsten filters is the one resulting in a master that is updating tables that it was not supposed to do. In a scenario where multiple source replication works on the assumption that each master updates a given subset of the data, we can easily detect and eventually reject offending updates. It is not much, but in many practical cases it would be the difference between having robust multiple source replication and doing data load and consolidation manually.

Anyway, back to the present day with very much real multi-master replication available for everyone. To alleviate the fear of the unknown, we are organizing webinars on a regular basis, where we cover the theoretical points and give practical demos of how to use the new features.
If you are a demanding user, this upcoming webinar is for you: MySQL Multi-Master and Multi-Source Replication With Tungsten. Tomorrow, March 31st, 2011, at 10am PDT.

Thursday, March 17, 2011

How fast is parallel replication? See it live today

I talked about parallel replication last month. Since then, there has been considerable interest in this feature. As far as I know, Tungsten's is the only implementation of this much coveted feature, so I can only compare it with MySQL native replication.
The most compelling question is "how fast is it?"
That's a tricky one. The answer is the same that I give when someone asks me "how fast is MySQL". I always say: it depends.
Running replication in a single thread is sometimes slower than the operations in the master. Many users complain that the single thread can't keep up with the master, and the slave lags behind. True. There is, however, a hidden benefit of single-threaded replication: it requires fewer resources. There is no contention for writing to disk, no need to worry about several users blocking a table. You need to contend with the users that want to read the tables, but the lone writer has a simple job, albeit a heavy one.
When we introduce parallel replication, the easy job fades away, and we are faced with the problem: how do I allow several writers to do the work of one? It's a nice problem to have. MySQL native replication does not allow parallel apply, but with Tungsten you can start tackling the issue of allowing several parallel threads to update the system at once. Therefore, this is the same problem that you have on a server where several users are allowed to write at once. If the server has sufficient resources, the operations will be fast. If it doesn't, the operations will lag behind.
Another aspect of the question is "what kind of queries?" If your database is well established and set in stone, and you mostly run UPDATEs, replication performance will depend on how well your server is tuned for concurrent writes. If you run ALTER TABLE statements on a daily basis, your queries will queue up behind that ALTER TABLE no matter what. And if you have only INSERT and DELETE queries, parallel replication performance will probably depend on how fast your server is.
Ultimately, I can tell you that I have seen or experienced directly a wide range of repeatable results. I know cases where parallel replication is three times as fast as native replication. These cases usually involve huge amounts of binary logs, as when your slave needs to be taken offline for a few hours or even days and then tries to catch up. Other cases, which can be reproduced with a minimal amount of sample data, show parallel replication as being 30% to 50% faster. And then there are cases when your server is so poor in resources, or the load so unevenly distributed, that parallel replication is only as fast as native replication. I would say that these cases are easily cured by beefing up the server.
If you want to see a demo of how this replication works, you can join this webinar:
Zoom, Zoom, Zoom! MySQL Parallel Replication With Tungsten Replicator 2.0.
You can tell from the title that we are quite excited about the product that we are building.

Tuesday, March 15, 2011

Quick benchmarking trick

I have been doing quite a lot of benchmarking recently.
I needed to find a safe way of measuring the time spent by the database on a long task, like catching up on a huge backlog of accumulated replication updates. The problem with measuring this event is that I can record when it starts, but I can't easily detect when it finishes. My initial approach was to monitor the database and count the table rows to see when the task was done, but I ended up affecting the task's performance with my additional queries. So I thought of another method.
Since I had control on what was sent from the master to the slave, I used the following:
The start time is calculated as the minimum creation time of the databases that I know are created during the exercise. Let's say I have 5 databases, named db1 to db5:
set @START = (select min(create_time) from information_schema.tables where table_schema like "db%")
Then, to make sure that I catch the exact moment that the task is finished, I added to the master a command for each database:
create table db1.last_table (i int);
create table db2.last_table (i int);
create table db3.last_table (i int);
create table db4.last_table (i int);
create table db5.last_table (i int);
To know if the task is done, I query the database as follows:
select count(*) from information_schema.tables where table_schema like "db%" and table_name="last_table";
If the count is less than 5 (the number of databases in my binary logs), I wait some more.
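The waiting step can be sketched as a small shell loop. Here, wait_until_done is my own helper, not a MySQL feature; real usage (commented out) would poll the slave with the count query shown above.

```shell
# Poll a command until it reports the expected count (my own sketch).
wait_until_done() {
    local get_count=$1 expected=$2
    while [ "$($get_count)" -lt "$expected" ]
    do
        sleep 10    # don't hammer the server while waiting
    done
}

# Real usage would poll the slave with the query above, e.g.:
# wait_until_done 'mysql -BN -e "select count(*) from ..."' 5
```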
Finally, when the count matches the expected one, I get the end time:

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table");
Now I have two values, @START and @END:
select timediff(@END,@START) as elapsed;
+----------+
| elapsed  |
+----------+
| 00:09:44 |
+----------+
It does not matter if I query the database immediately, or hours after coming back from my errands. Using the table creation times ensures that I get clean start and finish times.
I put all the above in a script, and I can check the elapsed time without fear of mistakes.

Sunday, March 13, 2011

A cool terminal tip for Mac users

If you use a Mac, and you are dealing with many similar tasks at once, like examining many database servers in different terminals, you may like this one.
I have been using iTerm 2 for a while, and my handling of parallel tasks has improved a lot. (No, I am not talking about Parallel replication, although I have applied this trick while testing that technology as well.)
iTerm2 has some cool features, and probably the most striking one is split panes. That alone would be a good reason for giving iTerm2 a try. But the one that I use the most, often in combination with Split Panes, is called Send Input to all tabs.
Here is how it works.
Let's say I need to use 4 servers at once, and perform a non-repeating operation in all of them.
So I open a separate window and I split the screen into 5 panes. I connect to each server in the first four panes, and I open a vim instance in the fifth.
With that done, I enable the magic option.


A word of caution: this option sends the input to all the open tabs in your current window. If you don't want this to happen, do as I do, and open a separate window. Then make sure that all tabs, and any split panes, are supposed to receive your input. The application asks you for confirmation.


After that, whatever I type in one pane is mirrored in all the panes. So I see the commands running on my four servers, and being logged to a text file in the fifth one. With just a single command, I have all servers under control at once:

Replication and sandbox talks on the road - San Francisco, Santa Clara, Orlando

In a few weeks, I will be on the road, for an intense set of presentations in the USA.

San Francisco, April 7th

I will start the tour at the San Francisco MySQL User Group. On April 7, at 6pm I will talk about Advanced MySQL replication for the masses. This talk will explore topics such as bi-directional replication, multiple sources, parallel replication, seamless failover, with the help of Tungsten replicator.


O'Reilly MySQL Conference & Expo 2011

Santa Clara, April 11, 12

The MySQL conference starts on Monday, April 11th, with several three-hour tutorials.

Robert Hodges and Edward Archibald will also be presenting at the MySQL Conference.

On Tuesday evening I will fly to Orlando, to attend (part of) Collaborate11.

collaborate11 badge

Orlando, April 13

Tuesday, March 08, 2011

The MySQL Council addresses the public bug database issue

When I announced the MySQL Council, I said that one of its roles is to act as a bridge between the MySQL community and Oracle.
The time has come to put this role to good use. Recently, there have been some concerns about the MySQL bugs database, which could be summarized in Mark Callaghan's post Where have the bugs gone?.
The gist of the concerns is that there has been a change in bug handling, although we don't know what was changed and how. In short, there has been a total lack of communication. The MySQL Council has addressed the concerns about the public bug database in a recent meeting, and has taken several steps, like approaching Oracle directly, and releasing a summary of the concerns on its site.

The MySQL Council members have been discussing the decision by Oracle to reduce the importance of the public MySQL bug database for providing input and direction on product updates. The Council would also like to work with Oracle to promote communication around the status of the database access to the broader community, so members will understand what to expect moving forward.

Without communication around the use of, and changes to, the public bug database, concerns have been raised in the community about duplicate bug tracking, bug numbers in commits not being visible to the public, difficulty in offering patches for the MySQL server, and a general decrease in transparency in the evolution and remediation of the MySQL server and associated products.

The IOUG (Independent Oracle User Group) is supporting the MySQL Council in its efforts to raise questions and query direction from Oracle. The MySQL Council will be meeting with Oracle stakeholders to discuss options for keeping appropriate portions of the database active as well as communicating status and future actions to the broader community.
We don't know the outcome yet. But we'll surely post an update as soon as we hear it.

Monday, March 07, 2011

implementing table quotas in MySQL

I have just seen Limiting table disk quota in MySQL by Shlomi Noach, and I could not resist.
You can actually implement a disk quota using an updatable view with the CHECK OPTION.
Instead of giving the user access to the table, you give access to the view (at least for inserting, see the caveat at the end), and you will get a genuine MySQL error when the limit is reached.

drop table if exists logs;
create table logs (t mediumtext) engine=innodb;

drop function if exists exceeded_logs_quota ;
create function exceeded_logs_quota() 
returns boolean
deterministic
return (
    select CASE 
           WHEN (DATA_LENGTH + INDEX_LENGTH) > (25*1024) 
           THEN TRUE ELSE FALSE 
           END
    FROM 
        information_schema.tables 
    WHERE 
        table_schema=schema() 
        and table_name='logs'
    );

create or replace view logsview as 
    SELECT * FROM logs 
    WHERE NOT exceeded_logs_quota()
    WITH CHECK OPTION;

Here's a test run:
mysql [localhost] {msandbox} (test) > insert into logsview values ('a');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     0 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values (repeat('a', (25 * 1024) - 1));
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values ('b');
ERROR 1369 (HY000): CHECK OPTION failed 'test.logsview'

You will need to tweak the limit to account for InnoDB's habit of allocating whole pages rather than bytes, but if you measure the limit in MB the method should work fine.
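A minimal sketch of that adjustment, assuming InnoDB's default 16 KB page size (innodb_page_size=16k): the effective quota is the desired one rounded up to a whole number of pages.

```python
PAGE_SIZE = 16 * 1024  # assumption: default InnoDB page size

def effective_quota(desired_bytes):
    """Round a desired quota up to a whole number of InnoDB pages,
    since DATA_LENGTH grows in page-sized steps, never byte by byte."""
    pages = -(-desired_bytes // PAGE_SIZE)  # ceiling division
    return pages * PAGE_SIZE

print(effective_quota(25 * 1024))  # 32768: a 25 KB quota really means two 16 KB pages
```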

CAVEAT: You should give your users separate privileges: SELECT on logs, and INSERT on logsview. The view will only return records while exceeded_logs_quota() returns false.
mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > select count(*) from logsview;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

mysql [localhost] {msandbox} (test) > select count(*) from logs;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.01 sec)

Sunday, March 06, 2011

Advanced replication for the masses - Part III - Replication topologies

After part I: the basics, and part II: parallel apply, we now deal with a more mundane topic: how to deploy replication services in a way that fits our business, covering everything from basic master/slave to the most advanced multi-source scheme.

Master/slave

The most common topology is master/slave. One master, many slaves. This topology is equivalent to MySQL native replication. The differences are in the additional features. Tungsten supports seamless failover and parallel replication in all topologies.

Figure 1. Tungsten master/slave replication
Unlike MySQL, and unlike previous versions of Tungsten, the implementation of this topology uses a dedicated service for the master, and deploys a corresponding service on each slave. In MySQL, and in old Tungsten, there is just a general-purpose pipeline that connects master to slave. In Tungsten Replicator, instead, you need to define a service explicitly. While this requirement looks like overkill for such a simple topology, it will become clear that it helps in defining advanced topologies.

Relayed replication

The first time that I saw this topology with Tungsten, it was by mistake. I wanted to connect host Delta with host Alpha, but by a combination of bad cut-and-paste and mindlessness, I connected the slave Delta to a seemingly non-existent master Bravo (figure 2).

Figure 2. Tungsten master/slave replication with relay
The funny thing is that I did not realize the error, because the test that I wrote to certify that what I inserted in the master was then usable in all the slaves worked without a problem. Thus I learned another feature of Tungsten: every slave is also a relay slave, without any additional setup. It does not matter whether the slave is using binary logs, or has the log-slave-updates option enabled. Those requirements apply to MySQL native replication. Tungsten slave replicator services can detect a request from another slave, and act as a relay slave if necessary.

Bi-directional replication

Here is where people used to MySQL replication start to be surprised. To set up a master-to-master topology, we deploy two services on each host. The first host defines a local master service named alpha, and a remote slave service named bravo. The second host does the opposite: a local master service named bravo, and a remote slave service named alpha. (Figure 3)

Figure 3. Tungsten bi-directional replication
Whatever is updated on host alpha is processed and sent to remote slaves by the master replicator service alpha. Whatever is updated in host bravo is processed by the master replicator service bravo and dispatched to its audience.
This system works like MySQL master-to-master replication. There is no conflict resolution handling. The prerequisite, for this and the following topologies, is that you know what you are doing and leverage the replication system within its limitations.

Multiple site replication

This is an extension of bi-directional replication. For each master, there are one or more slaves. Figure 4 shows a typical implementation. You have a main site in one location, and want a backup system in another location. If disaster strikes in your main location, you are ready to switch over to the alternate site with minimal delay.

Figure 4. Tungsten multiple site replication
The reliability and robustness of this scheme depends on your applications. Nothing prevents you from writing to both masters at once. And if you keep the tasks logically separated (e.g. master alpha writes on database products, while master bravo writes on database employees) nothing bad will happen. But if your applications update the same records, you can suffer either a duplicate key error or silent loss of changes when a master writes on top of an already updated record.
This topology is frequently used in combination with a virtual IP provider, a system that shields both clusters from the application's view, and lets it connect through an IP that is associated with the active master. There are many such systems, including Linux HA and Tungsten Enterprise. But that is beyond the scope of this post.

Multiple source replication

This topology is extremely popular, at least judging from the many times it has been requested of me. Strangely popular, in fact, because it does not exist, at least as far as MySQL native replication is concerned. One of the limits of MySQL native replication is that every slave can have only one master.
This topology has been explained to me in many ways. One of the most common is this. You have a company that has a headquarters in a town, say New York. That company has stores in many cities, and the management would like to get the data from each store to the headquarters, in real time. This is the opposite of the master/slave topology, and no matter how creative you become, you can't get MySQL to do it.
Using Tungsten, you can implement this topology fairly easily. (Figure 5)

Figure 5. Tungsten multiple source replication
Each server deploys a master replicator service. The host in headquarters deploys one slave service for each of the remote masters.
Of course, there is no conflict resolution handling. If your remote masters don't behave within the limits that you set for them, you will get in trouble, and replication may stop. However, if you know what you're doing and set up the system properly, you can at least achieve the goal of getting this reverse replication scheme working smoothly.
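The service layout just described can be sketched as a small data structure (hypothetical host names: three store masters plus a headquarters host):

```python
# Illustrative sketch of the multiple-source layout described above.
stores = ["alpha", "bravo", "charlie"]

# Every store runs a local master replicator service.
deployment = {store: [("master", store)] for store in stores}

# The headquarters host runs one slave service per remote master.
deployment["hq"] = [("slave", store) for store in stores]

print(deployment["hq"])  # [('slave', 'alpha'), ('slave', 'bravo'), ('slave', 'charlie')]
```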

Hands on

To help early adopters, I have put together a set of scripts that easily deploy any of the topologies mentioned in this post with just a few actions.
What you need is 4 hosts (four copies of the virtual machine mentioned in part I will do), and the Tungsten deployer package that you will get from the Tungsten downloads page.
Once you have the four servers, unpack the deployer scripts into a chosen directory on all servers, making sure that the directory is the same on all four. Inside the package, there is a README file with a detailed list of instructions. So detailed, in fact, that I won't copy it here because it would make this post too long.

I will just show a sample run:
$ ./set_multi_source.sh 
QA1
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.243
role            : master
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA2
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 1.264
role            : master
serviceName     : bravo
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA3
installation OK

NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 0.812
role            : master
serviceName     : charlie
serviceType     : local
started         : true
state           : ONLINE
Finished services command...

QA4
installation OK

Processing services command...
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 29.521
role            : slave
serviceName     : alpha
serviceType     : local
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 20.123
role            : slave
serviceName     : bravo
serviceType     : remote
started         : true
state           : ONLINE
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 12.726
role            : slave
serviceName     : charlie
serviceType     : remote
started         : true
state           : ONLINE
Finished services command...

$ ./test_flow_multi_source.sh 
inserting 100 into each of the three masters. Please wait
Retrieving data from the masters
QA1
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
+----+-----+------+

QA2
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t2 | 100 | 5050 |
+----+-----+------+

QA3
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t3 | 100 | 5050 |
+----+-----+------+

Retrieving data from the slave
QA4
+----+-----+------+
| t  | c   | s    |
+----+-----+------+
| t1 | 100 | 5050 |
| t2 | 100 | 5050 |
| t3 | 100 | 5050 |
+----+-----+------+

Happy hacking!

Saturday, March 05, 2011

A hidden options file trick

I was listening today to the OurSQL Episode 36: It's Not Our (De)fault! Part 1. As usual, Sheeri and Sarah are very informational and entertaining while explaining the innards of MySQL and their best practices.
Being a DBA-oriented show, the podcast had one omission: there was no mention of the custom groups that you can have in your my.cnf. This is mostly useful for developers. If your application requires some specific settings, instead of using a separate configuration file, you can use a different group, and then instruct your client applications to use that group.
By default, all client applications read the "[client]" group.
But you can tell your client to read a group that you can call whatever you like.
For example, with this configuration file,
[client]
user=common_user
password=common_password

[logrotation]
user=log_rotation_daemon
password=specific_password

You can have a Perl script that takes care of your particular log rotation needs. Instead of the normal credentials, it will use the ones listed in the [logrotation] group.
use strict;
use warnings;
use DBI;

my $dsn =   "DBI:mysql:test;"
            . "mysql_read_default_group=logrotation;"
            . "mysql_read_default_file=$ENV{HOME}/.my.cnf";
my $dbh = DBI->connect($dsn);
Notice that, for this option to work, the [logrotation] group must come after the [client] group, or the directives in the [client] group will override the ones in [logrotation]. That's why, in the options file, you find the directives for [mysqldump] at the bottom of the file.
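The overlay behavior can be illustrated without a database at hand. This Python sketch parses the options file shown above with the standard configparser module (an approximation: the real clients use the C client library's own option-file parser):

```python
import configparser

MY_CNF = """
[client]
user=common_user
password=common_password

[logrotation]
user=log_rotation_daemon
password=specific_password
"""

def credentials(group):
    """Read the [client] group first, then overlay the requested group,
    so that the later group's directives win, as described above."""
    cp = configparser.ConfigParser()
    cp.read_string(MY_CNF)
    creds = dict(cp["client"])
    creds.update(cp[group])
    return creds

print(credentials("logrotation")["user"])  # log_rotation_daemon
```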

So far, so good. This was a trick for developers, and probably many developers know it already. But there is another, related trick, that can be used by non-developers as well.
If you knew about these customized groups, you may have realized that you can't use them with the standard mysql command line client. Or, to put it better, there is no clearly documented way of doing so. There is, in fact, a cute trick that you can use.
Let's say that, from time to time, you want to use a different prompt, but you don't want to edit your $HOME/.my.cnf to change it. You just want your prompt to be there in the option file, and be able to recall it when the situation calls for it.
The mysql internal help does not say anything about groups. However, a careful search of the manual gives this cryptic entry:
  • --defaults-group-suffix=suffix, -g suffix

    In addition to the groups named on the command line, read groups that have the given suffix.
When I found it, I stared at this puzzling statement for a while. I could not understand which groups were named on the command line.
Eventually, I figured out why there is a group-suffix and not simply a group. It means that if you add a suffix to a default group name, and you tell mysql to look for this suffix, you will be able to use the appropriate group on demand.
For example, this options file will not work.
# wrong
[pinocchio]
prompt='I have a long nose  =======> '

[master]
prompt='master [\h] {\u} (\d) > '

[slave]
prompt='slave [\h] {\u} (\d) > '
But this one will work:
[mysqlpinocchio]
prompt='I have a long nose  =======> '

[mysqlmaster]
prompt='master [\h] {\u} (\d) > '

[mysqlslave]
prompt='slave [\h] {\u} (\d) > '

Here is a test run:

$ mysql --defaults-group-suffix=pinocchio
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.1.54 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

I have a long nose  =======> 

The meaning of the suffix part is that mysql will read the default groups (which are [client], and [mysql]), and it will also read any groups that are named "mysqlSUFFIX" or "clientSUFFIX". I have named the group "mysqlpinocchio" and therefore it has been used. It would have worked the same if I had called it "clientpinocchio".
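The rule can be expressed in a few lines (a sketch; the default group list follows the explanation above):

```python
DEFAULT_GROUPS = ["client", "mysql"]  # groups the mysql client reads by default

def groups_to_read(suffix=None):
    """Return the option-file groups the mysql client consults:
    the default groups, plus each default group with the suffix appended."""
    groups = list(DEFAULT_GROUPS)
    if suffix:
        groups += [g + suffix for g in DEFAULT_GROUPS]
    return groups

print(groups_to_read("pinocchio"))
# ['client', 'mysql', 'clientpinocchio', 'mysqlpinocchio']
```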

Wednesday, March 02, 2011

beware of the log

The MySQL general log is one of my favorite features for a quick debug. I especially like the ability of starting and stopping it on demand, which was introduced in MySQL 5.1.
However, using the general log has its drawbacks.
Today I was debugging a nasty bug that results from two statements that should be applied sequentially, but that were instead concurrent. This kind of problem is hard to cope with, as it is intermittent. Sometimes all goes well, and you get the expected result. And then, sometimes the statements fly in different directions and I stare at the screen, trying to understand where they strayed.
After some trial and error, I decided to enable the general log just before the offending statements, and to turn it off immediately after. Guess what? With the general log on, the test never failed. What was an intermittently failing test became an always succeeding test.
What happened is that the general log delayed the query execution just enough for the following statement to arrive when it was expected.
In the end, the bug had to be unveiled using white box techniques.
Moral of the story: using the general log alters the behavior of the server. If you use it, be prepared to deal with its side effects.

Friday, February 25, 2011

Advanced replication for the masses - Part II - Parallel replication

I hope you liked the first part of this series of lessons. And I really hope that you have followed the instructions and got your little replication cluster up and working.
If you haven't done that, thinking that you would spare your energies for more juicy matters, I have news for you. What I explained in the previous part is exactly what you need to do to set up parallel replication. With just a tiny additional detail.
For the sake of the diligent readers who have followed the instructions in the first lesson, I won't repeat them, but I'll invite you to set up the environment as explained in the first part.
Once you have a cluster up and running, and you can confirm that replication is indeed working with Tungsten, you can remove all with the clear_cluster.sh script, and launch again the set_installation.sh script, with the tiny detail we have mentioned above.
The astute readers may have noticed that the installation script contains these lines:
...
MORE_OPTIONS=$1
./configure-service --create --role=master $MORE_OPTIONS logos1
./tungsten-replicator/bin/trepctl -service logos1 start

./configure-service --create --role=slave --master-host=QA2 \
    --service-type=remote $MORE_OPTIONS logos2
...
This means that you can run set_replication.sh with one additional option, which will be passed on to the creation of the Tungsten service. Without further suspense, the addition that you need is --channels=5.
Yep. It was that easy.
./set_replication.sh --channels=5
This little addition will start your Tungsten replicator, apparently in the same way it did before. But there is a substantial difference. While the data is funneled from the master to the slaves in the usual way, the applier splits the data by database. You can see the difference as soon as you send some data through the pipeline.
#master
mysql -h tungsten1 -e 'create schema mydb1'
mysql -h tungsten1 -e 'create schema mydb2'
mysql -h tungsten1 -e 'create schema mydb3'
mysql -h tungsten1 -e 'create table mydb1.t1 (i int)'
mysql -h tungsten1 -e 'create table mydb2.t1 (i int)'
mysql -h tungsten1 -e 'create table mydb3.t1 (i int)'
mysql -h tungsten1 -e 'select seqno,source_id,eventid  from tungsten_logos.trep_commit_seqno'
+-------+-----------+----------------------------+
| seqno | source_id | eventid                    |
+-------+-----------+----------------------------+
|     6 | tungsten1 | 000002:0000000000000939;43 |
+-------+-----------+----------------------------+
Everything under control. The master has sent 6 events through the pipeline. Now, let's see what the slave has to say:
# slave
mysql -h tungsten2 -e 'select seqno,source_id,eventid  from tungsten_logos.trep_commit_seqno'
+-------+-----------+----------------------------+
| seqno | source_id | eventid                    |
+-------+-----------+----------------------------+
|     0 | tungsten1 | 000002:0000000000000426;34 |
|     0 | tungsten1 | 000002:0000000000000426;34 |
|     4 | tungsten1 | 000002:0000000000000763;41 |
|     5 | tungsten1 | 000002:0000000000000851;42 |
|     6 | tungsten1 | 000002:0000000000000939;43 |
+-------+-----------+----------------------------+
Notice, first of all, that there are five rows instead of one. Each row is a channel. Since the master has used three databases, you see three channels occupied, each showing the latest sequence number that was applied. Now, if we do something to database mydb2, we should see one of these channels change, while the others stay still.
# master
mysql -h tungsten1 -e 'insert into mydb2.t1 values (1)'
mysql -h tungsten1 -e 'insert into mydb2.t1 values (2)'

# slave
mysql -h tungsten2 -e 'select seqno,source_id,eventid  from tungsten_logos.trep_commit_seqno'
+-------+-----------+----------------------------+
| seqno | source_id | eventid                    |
+-------+-----------+----------------------------+
|     0 | tungsten1 | 000002:0000000000000426;34 |
|     0 | tungsten1 | 000002:0000000000000426;34 |
|     4 | tungsten1 | 000002:0000000000000763;41 |
|     8 | tungsten1 | 000002:0000000000001124;45 |
|     6 | tungsten1 | 000002:0000000000000939;43 |
+-------+-----------+----------------------------+
The channel used by mydb2 had previously applied the sequence number 5. The latest sequence number was previously 6, used in another channel. After two more events in this database, the sequence number has jumped to 8.
The eventID has also changed. The first part of the eventID is the binary log number (as in mysql-bin.000002), the second is the log position (1124), and the third one is the session ID (45).
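The decomposition just described can be captured in a small helper (a sketch; the eventID format follows the example above):

```python
def parse_event_id(event_id):
    """Split a Tungsten eventID like '000002:0000000000001124;45' into
    (binary log number, log position, session ID)."""
    binlog, rest = event_id.split(":")
    position, session = rest.split(";")
    return int(binlog), int(position), int(session)

print(parse_event_id("000002:0000000000001124;45"))  # (2, 1124, 45)
```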
Enough of peeking over the replicator's shoulder. There are more tools that let you inspect the status of the operations.
We have already seen trepctl services, which retains some of its usefulness with parallel replication. On the master, it says:
trepctl -host tungsten1 services
NAME              VALUE
----              -----
appliedLastSeqno: 8
appliedLatency  : 0.834
role            : master
serviceName     : logos
serviceType     : local
started         : true
state           : ONLINE
Which is mostly all we need to know.
Since the slave has more than one channel, though, we need more detailed information on that side of the applier. For this reason, we use a more specialized view. We may start with trepctl status, which has information roughly equivalent to "SHOW SLAVE STATUS" in MySQL native replication.
trepctl -host tungsten2 status 
NAME                     VALUE
----                     -----
appliedLastEventId     : 000002:0000000000000426;34
appliedLastSeqno       : 0
appliedLatency         : 0.846
clusterName            : 
currentEventId         : NONE
currentTimeMillis      : 1298626724016
dataServerHost         : tungsten2
extensions             : 
host                   : null
latestEpochNumber      : 0
masterConnectUri       : thl://tungsten1:2112/
masterListenUri        : thl://tungsten2:2112/
maximumStoredSeqNo     : 8
minimumStoredSeqNo     : 0
offlineRequests        : NONE
pendingError           : NONE
pendingErrorCode       : NONE
pendingErrorEventId    : NONE
pendingErrorSeqno      : -1
pendingExceptionMessage: NONE
resourcePrecedence     : 99
rmiPort                : -1
role                   : slave
seqnoType              : java.lang.Long
serviceName            : logos
serviceType            : local
simpleServiceName      : logos
siteName               : default
sourceId               : tungsten2
state                  : ONLINE
timeInStateSeconds     : 3483.836
uptimeSeconds          : 3489.47
This command too, while perfectly useful in single-channel replication, lacks the kind of detail that we are after. Tungsten 2.0 introduces two variations of this command, with more detailed metadata.
trepctl -host tungsten2 status -name tasks
Processing status command (tasks)...
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000001305;46
appliedLastSeqno  : 8
appliedLatency    : 0.84
cancelled         : false
eventCount        : 9
stage             : remote-to-thl
taskId            : 0
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000001305;46
appliedLastSeqno  : 8
appliedLatency    : 0.841
cancelled         : false
eventCount        : 9
stage             : thl-to-q
taskId            : 0
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000426;34
appliedLastSeqno  : 0
appliedLatency    : 8.422
cancelled         : false
eventCount        : 2
stage             : q-to-dbms
taskId            : 0
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000426;34
appliedLastSeqno  : 0
appliedLatency    : 8.424
cancelled         : false
eventCount        : 1
stage             : q-to-dbms
taskId            : 1
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000763;41
appliedLastSeqno  : 4
appliedLatency    : 0.242
cancelled         : false
eventCount        : 3
stage             : q-to-dbms
taskId            : 2
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000001305;46
appliedLastSeqno  : 8
appliedLatency    : 0.846
cancelled         : false
eventCount        : 5
stage             : q-to-dbms
taskId            : 3
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000939;43
appliedLastSeqno  : 6
appliedLatency    : 0.296
cancelled         : false
eventCount        : 3
stage             : q-to-dbms
taskId            : 4
The -name tasks command gives you a list of the latest tasks.
This is probably more information than you want, but in case of troubleshooting it may become a blessing. Let's follow for a moment what happens to appliedLastSeqno 8. You will find three tasks with this sequence number. The first one has stage "remote-to-thl", which is the stage where the transaction is transported from the master to the Transaction History List (THL, Tungsten lingo for what you may also call a relay log). The second task that mentions appliedLastSeqno 8 is in stage "thl-to-q", which is the phase where a transaction is assigned to a given shard. The third occurrence happens in stage "q-to-dbms", which is where the transaction is executed on the slave.
For a different view of what is going on, you may use trepctl status -name shards. A shard, in this context, is the criterion used to split transactions across channels. By default, sharding happens by database. We will inspect its mechanics more closely in another post. For now, let's have a look at what shards we have in our slave:
trepctl -host tungsten2 status -name shards
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000763;41
appliedLastSeqno  : 4
appliedLatency    : 0.0
eventCount        : 2
shardId           : mydb1
stage             : q-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000001305;46
appliedLastSeqno  : 8
appliedLatency    : 0.0
eventCount        : 4
shardId           : mydb2
stage             : q-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000939;43
appliedLastSeqno  : 6
appliedLatency    : 0.0
eventCount        : 2
shardId           : mydb3
stage             : q-to-dbms
NAME                VALUE
----                -----
appliedLastEventId: 000002:0000000000000426;34
appliedLastSeqno  : 0
appliedLatency    : 0.0
eventCount        : 6
shardId           : tungsten_logos
stage             : q-to-dbms
You may read the information quite easily. Each shard tells you the key by which it was identified (shardId), which is the same as the database name. The appliedLastSeqno and stage we have met already. The appliedLatency is roughly equivalent to MySQL's Seconds_Behind_Master, but more granular. And eventCount tells you how many transactions went through this shard.
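To make the idea concrete, here is an illustrative sketch of shard-to-channel assignment: a stable hash of the database name picks one of the N channels, so all events for one database land on the same channel and per-database ordering is preserved. The hash function here is an assumption of mine; Tungsten's actual assignment logic may differ.

```python
import zlib

CHANNELS = 5  # matching --channels=5 from the setup

def channel_for(shard_id, channels=CHANNELS):
    """Illustrative only: map a shard (database name) to one of N apply
    channels with a stable hash."""
    return zlib.crc32(shard_id.encode("utf-8")) % channels

# Events for the same database always go to the same channel.
for db in ("mydb1", "mydb2", "mydb3", "tungsten_logos"):
    assert channel_for(db) == channel_for(db)
    assert 0 <= channel_for(db) < CHANNELS
```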
If you are the adventurous type, you may have a look at the THL itself, and get a glimpse of how replication and parallelism work.
In the slave, type the following
# slave
thl -service logos list |less
Then look for "SEQ#" and you will find the global transaction IDs, or look for "shard=", and you will see the split by database.

More goodies will come next week. Until then, happy hacking!

Tuesday, February 22, 2011

Advanced replication for the masses - Part I - Getting started with Tungsten Replicator

MySQL DBAs and developers: oil your fingers and get ready to experience a new dimension of data replication. I am pleased to announce that Continuent has just released Tungsten Replicator 2.0, an open source data replication engine that can replace MySQL native replication with a set of advanced features.
A note about the source code. The current version of Tungsten Replicator available on the web site is free to use, but it is not yet the open source version. We need a few more weeks to extract the code from the enterprise tree and make a new build. But we did not want to delay the user experience, so everything in this build will come with the source code shortly. In the meantime, enjoy what is available there and have as much fun as we are having.

Why you will want to install Tungsten Replicator 2.0

Tungsten Replicator has a really cool list of features. I am sure that most MySQL DBAs will find something in that list that makes their mouths water in expectation.
Among my favorite features is one that looks so innocently unimportant that you may be tempted to dismiss it. I am talking about the global transaction ID, which is paramount in helping the DBA switch from master to slave in case of failure or maintenance. I will show an example of a seamless failover in this article.
More things to get excited about: Tungsten allows multiple-master replication, i.e. one slave receiving data from several sources, and parallel replication, meaning that a slave can apply changes from the master using many parallel threads. I will talk about all of these features in my blog, but to get to that point I need to start by covering the basic installation. Since Tungsten is much more powerful than MySQL native replication, it also comes with greater complexity. We are working on reducing that complexity. In the meantime, you can start with the instructions in this post.

Getting ready

You will need at least two servers, with Java 1.6, Ruby 1.8, and MySQL 5.1 installed.
You may use your own virtual machines or spare servers, or a pre-configured VMware image that runs with VMware Player (or VMware Fusion on a Mac).
The following instructions refer to the pre-configured VM. You may skip the initial steps if you are using your own servers.

  1. Download the pre-configured image:
    https://files.continuent.com.s3.amazonaws.com/Tungsten_MySQL_CentOS_5_5_VMWare_Image.7z
    Warning: it's 1.5 GB, and it expands to 5.5 GB
  2. Expand the VM
  3. Make a copy of the VM. Change the directory names so that you will refer to them as tungsten_vm1 and tungsten_vm2
  4. Launch both VMs
  5. Connect to each VM. User names and password for root are in a .doc file within the VM directory.
  6. Change the hostname of the VMs to tungsten1 and tungsten2 (don't forget to modify /etc/sysconfig/network to make the name sticky)
  7. Update /etc/hosts with the IP addresses and hostnames of both servers
  8. Switch to the tungsten user
    su - tungsten
  9. Create a directory $HOME/replicator
  10. Get the Tungsten package into that directory
    cd replicator
    wget https://s3.amazonaws.com/releases.continuent.com/tungsten-replicator-2.0.0.tar.gz
  11. Get the setup scripts from the Tungsten Replicator home.
    wget http://tungsten-replicator.googlecode.com/files/simple_install_master_slave.tar.gz
  12. unpack the scripts in $HOME/replicator

I know this was a long list, but it is not terribly difficult. Setting all of the above up manually would be much harder. As it stands today, all you need to do is run the "set_replication.sh" script, and Tungsten will come alive on your server in less than one minute.
To do things properly, you will need to perform the same operations on both servers. So, assuming that you have done everything on tungsten1, you can easily mirror the operations to tungsten2. The virtual machines come with a pre-installed SSH key that makes your installation life easier.
# in tungsten1
cd $HOME/replicator
ssh tungsten2 mkdir replicator
scp simple_install_master_slave.tar.gz tungsten2:$PWD
scp tungsten-replicator-2.0.0.tar.gz tungsten2:$PWD
ssh tungsten2 'cd replicator; tar -xzf simple_install_master_slave.tar.gz '
Now that you have the same set of files in both machines, you can trust the wisdom of the installation files and run:
# tungsten1
./set_replication.sh
ssh tungsten2 $PWD/set_replication.sh
This will start the Tungsten replicator in both servers.

Cleaning up

The sample scripts include one dedicated to cleaning up. The "clear_cluster.sh" script removes all test data from the database and sweeps the tungsten directory away, leaving your system ready to start afresh. As this is a testing environment, that strategy is not so bad. But be aware of the destructive nature of this script, and don't use it in a production environment.

Under the hood

Tungsten replicator is a complex piece of software, and it's easy to get lost. So here are a few tips on how to get your bearings.
You will find a log file under $HOME/replicator/tungsten/tungsten-replicator/logs/.
This is quite a noisy log, which is supposed to give the developers all the information about what's going on in case of a failure. For newcomers it is quite intimidating, but we are working on making it easier to read. (Be aware that you may find some references to "tungsten-enterprise" in the logs. Don't let this fact deter you. We are working on splitting the former name associations from the packages, and eventually you will only find references to modules named "tungsten-replicator-something" in the logs.)
At the end of the installation, you should have seen a line inviting you to modify your path to get the replication tools available at your fingertips. Most notable is trepctl, the Tungsten Replicator ConTroL.
Using this tool, you can get some information about the replicator status, and perform administrative tasks. A glimpse at the Tungsten Replicator Guide 2.0 will give you an idea of what you can do.
For now, suffice it to say that you can use trepctl to get the state of the replicator.
Try, for example, the following:

$ trepctl -host tungsten1 services
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 0.933
role            : master
serviceName     : logos
serviceType     : local
started         : true
state           : ONLINE

$ trepctl -host tungsten2 services
NAME              VALUE
----              -----
appliedLastSeqno: 0
appliedLatency  : 0.966
role            : slave
serviceName     : logos
serviceType     : local
started         : true
state           : ONLINE
The most important things here are the "state" field, and the "appliedLastSeqno", which is the global transaction ID that we have mentioned before.
If you create or modify something in the master and issue this command again, you will see that the appliedLastSeqno will increment.
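For example, a small sketch (the test.t1 table name is made up for this demonstration, and the credentials are the ones used elsewhere in this post):

```shell
# perform a small transaction on the master
mysql -h tungsten1 -u tungsten -psecret \
    -e 'CREATE TABLE test.t1 (i INT)'

# then extract just the global transaction ID on master and slave;
# once the change has propagated, the two numbers should match
for host in tungsten1 tungsten2
do
    seqno=$(trepctl -host $host services |
            awk -F' *: *' '/appliedLastSeqno/ {print $2}')
    echo "$host: appliedLastSeqno=$seqno"
done
```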
You can get some of this information from the MySQL database, where Tungsten keeps a table with the latest status. You may say that this table is roughly equivalent, at least in principle, to the information in SHOW SLAVE STATUS available with native replication.

$ mysql -h tungsten1 -u tungsten -psecret \
    -e 'select * from tungsten_logos.trep_commit_seqno\G'
*************************** 1. row ***************************
        task_id: 0
          seqno: 0
         fragno: 0
      last_frag: 1
      source_id: tungsten1
   epoch_number: 0
        eventid: 000002:0000000000000416;102
applied_latency: 0
What is this "tungsten_logos" database? It is the database that Tungsten creates for each installed service. In this case, 'logos' is the service name used in this sample installation. If you modify the scripts on both servers and replace 'logos' with 'ripe_mango', you will see that Tungsten creates a 'tungsten_ripe_mango' database with the same kind of information.

The basic principle to grasp before moving on to more complex topics is that replication in Tungsten is a collection of services. While native MySQL replication is a simple pipeline from master to slave, without deviations, Tungsten implements several pipelines, which you can use one by one or in combination. It looks more complex than necessary, but in reality it makes planning complex topologies much easier. Instead of making basic replication more complex, Tungsten adopts the principle of deploying the appropriate pipeline or pipelines for the task.
I leave to Robert Hodges, CEO and main architect of Tungsten, the task of explaining the nuts and bolts.

A sample of Tungsten power: switching from master to slave

It is probably too much information already for a blog post, but I would like to leave you with the feeling that you are dealing with an extremely powerful tool.
The instructions below will perform a seamless switch between the master and the slave.
Please follow these steps, but make sure there is no traffic hitting the old master during this time, or you may experience consistency issues:

# first, we tell both servers to stop replicating
$ trepctl -service logos -host tungsten2 offline
$ trepctl -service logos -host tungsten1 offline

# Now that they are offline, we tell each server its new role
# tungsten2 becomes the new master
$ trepctl -service logos -host tungsten2 setrole -role master 

# and then we tell tungsten1 that it's going to be a slave,
# listening to tungsten2 for data
$ trepctl -service logos -host tungsten1 setrole -role slave -uri thl://tungsten2

# now we put both servers online with the new instructions
$ trepctl -service logos -host tungsten2 online
$ trepctl -service logos -host tungsten1 online

# and we check that indeed they are both online with the new roles.
$ trepctl -host tungsten1 services
$ trepctl -host tungsten2 services
After this set of instructions, tungsten2 is the master, and if we write to it, we will see the changes replicating to tungsten1.
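The switch sequence above lends itself to a small wrapper script. A sketch under the same assumptions (no traffic hitting the old master; the switch_master function name is made up here, and a production version would check the outcome of every step):

```shell
#!/bin/bash
# Demote OLD_MASTER and promote NEW_MASTER for the given service,
# using the same trepctl commands shown above.
switch_master() {
    local service=$1 old_master=$2 new_master=$3
    trepctl -service "$service" -host "$new_master" offline
    trepctl -service "$service" -host "$old_master" offline
    trepctl -service "$service" -host "$new_master" setrole -role master
    trepctl -service "$service" -host "$old_master" setrole -role slave \
        -uri "thl://$new_master"
    trepctl -service "$service" -host "$new_master" online
    trepctl -service "$service" -host "$old_master" online
}

# the same switch performed above: tungsten2 becomes the master
switch_master logos tungsten1 tungsten2
```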

That's it for today. In the next articles, we will take a look at parallel replication.

We want to hear from you

We have released Tungsten Replicator as open source because we believe this will improve the quality of our product. We are looking for bug reports, cooperation, suggestions, patches, and anything that can make the product better. You can report bugs at the project home.
We are particularly eager to hear about user experience. We are aware that the user interface can be better, and we need some input on this matter from interested users.

A note about security

What is described in this article is for testing purposes only. Please keep the virtual machines mentioned in this article behind a firewall. The VM was designed with friendliness in mind, but as it is, it's far from secure.