The Data Charmer: April 2011

Thursday, April 28, 2011

Replication : different points of view

The following quotes are the first sentences in the replication chapter of two similar books. Both are admin cookbooks. One is for PostgreSQL, one for MySQL.

Replication isn't magic, though it can be pretty cool. It's even cooler when it works, and that's what this chapter is all about.
Replication requires understanding, effort, and patience. There are a significant number of points to get right. My emphasis here is on providing simple approaches to get you started, and some clear best practices on operational robustness.

PostgreSQL 9 Admin Cookbook

Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.

MySQL Admin Cookbook

The PostgreSQL quote warns of a dangerous, difficult, and unexplored path, while the MySQL one is the almost bored remark of someone whose biggest problem is listing all the good things you can do. I guess that being exposed to a given technology for a longer time changes one's perception.

Wednesday, April 27, 2011

Open Database camp 2011 - Travel logistics, and don't forget the party

The Open Database Camp 2011 is near. In 9 days, the welcome party starts, and then the conference itself gets going.
If you are coming earlier than Friday, May 6th, you can either use public transportation or book a seat with a volunteer on the car pooling page. Please help the organizers: post your arrival and departure dates and times, so we may be able to help you even outside the official conference days.
About the conference itself, as everyone should know, it's an unconference, where the talks will be decided on the spot. But you can propose ideas and topics on the sessions page.
Since we will have many participants from Italy, there will be dedicated sessions in Italian in addition to the ones in English, which is the official language of the conference.

Sunday, April 17, 2011

Replication metadata in MySQL 5.6.2

The default storage engine is InnoDB, or is it not?

When MySQL 5.5 went GA, the biggest piece of news was that the default storage engine is now InnoDB. Good news, and hope for a better future, as InnoDB is the most reliable storage engine available for MySQL.

Therefore the expectation is that MySQL 5.6 follows in its footsteps, and that we should see less and less of MyISAM in the database.
The privilege tables, however, are still MyISAM. I was not really expecting to see them disappear so quickly, as I have seen how much work it has been for Drizzle to get rid of them, and even they had to keep MyISAM alive for temporary tables.
However, I was surprised to see that the new tables for replication metadata, the ones that replace the files master.info and relay-log.info, are MyISAM by default.
The manual says:

In order for replication to be crash-safe, the slave_master_info and slave_relay_log_info tables must each use a transactional storage engine. By default, both of these tables use MyISAM; this means that, prior to starting replication, you must change both of these tables to use a transaction storage engine if you wish for replication to be crash-safe. You can do this by means of the appropriate ALTER TABLE ... ENGINE=... statements. You should not attempt to change the storage engine used by either of these tables while replication is actually running.

The funny thing is that the manual does not mention InnoDB explicitly, as if there were many transactional engines coming with the official MySQL.

Tables instead of files.

Anyway, I went ahead and tried the new metadata features. In short, the new version allows you to store the data that used to live in master.info and relay-log.info in two tables with similar names, located in the mysql schema.

First of all, I changed the storage engine, as suggested by the docs. Actually, the docs are still a bit scarce about this feature. The best instructions are the ones found in Mats Kindahl's article.

# in the slave
ALTER TABLE mysql.slave_master_info ENGINE = InnoDB;
ALTER TABLE mysql.slave_relay_log_info ENGINE = InnoDB;

# in the slave configuration file
relay-log-info-repository=TABLE
master-info-repository=TABLE
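Since the manual warns against changing the engine of these tables while replication is running, it may help to script the conversion so that the slave is always stopped first. What follows is a minimal sketch, not the documented procedure: the helper name rpl_info_conversion_sql is hypothetical, and it only prints the SQL, leaving the connection to the mysql client.

```shell
# Hypothetical helper: print the statements that switch the replication
# metadata tables to a transactional engine (InnoDB unless told otherwise).
# STOP SLAVE comes first, because the manual says the engine must not be
# changed while replication is running.
rpl_info_conversion_sql() {
    local engine="${1:-InnoDB}" t
    echo "STOP SLAVE;"
    for t in slave_master_info slave_relay_log_info; do
        echo "ALTER TABLE mysql.$t ENGINE=$engine;"
    done
}
```

Something like `rpl_info_conversion_sql | mysql -u USER -pPASSWORD -h HOST -P PORT` would apply it (the connection options are placeholders). The two repository options still go into the slave configuration file, which is read at server startup.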

After this operation, I initialized two slaves, one using the new metadata tables and the other still using the old files, for comparison.

# slave 1 (with table info)
show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 12027
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 114
               Relay_Log_File: mysql_sandbox12028-relay-bin.000002
                Relay_Log_Pos: 267
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
[...]

select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
        Master_log_name: 
         Master_log_pos: 4
                   Host: 
              User_name: 
          User_password: 
                   Port: 3306
          Connect_retry: 60
            Enabled_ssl: 0
                 Ssl_ca: 
             Ssl_capath: 
               Ssl_cert: 
             Ssl_cipher: 
                Ssl_key: 
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
                   Bind: 
     Ignored_server_ids: 0
                   Uuid: 
            Retry_count: 86400


# Slave 2
cat node2/data/master.info 
20
mysql-bin.000001
114
127.0.0.1
rsandbox
rsandbox
12027
60
0





0
1800.000

0
6cb60e24-68e7-11e0-9eec-6c626da07446
86400

Hmmm. Not good. Definitely not good.
Now, according to Mats's article, the slave_master_info table is updated every time the slave starts. But this is not the case.
Apparently, you need to restart the slave at least once more to get an update.

# slave 1
stop slave; 
start slave;
select * from slave_master_info\G
*************************** 1. row ***************************
              Master_id: 101
        Number_of_lines: 20
        Master_log_name: mysql-bin.000001
         Master_log_pos: 114
                   Host: 127.0.0.1
              User_name: rsandbox
          User_password: rsandbox
                   Port: 12027
          Connect_retry: 60
            Enabled_ssl: 0
                 Ssl_ca: 
             Ssl_capath: 
               Ssl_cert: 
             Ssl_cipher: 
                Ssl_key: 
Ssl_verify_servert_cert: 0
              Heartbeat: 1800
                   Bind: 
     Ignored_server_ids: 0
                   Uuid: 6cb60e24-68e7-11e0-9eec-6c626da07446
            Retry_count: 86400
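Comparing the file-based and table-based metadata is easier when each line of master.info carries a label. This sketch assumes that, in 5.6.2, the lines of master.info follow the same order as the columns of slave_master_info without Master_id, which is what the listings above suggest (line 1 is Number_of_lines, line 20 is Retry_count). The function name label_master_info is hypothetical, and other server versions may use a different layout.

```shell
# Column order inferred from the 5.6.2 listings above (an assumption for
# other versions). The misspelled Ssl_verify_servert_cert is copied as shown.
RPL_MASTER_INFO_COLUMNS="Number_of_lines Master_log_name Master_log_pos Host
User_name User_password Port Connect_retry Enabled_ssl Ssl_ca Ssl_capath
Ssl_cert Ssl_cipher Ssl_key Ssl_verify_servert_cert Heartbeat Bind
Ignored_server_ids Uuid Retry_count"

# Hypothetical helper: print "Column: value" for each line of master.info.
# split() on whitespace also handles the newlines in the column list;
# lines beyond the known columns are ignored.
label_master_info() {
    awk -v cols="$RPL_MASTER_INFO_COLUMNS" '
        BEGIN { n = split(cols, col) }
        NR <= n { printf "%s: %s\n", col[NR], $0 }
    ' "$1"
}
```

For example, `label_master_info node2/data/master.info` prints lines such as `Port: 12027`, ready to be compared with the table row.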

This lack of updates is the default, by design. The reasoning is that updating the table at every transaction would slow down replication beyond acceptable levels. However, it must be noted that the table is updated far less often than the files are.

You can force the slave_master_info and slave_relay_log_info tables to be updated at every transaction by setting sync_master_info and sync_relay_log_info to 1. With this setting, the tables are indeed updated at every transaction.
Therefore the choice is between crash-unsafe and fast (with the *.info files) and crash-safe and very slow (with the tables).

Usability issues

Let's mention the good news first. This addition is very welcome, because it allows monitoring tools to be implemented directly in SQL. The main difficulty with this task is that, up to MySQL 5.5, the only replication metadata available is SHOW SLAVE STATUS, which has no corresponding Information_Schema or Performance_Schema table. Thus, getting the status values into SQL variables is not feasible without external tools. The new tables are no replacement for SHOW SLAVE STATUS (the worklog for that replacement is still progressing slowly), but there is enough overlap that a simple monitoring tool could be created with SQL, stored routines, and the event scheduler.
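As a rough illustration, the metadata can now be read with a plain SELECT instead of parsing SHOW SLAVE STATUS. This minimal sketch assumes master-info-repository=TABLE, uses the column names shown in the listings above, and formats one batch-mode row (`mysql -N -B`) into key=value pairs. The helper name format_master_info is hypothetical, and a pure SQL version with stored routines and the event scheduler would follow the same idea. Keep in mind that, unless sync_master_info is set, the row is refreshed only when the slave restarts.

```shell
# Query for a few monitoring fields; column names are the ones shown by
# SELECT * FROM slave_master_info in MySQL 5.6.2.
INFO_QUERY="SELECT Host, Port, Master_log_name, Master_log_pos FROM mysql.slave_master_info"

# Hypothetical helper: turn one tab-separated row (mysql -N -B output) into
# key=value lines. A tab field separator keeps empty fields in place, which
# matters because Host and Master_log_name can be empty.
format_master_info() {
    awk 'BEGIN { FS = "\t" }
         { printf "host=%s\nport=%s\nlog_name=%s\nlog_pos=%s\n", $1, $2, $3, $4 }'
}

# Usage against a running slave (connection options are placeholders):
#   mysql -u USER -pPASSWORD -h HOST -P PORT -N -B -e "$INFO_QUERY" | format_master_info
```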

Now, for the bad news:
This implementation leaves me baffled for several reasons.
The lack of updates by default is the biggest concern. There is no option for periodic automatic updates (say, every second), comparable to sync_binlog. It's all or nothing.

The choice of implementation is not pleasant either. Users would expect the table-based recording to mimic the behavior of the file-based recording, i.e. the table is created when replication starts, and removed after RESET SLAVE. But this does not happen. The table is truncated, and if you drop it, it won't be recreated when you restart replication.

What's worse, this table can't be dumped with locks. MySQL complains if you attempt to do that.

./s1 -e 'stop slave'

mysqldump mysql slave_master_info
-- MySQL dump 10.13  Distrib 5.6.2-m5, for linux2.6 (x86_64)
--
-- Host: localhost    Database: mysql
-- ------------------------------------------------------
-- Server version 5.6.2-m5-log

/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
mysqldump: Got error: 1721: You can't use locks with rpl info tables. when doing LOCK TABLES

This makes provisioning a slave from a backup more difficult. I would expect that, having stopped the slave, I could back up the table, possibly together with the rest of the database. Maybe MySQL has a hidden clever way of exporting this data, but if it exists, it has escaped me so far.

Update: You can use the following command to dump this table.

mysqldump --master-data mysql slave_master_info

However, a simple

mysqldump --master-data mysql

does not include the *_info tables. (Bug#60902)
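Given Bug#60902, it may be worth checking that a dump meant for provisioning actually contains the two metadata tables. A minimal sketch, assuming mysqldump's default backtick quoting of identifiers in CREATE TABLE statements; dump_has_rpl_info is a hypothetical helper name.

```shell
# Hypothetical helper: report which replication metadata tables are missing
# from a mysqldump file. Looks for lines such as
#   CREATE TABLE `slave_master_info` (
# Returns 0 if both tables are present, 1 otherwise.
dump_has_rpl_info() {
    local dump="$1" missing=0 t
    for t in slave_master_info slave_relay_log_info; do
        if ! grep -q "CREATE TABLE \`$t\`" "$dump"; then
            echo "missing: $t"
            missing=1
        fi
    done
    return $missing
}
```

For instance, `dump_has_rpl_info backup.sql` prints the missing tables and returns a non-zero status if any of them is absent.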

Another problem is maintenance. If I want to clean up the InnoDB tablespace, the usual recipe is to dump everything, stop the server, remove the ib* files, restart the server, and then reload the data.
That has worked very well so far, because there were no InnoDB tables in the mysql schema. Now, however, if we attempt the above operation, we get an error when InnoDB comes online, because it won't find an internal reference to the InnoDB tables whose .frm files (and possibly .ibd files) are still dangling around under the mysql folder.
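Before removing the ib* files, a pre-flight check can at least warn that the mysql schema now holds InnoDB tables. This sketch reads the tab-separated output of a query against information_schema.TABLES (the query in the comment is an assumption, as is the helper name mysql_schema_is_innodb_free), lists any InnoDB tables it finds, and returns a non-zero status in that case. It does not fix the problem; it only stops the recipe before it breaks.

```shell
# Hypothetical pre-flight check. Expected input, one table per line:
#   mysql -N -B -e "SELECT TABLE_NAME, ENGINE FROM information_schema.TABLES WHERE TABLE_SCHEMA = 'mysql'"
# Prints the InnoDB tables and exits 1 if there are any, 0 otherwise.
mysql_schema_is_innodb_free() {
    awk 'BEGIN { FS = "\t" }
         $2 == "InnoDB" { print $1; found = 1 }
         END { exit found }'
}
```

A recipe script could then run the query through this check and abort before `rm ib*` whenever it fails.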

Incidentally, Tungsten Replicator uses a similar approach (replication metadata is stored in a table, which is updated at every commit), and yet it does not suffer from any of the drawbacks mentioned here. Its replication metadata tables are stored in a regular schema, which can be dumped just fine to provision a new slave. The additional commits are not a problem, since Tungsten uses the technique of block commits, where it commits together all the transactions that can be safely grouped.
The safety of the slave thus depends on the value of --innodb-flush-log-at-trx-commit, not on additional trade-off decisions.

More to come.

This article covers only a portion of the replication improvements in 5.6. There is much more to see and test.
Specifically, I want to test the performance impact of the metadata tables, and also the performance of the multi-threaded slave prototype against regular replication and Tungsten. I will get around to it shortly. Stay tuned.

Pewter for Tungsten - Thanks, MySQL community!

After the opening keynote at the MySQL Conference, there was the usual ceremony of the MySQL community awards. Since Oracle declined to continue the MySQL AB tradition of rewarding contributions from the community, the community itself has taken over.
I was pleasantly surprised to find my name among the recipients, and even more so to see the prize awarded to my company's product, the Tungsten Replicator.
The surprise comes because we have been making noise about this product for only a few months, after we finalized our plans to split the company products between open source and enterprise. Apparently, it has been the right kind of noise, and the community has been able to see that Tungsten is a tangible contribution to the MySQL ecosystem.

Now that the MySQL Conference's frenzy is over, we can continue making Tungsten even better than it is today. As probably everyone knows, we have found a generous sponsor (which should remain unnamed for now, but to whom we are very grateful) that is paying for implementing and strengthening the features that will make Tungsten 2.0 production ready in the near future.
If users want to contribute feedback, they can download the binaries (and the full source code) from the new Tungsten home http://tungsten-replicator.org, and report issues on the same site. See Settling in at code.google.com for more information.
More technical info will follow soon.
For now, I just wanted to say Thank you, MySQL community!

Friday, April 15, 2011

Have you missed the MySQL Conference? Come to OpenDbCamp!

The MySQL Conference is over. There have been many new developments, and those who missed it will probably want a summary of the excitement, possibly from the people who helped shape the news.
The Open Database Camp will give users an opportunity to catch up, especially open source users in Europe.
Come and share the fun. There will be talks on MySQL, PostgreSQL, several NoSQL products, and a bunch of other cool stuff.

Thursday, April 07, 2011

Refactored again: poor man's MySQL replicator monitor

I saw that both Haidong Ji and Geert VanderKelen have proposed a Python monitor for MySQL replication, calling it the "poor man's version".
See Poor man’s MySQL replication monitoring and Geert's Refactored: Poor man’s MySQL replication monitoring.
Having Python in your server doesn't really qualify as "poor man". In many cases it's a luxury, and thus, here's my shot at the problem, using a Bash shell script.
Unlike its Python-based competition, this version also checks that the slave is replicating from the intended master, and that it is not lagging behind.

#!/bin/bash

USERNAME=msandbox
PASSWORD=msandbox
EXPECTED_MASTER_HOST=127.0.0.1
EXPECTED_MASTER_PORT=27371

SLAVE_HOST=127.0.0.1
SLAVE_PORT=27372

MYSQL="mysql -u $USERNAME -p$PASSWORD "
MASTER="$MYSQL -h $EXPECTED_MASTER_HOST -P $EXPECTED_MASTER_PORT"
SLAVE="$MYSQL -h $SLAVE_HOST -P $SLAVE_PORT"

$MASTER -e 'SHOW MASTER STATUS\G' > mstatus
$SLAVE -e 'SHOW SLAVE STATUS\G' > sstatus

function extract_value {
    FILENAME=$1
    VAR=$2
    grep -w $VAR $FILENAME | awk '{print $2}'
}

Master_Binlog=$(extract_value mstatus File )
Master_Position=$(extract_value mstatus Position )

Master_Host=$(extract_value sstatus Master_Host)
Master_Port=$(extract_value sstatus Master_Port)
Master_Log_File=$(extract_value sstatus Master_Log_File)
Read_Master_Log_Pos=$(extract_value sstatus Read_Master_Log_Pos)
Slave_IO_Running=$(extract_value sstatus Slave_IO_Running)
Slave_SQL_Running=$(extract_value sstatus Slave_SQL_Running)

ERROR_COUNT=0
if [ "$Master_Host" != "$EXPECTED_MASTER_HOST" ]
then
    ERRORS[$ERROR_COUNT]="the slave is not replicating from the host that it is supposed to"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Master_Port" != "$EXPECTED_MASTER_PORT" ]
then
    ERRORS[$ERROR_COUNT]="the slave is not replicating from the port that it is supposed to"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Master_Binlog" != "$Master_Log_File" ]
then
    ERRORS[$ERROR_COUNT]="master binlog ($Master_Binlog) and Master_Log_File ($Master_Log_File) differ"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

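# Compare positions only when both refer to the same binary log:
# across different files the difference is meaningless
if [ "$Master_Binlog" == "$Master_Log_File" ]
then
    POS_DIFFERENCE=$(( ${Master_Position:-0} - ${Read_Master_Log_Pos:-0} ))
    if [ "$POS_DIFFERENCE" -gt 1000 ]
    then
        ERRORS[$ERROR_COUNT]="The slave is lagging behind by $POS_DIFFERENCE bytes"
        ERROR_COUNT=$(($ERROR_COUNT+1))
    fi
fi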

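# Anything other than "Yes" (e.g. "No", "Connecting", or an empty value
# when the slave is not configured) means the thread is not working
if [ "$Slave_IO_Running" != "Yes" ]
then
    ERRORS[$ERROR_COUNT]="Replication (IO thread) is not running: $Slave_IO_Running"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi

if [ "$Slave_SQL_Running" != "Yes" ]
then
    ERRORS[$ERROR_COUNT]="Replication (SQL thread) is not running: $Slave_SQL_Running"
    ERROR_COUNT=$(($ERROR_COUNT+1))
fi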

if [ $ERROR_COUNT -gt 0 ]
then
    EMAIL=myname@gmail.com
    SUBJECT="ERRORS in replication"
    # One error message per line, so they don't run together in the mail
    BODY=$(printf '%s\n' "${ERRORS[@]}")
    echo "$SUBJECT"
    echo "$BODY"
    echo "$BODY" | mail -s "$SUBJECT" "$EMAIL"
else
    echo "Replication OK"
    printf "file: %s at %'d\n" "$Master_Log_File" "$Read_Master_Log_Pos"
fi

Monday, April 04, 2011

Make your voice heard. Tell Oracle and the MySQL Council what bugs you

The MySQL Council has not been idle. We have addressed the concerns about the bugs database, and we are continuing our dialog.
To do a better job, we would like to hear more from the community. Unlike other established user groups, MySQL does not have a worldwide organization for its users. The council exists on a voluntary basis, and we are seeking support from the rest of you. Please make your voice heard. There are three main channels for this:

A MySQL Council survey
A set of questions that will be answered during the keynote at the MySQL Conference
Talk to a council member

Survey

The MySQL Council survey is an attempt to collect, understand, and eventually prioritize what the MySQL community feels, and act on their suggestions. It is a very short survey. We don't want your opinion on everything from barbecue sauce recipes to brain surgery. We need only a few lines about what bothers you the most with MySQL and how we can help. And of course, if you want to help, we really want to hear about that.

Questions for the keynote

Let's ask Oracle directly. What are the biggest issues that you have? You can submit your questions for the keynote, and Tomas Ulin will answer them on stage at the MySQL Conference.
Here is your shot at practicing people power: think of the most pressing questions that you would like to see Tomas Ulin address, and submit them as soon as possible.

Talk to a council member

Council members are all well known people in the community. Most of us will be at the MySQL Conference or at Collaborate 11, or both.
If you want to ask me questions about the council, or MySQL, or community matters, I will do my best to answer them, or to seek an answer if I don't know it myself.

I will be at the San Francisco MySQL User Group on April 7th.
Then I will be at the MySQL conference Monday and Tuesday. Oh, and there is the Community Dinner West on Monday evening!
On Wednesday, I will be at Collaborate 11.
The other council members will be around as well. If you don't feel like filling in the survey and prefer person-to-person communication, come see us, and let's talk!

Open Database camp 2011 - Opportunities for sponsors, culture, and more

The Open Database Camp 2011 is barely one month away.

Sponsorship

Many thanks to all the sponsors! We very much appreciate your support.
Speaking of which, here is some important information for sponsors: the venue owner, Sardegna Ricerche, has made an ample hall available for sponsors, where they can showcase their products and services.
Each sponsor will have a desk and a double panel measuring 195 x 75 cm (6.4 x 2.5 feet).

Culture, fun, and more

The Science Park is something unique that geeks may want to visit. It is one of the biggest research centers in Europe, and the owners have graciously organized guided tours before and after the conference.
Near the conference venue there is Nora, an archeological site that alone is worth the trip. You can see it during the welcome party on Friday (if you show up before sunset, that is), or you can visit on your own after the conference.

To give you an idea of what awaits you, here is a promotional video of Sardegna Ricerche.

The event should attract many local open source enthusiasts, with varying degrees of knowledge about open databases. To satisfy their curiosity, there will be a parallel beginners' track, with introductory sessions on open databases. SQL and NoSQL fans, get ready to evangelize your beloved products. There will be many people eager to listen!
