The Data Charmer: December 2011

Monday, December 19, 2011

Testing new builds with MySQL-Sandbox 3.0.24

MySQL::Sandbox 3.0.24 was released yesterday, with many new features.

More than vanilla MySQL

If you have missed my previous announcement, here's the gist of it. MySQL Sandbox can now deal with tarballs from either Percona Server or MariaDB. The main difference after this change is that you can now create a directory called <PREFIX>5.5.16 and make_sandbox will recognize it as well as the plain 5.5.16.

$ make_sandbox --export_binaries --add_prefix=ps \
   Percona-Server-5.5.11-rel20.2-114.Darwin.i386.tar.gz  \
   -- --sandbox_directory=msb_ps5_5_11
unpacking Percona-Server-5.5.11-rel20.2-114.Darwin.i386.tar.gz
[…]
installing with the following parameters:
upper_directory                = /Users/gmax/sandboxes
sandbox_directory              = msb_ps5_5_11
[…]
basedir                        = $HOME/opt/mysql/ps5.5.11
tmpdir                         = 
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_ps5_5_11

After the binary export, subsequent installations will be easier:

$ make_sandbox ps5.5.11

The same commands can be used for MariaDB. At the moment, make_sandbox does not recognize other packages, but adding them should not be a big deal, provided that such packages look like MySQL. It wouldn't work with Drizzle, because it lacks the main ingredients for MySQL installation.

High Performance sandboxes

While testing parallel replication and prefetch slaves with Tungsten Replicator, I realized that I was doing too much manual fiddling with my scripts. Since I need more performant servers, I added the basic items that I need to modify to enable a faster server. Now, using the '--high_performance' option with make_sandbox, you get a server that is much better than out-of-the-box MySQL. To avoid problems with too much RAM, I am using a default of 512 MB for InnoDB, which is not enough for really demanding tests, but at least it is a good placeholder in the sandbox configuration file, should you need to modify it.

$ make_sandbox 5.1.60 -- --high_performance
[…]
innodb-flush-method=O_DIRECT ; \
innodb-log-file-size=50M ; \
innodb_buffer_pool_size=512M ; \
max_allowed_packet=48M ; \
max-connections=350 ; \
innodb-additional-mem-pool-size=50M ; \
innodb-log-buffer-size=50M ; sync_binlog=0 ; \
innodb-thread-concurrency=0 ; log-error=msandbox.err
[…]

Standalone masters and slaves

MySQL Sandbox has had the ability of creating replicated systems for years. Yet, sometimes you need a stand-alone master server that you want to use for some odd experiment. Similarly, you may want to create a slave of a specific master without having a full replication system. One case where you would like this ability is when you want to try replicating between servers of different versions.

$ make_sandbox 5.1.57 -- --master
[…]
my_clause                      = server-id=5157 ; log-bin=mysql-bin ; log-error=msandbox.err
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_5_1_57

$ make_sandbox 5.5.10 -- --slaveof='master_port=5157' 
[…]
my_clause                      = server-id=5510 ; log-bin=mysql-bin ; log-error=msandbox.err
[…]
Your sandbox server was installed in $HOME/sandboxes/msb_5_5_10


$ ~/sandboxes/msb_5_1_57/use -e 'show master status'
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |      106 |              |                  |
+------------------+----------+--------------+------------------+

$ ~/sandboxes/msb_5_5_10/use -e 'show slave status\G'
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 127.0.0.1
                  Master_User: rsandbox
                  Master_Port: 5157
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 106
               Relay_Log_File: mysql_sandbox5510-relay-bin.000002
                Relay_Log_Pos: 252
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 420
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 5157

You can download MySQL::Sandbox from either launchpad or CPAN.

Saturday, December 10, 2011

How to submit a good database bug report

When an open source project becomes popular, bug reports start flocking in. This is both good and bad news for the project developers. The good news is that someone is using the product, and they are finding ways of breaking it that we didn't think of. The bad news is that most of the times the reporters assume that the developers have super human powers, and that they will find what's wrong by the simple mentioning that a given feature is not working as expected. Unfortunately, it doesn't work that way. An effective bug report should have enough information that the ones in charge will be able to reproduce it and examine in lab conditions to find the problem. When dealing with databases and database tools, there are several cases, from simple to complex. Let's cover them in order.

Installation issues

This is often a straightforward case of lack of functionality. When a tool does not install what it is supposed to, it is a show stopper, and a solution must be found. In this case, it's often enough to state the environment where the installation is happening (operating system, version of the tool, version of core components such as the MySQL database used) and the command used to start the installation. The error message could be an expected failure, when the installation procedure checks for requirements and fails if they are not met. For example: "Missing Ruby interpreter". The message tells (or suggests) you what to do. Filing a bug report on an expected failure is a waste of time. You should install the missing part and try again. Even if the message is about an unexpected failure (e.g. you get a stack trace from Ruby or Java), usually the first message tells you enough to be able to find a workaround. For example, if you get an exception from Ruby complaining about a missing 'curl' command, you can file a bug report to ask for the installer to check for 'curl' in the requirements, but if you install 'curl' yourself, the installation should continue.

Simple database issues

Reporting a database bug means complaining that the DBMS is not behaving as advertising by the documentation, or as common usage dictates. If it is a missing or misbehaving functionality, the best way of showing the problem is by starting with an empty DBMS, followed by the creation of the objects needed to reproduce the issue (CREATE SCHEMA, TABLE, INDEX) and by a minimal amount of data that triggers the problem. Some information about what operating system and database version was used is probably necessary to reproduce the problem consistently.

Simple database replication issues

By simple replication we mean a vanilla master/slave topology. In this scheme, data inserted in the master will eventually end up in the slave. Bugs in this category may fail to replicate the data totally or partially, or they may cause a break in the replication flow. Reproducing them is almost as easy as with simple database bugs. If you start with an empty system and manage to reproduce the error with a short sequence of commands, it should probably reproducible by a third party. Sometimes, settings in the master and the slave are essential to reproduce the problem. In MySQL, the format of binary logs, the default database engine and SQL modes can affect replication and produce different results with the same stream of SQL commands.

Complex database replication issues

The most difficult bugs to report are the ones where the error shows up only in a given topology. While MySQL native replication offers only few options to pipe data around (single, circular, hierarchical), Tungsten replicator allows a rich set of combined pipelines that can change the outcome of a data change event, depending on the originating node and the direction it took. In these case, information o how the cluster was installed becomes essential.

Concurrency issues

This is one of the most difficult bugs to report. When an error happens only because of the contemporary action of two or more threads, there is no easy way of reporting it in a way that it can be easily reproduced. Three methods are possible:

Describe the action of the first thread, then mark the change of thread and describe the actions of the second thread, continuing in this way until you reach the error point.
If you are a developer and feel comfortable with multi thread applications, write a script that reproduces the error by running several threads (Perl, Python, and Ruby offer the best environment for this kind of tests).
If the database offers a tool to write such multi-threading tests, consider using it.

Heavy load issues

This is a more complex case of the above one. Not only you need concurrency, but a lot of it happening at the same time. Reproducing this kind or error is challenging. If you have a support agreement with the provider of the database or the tool, you may let the support engineer have a look at your running environment, to find some clues. But even in this case, the support engineers or yourself need to ultimately reproduce the case in such a way that a developer can fix the problem and test the fix. There are two methods to report this problem:

Simplification: if you can reduce the concurrency to the elements that are misbehaving, the methods for concurrency issues will apply also in this case.
Enabling a query log could lead to identifying the sequence of events that have generated the error. The log should be integrated by the DDL of the objects involved in the action (schemas, tables, triggers, views, etc).

Large data issues

If your error only shows itself with large data, there is often a logistical problem, as you can't easily provide gigabytes of data, even if there privacy and security issues weren't in the way (which usually are). There are three strategies that you can use to report such bugs:

If only the size matters, then describing the kind of data used could be enough. E.g. When a table with such fields and such indexes grows beyond X GB, then the optimizer warp drive explodes. (Don't try this at home)
Create a script to generate the data that will ultimately trigger the error. This method requires both skills and an understanding of what the error is.
You may use a publicly available database to reproduce the error (for example, the Sample database with test suite.) Just mention in the bug report where to find the database, eventually how to load it if it is not simple, and then describe the steps needed to reproduce the bug after loading it.

DOs and DONTs

DO
- Search the bug repository and the mailing lists (or forums) before submitting yours. Someone may have had the same problem before you did. (Thanks, Robert Hodges, for this important reminder.)
- Put yourself in the receiver's position, and try to reproduce the problem from a clean initial state.
- If there is a workaround, mention it: it might give a good clue to the developers.
- More information is better. Anything that can improve the identification of the bug root cause will be welcome. (But don't overdo: see below)
- If you feel that a missing feature should be useful, report it as a feature request. (Even better: suggest a patch)

DON'T
- Don't report a missing feature as a bug, unless the docs say that the feature should be there.
- Don't just send the error message without the events that generated it.
- Don't include SQL commands embedded in code.
- Don't say "my application doesn't work anymore," assuming it's the database's fault. Remember The First Rule of Programming: It's Always Your Fault
- Less is better. If there is a long way and a short way of reproducing the bug, the shorter one is better. Don't send more info just for the sake of it.
- Don't tell the developers that they are retarded. This will not increase the priority given to your bug, or your credibility.

Friday, December 02, 2011

Call for participation for MySQL events - MySQL conference and FOSDEM

It's that time of the year where MySQL would be speakers are called to action. As usual, the main event is the MySQL spring conference, this year hosted by Percona. The call for participation to the MySQL Conference And Expo 2012 is still open until December 5th. To submit a proposal, you should register as a speaker and then fill in the form.
There is a conference committee which is already busy evaluating the proposals that have been submitted so far. The committee is demanding (I know for a fact, since I am in it!) and therefore, if you want to submit something, be very critical with yourself and polish your proposal as if your job depended on it!
Please read an update on Percona Live MySQL Conference & Expo 2012.

Speakers in Europe have some more duty, though. This is also the time to submit talks for FOSDEM MySQL DevRoom. Thanks to Frédéric Descamps and Sergey Petrunia, we have Room H.1309 with 150 seats on Sunday 5th February 2012, all day. The deadline to submit a talk proposal is December 26th. There is no review committee. Like we did on past editions, as soon as the talks are submitted, we will ask everyone to vote on the talks via Twitter or email. More updates will come soon. Of course, participation to FOSDEM DevRoom is not limited to European speakers. There have been several brave speakers who have willingly crossed the pond to offer their services at European conferences before, and they are welcome to repeat the experience. Submit your talk proposals now.

Thursday, December 01, 2011

Never say "there is no way"

Reading a recent MySQL book, I saw an example of SHOW CREATE TABLE that comes with backticks (`) around the table and column names, and a comment:

Unfortunately, there is no way to remove this from generated syntax with this command.

(Emphasis mine).
Here's how it goes:

mysql> show create table mytest\G
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE `mytest` (
  `id` int(11) NOT NULL,
  `description` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Of course, there is a way!

mysql> pager tr -d '`'
PAGER set to 'tr -d '`''
mysql> show create table mytest\G
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE mytest (
  id int(11) NOT NULL,
  description varchar(50) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Never say "there is no way!"

Instead of "tr -d '`'", you can use "sed -e 's/`//g'", which does the same thing.

If you are running the query at the command line, you may use the pipe directly:

$ mysql -e 'show create table test.mytest\G' | tr -d '`'
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE mytest (
  id int(11) NOT NULL,
  description varchar(50) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

The Data Charmer