Friday, June 29, 2007

MySQL as universal server

With all the discussion going on about MySQL Proxy, we may forget that there are alternatives.
Today in the MySQL Dev Zone there is an article about a different proxy. It's more than that, actually. It is a Perl Module implementing the MySQL client/server protocol.
With this tool, you can emulate some of MySQL Proxy features, and you can do something more. For example, you can use Federated tables with non-MySQL database servers. The article features an example where a table from SQLite and a table from PostgreSQL are queried as if they were normal MySQL tables.

Wednesday, June 27, 2007

Your first macros with MySQL Proxy

You may have noticed the excitement about MySQL Proxy. But perhaps you failed to realize what you can do with it.
Here's a simple tutorial that will guide you through your first Proxy experience. You will be able to use macros with the standard mysql client.

Assuming that you are working on the box where your MySQL server is, do the following:
  1. Download and unpack MySQL Proxy
  2. download the query rewriting tutorial and save it as rewrite.lua
  3. start the proxy
    /path/to/sbin/mysql-proxy --proxy-lua-script=rewrite.lua --proxy-profiling
  4. from another terminal, fire the mysql client
    mysql -u USERNAME -pPASSWORD -h -P 4040
Now you can use your client as usual, with a few additions. The rewrite.lua script implements three macros (cd, ls, who, corresponding respectively to USE, SHOW FULL TABLES, and SHOW PROCESSLIST).
mysql> cd test;
Query OK, 0 rows affected (0.00 sec)

mysql> select schema();
| schema() |
| test |
1 row in set (0.00 sec)

mysql> cd mysql;
Query OK, 0 rows affected (0.00 sec)

mysql> select schema();
| schema() |
| mysql |
1 row in set (0.00 sec)
Look at that! We send a macro (cd test) to the server. The proxy catches it, and turns it into "use test". The server executes it, unaware of the manipulation in between. Let's try the others:
mysql> ls mysql;
| Tables_in_mysql |
| columns_priv |
| db |
| func |
| help_category |
| help_keyword |
| help_relation |
| help_topic |
| host |
| proc |
| procs_priv |
| tables_priv |
| time_zone |
| time_zone_leap_second |
| time_zone_name |
| time_zone_transition |
| time_zone_transition_type |
| user |
17 rows in set (0.00 sec)

mysql> who;
| Id | User | Host | db | Command | Time | State | Info |
| 4 | datacharmer | localhost | bugs | Sleep | 1522 | | NULL |
| 11 | gmax | localhost:50630 | mysql | Query | 0 | NULL | SHOW PROCESSLIST |
2 rows in set (0.00 sec)
And now we add our own macro to the list. Stop the proxy. Edit rewrite.lua, and just before the line saying
elseif command == "who" then
add these three lines, implementing a info macro, which corresponds to "SELECT USER(), NOW()":
                elseif command == "info" then
proxy.queries:append(1, string.char(proxy.COM_QUERY) .. "SELECT USER(), NOW()" )
return proxy.PROXY_SEND_QUERY
Restart the proxy, and try your new macro:
mysql> info;
| USER() | NOW() |
| gmax@localhost | 2007-06-27 10:24:00 |
1 row in set (0.00 sec)
Isn't it cool? And that's just the beginning.

Tuesday, June 26, 2007

MySQL Proxy - An excellent excuse to learn a new language

If you are as curious as I am about technology, you would have been intrigued when the MySQL proxy entered the radar. So I followed this promising jewel from its first inception as source only (and for internal distribution) until its release with a full series of binaries.

The recent announcement has made it official. MySQL Proxy is a full fledged supported product, and it's ready to use.
You just have to get the binaries and use it.
If you feel like experimenting (and there is no reason to get the proxy if you aren't) you can have a look at the sample lua scripts available on the Forge and play with them. You may notice that there is one snippet that I created. If you are thinking of calling me for lua lessons, think again. When I wrote that example, my entire production with lua amounted to three scripts, and the total length of my lua experience could be measured in hours.
Having used at least a dozen languages (including the usual suspects Pascal, C, C++, and Perl, but also things like Sather), a new scripting language does not scare me. I believe that I am going to see more of this lua, and I believe that lots of MySQL users are going to become proficient lua speakers in the near future.
Good job, Jan!

Users, give the proxy a try. And if you can't figure out what to do with it, stay tuned. I have a couple of article on this subject, and not only about this proxy, coming soon.

Wednesday, June 13, 2007

Test suites for the masses


Test suites are hard, but your business needs to have one.
They are hard because modern applications are complex.
Technology can help you. There are tools that can simplify the task of creating a thorough test suite.

Does your main business have a regression test?

The likely answers to the above questions are:
  • Yes. - Good chap. Keep up the good work. You may stop reading, but just out of curiosity, read on.
  • No. I don't need one. - You may stop reading. Most likely your business is already doomed. :)
  • No. I don't have time (don't know how) to make one. - Keep reading. There is something for you.
What's a regression test? Is a useful watchdog that tells you if your software is working the way it was intended to.
If you don't have one, you have no quick and reliable way of telling if your application is safe.
If you think that you don't need a regression test suite, either you have found the formula of making perfect software, or your optimism is way off mark.

Tests are hard

If you have a test suite, you know that it requires hard work, dedication, cunning ways of thinking, technical abilities.
If your business includes a MySQL database, you may consider using its specific test framework, which introduces yet another level of difficulty.
MySQL testing framework is designed to allow testing any database scheme (stand alone, replicated, clusters) with just one box, at the price of some language addition that you would need to learn.
Writing your own regression test that covers your application means replaying the data structure and the queries, to make sure that, if you alter the working environment, your application continues working as required.

The most likely scenario occurs when you need to upgrade. Your current server has a bug that was just fixed in a more recent one. You would like to upgrade, but before doing that you would like to be sure that the new environment is capable of running your queries without side effects.
A regression test would solve the problem. You set up a temporary server, install the upgraded software and run the test suite. If you don't get errors, you may confidently upgrade. It depends on how thorough is your test, of course. The more you cover, the more you can trust it.
If you don't have a regression test and you need to upgrade, you are in trouble. If there are inconsistencies between your application and the new version, you will find out when a user complains about a failure. Which is not the most desirable course of action.

Understanding the problem

Why making a test suite for an application is so hard? Mostly because you don't usually have a list of all queries running against your server. Even if you planned a modular application and all your database calls are grouped in one easy accessible part of your code, that won't solve your problem. Modern applications don't use static queries. Most queries are created on the fly by your application server, twisting a template and adding parameters from users input.
And even if your templates are really well planned, so that they leave nothing to imagination, you will have trouble figuring out what happens when many requests access your database concurrently.
Smarter applications can log all database requests and leave a trail that you can use to create your test. But more complex systems count more than one application, and taking trace of all their queries is a real headache.

Making tests easier

Fortunately, there is a central place where all queries go. Your server.
Unfortunately, the server does not create a test case for you. It just records all calls in the general log, and it that is not suitable for further consumption.
Here comes the technology I mentioned earlier. You can make a test script out of your general log.
  1. activate the general log;
  2. Download the test maker;
  3. Inspect the general log and identify the likely period when the most important queries occur;
  4. tell the test maker to read the above traffic (which you can identify by line number or by timestamp)
  5. edit the resulting test and see if covers your needs.

The test maker is a collection of two tools. One that can get a snapshot of your data (like mysqldump, but with a LIMIT clause, so that you can take only a sample of your data) and the test maker itself, which reads the general log and creates a script. You can tune this tool with several options.

Log Analyzer and test maker, version 1.0.2 [2007-06-10]
(C) 2007 Giuseppe Maxia, MySQL AB
Syntax: ./testing_tools/ [options] log_file test_file
--add_disable_warnings|w Add disable warnings before DROP commands. (default: 0))
--from_timestamp|t = name Starts analyzing from given timestamp. (default: undefined))
--to_timestamp|T = name Stops analyzing at given timestamp. (default: undefined))
--from_line|l = number Starts analyzing at given line. (default: first))
--to_line|L = number Stops analyzing at given line. (default: last))
--from_statement|s = name Starts analyzing at given statement. (default: first))
--to_statement|S = name Stops analyzing at given statement. (default: last))
--dbs = name Comma separate list of databases to parse. (default: all))
--users = name Comma separate list of users to parse. (default: all))
--connections = name Comma separate list of connection IDs to parse. (default: all))
--verbose Adds more info to the test . (default: 0 ))
--help Display this help . (default: 0 ))
As you can see from the options, you can filter the log by several combinations of connection ID, users list, timestamps, line numbers.
The resulting script is suitable for the MySQL test framework.


The test maker is not a panacea, but it helps reducing your test suite development time. Running this tool you will get a passable starting test, which you can then refine to make it closer to your needs.
The main advantage is that you can have a set of instructions that were run against your production server, and you can replay against a testing server when you need to upgrade.
If some features of MySQL 5.1 look appealing and you plan to upgrade, then a test suite is the best way of making sure that the switch will be painless.

More than that, once you have a test suite, consider joining the Community Testing and Benchmarking Program, a project to share test suites from the Community, so that your test case becomes part of MySQL AB suite. Thus, you won't be alone with your upgrade problems. If there are any regression failures, they will be addressed before the next version is released, for the mutual benefit of company and users.

Known issues

There are problems, of course.
The general log is not turned on by default. To switch it on, you need to restart the server, and to switch it off you will do that again (until you upgrade to MySQL 5.1, which has logs on demand).
Moreover, the general log can become huge, so you need additional maintenance, to avoid filling the whole disk with logs.
Furthermore, the general log will record everything sent to the server, even wrong queries. While this is desirable for a test case, because testing failures is as important as testing compliance, you will need to adjust manually the test script, adding "--error" directives.
Again, the test maker records only the test script, but not the result, which you will need to create using the testing framework.
All in all, having a tool writing a big chunk of the test script is much better than doing it all manually. So even if this is not a complete solution, it's a hell of a good start.
Comments welcome.

Tuesday, June 12, 2007

Use 5.1 partitions in production - Today!

MySQL 5.1, the next major version, has a bunch of juicy features that many people would like to use in production, if only it were already GA. Partitioning for example, is one of those features that all users want to get their hands on. It is about performance. Partitioning can reduce response times dramatically, if applied correctly. I don't need to convince you of this. You can read one of the several articles about the technicalities of partitions (there is one piece about the performance of partitioned archive tables in this very blog).

The only problem is that MySQL 5.1 is still in beta. The number of outstanding bugs is shrinking by the day, and the number of reported bugs is receding, but nonetheless, it's not a GA release, and therefore the conscientious DBA is (justly) reluctant at the idea of using it in production.
Those partitions, though, are really tempting. Perhaps we can play some trick to use them, while keeping our data safe.
There are ways where you could use MySQL 5.1 in production, especially partitions, without endangering your valuable data. The easiest one is with replication.

MySQL replication is a cheap, reliable, fast way of synchronizing data among servers and balancing the query load, if you are willing to split your application logic so that you write on one server (the master) and read from another one, or several ones (the slave).

One nice feature of MySQL replication is that it can work across versions, as long as the most recent one is used in the slave. So you can set your real production server as the master, running MySQL 5.0, the legitimate GA release, while the slave will run MySQL 5.1.

The steps to achieve this feat are simple:
  • make sure that your master is using a binary log and has a unique server ID;
  • create a user with REPLICATION SLAVE privileges;
  • on the slave, install MySQL 5.1, with a different server ID, but don't start the replication yet;
  • dump the data from master to slave, using the mysqldump options --master-data=1 and --single-transaction;
  • On the slave, Alter the structure of the tables that you want to partition;
  • start the slave.
Now you can instruct your application to send expensive queries to the slave, instead of the master. Now your most demanding SELECT statements can take advantage of partitioning, while your data is safely held in the GA master.

Think of the advantages:
  • You will speed up some of your expensive queries. (Beware. It is not always easy. You need to check your table design, but in the end, it works spectacularly);
  • If you find a bug in 5.1, you can help the development without risking the integrity of your data, which is still safe inside the master.
If you still have doubts, I'll let you in a little secret. At MySQL, at least two production databases have been running on 5.1 for one year already. We trust it, even though we can't officially tell you to do the same. But I can recommend you the little replication trick that will let you use the new features with no risks.