Wednesday, January 20, 2010

Multi dimensional cubes in MySQL through Gearman


MySQL cubes with Gearman

I gave two presentations about Gearman at the Linux.conf.au. As part of the preparation for these talks, I created several sample applications. One of them, about remote replication administration, I will cover in a separate post. The most amazing one, which I cover here, is a quick and painless solution for multiple level crosstabs in MySQL.

Some background is needed. Crosstabs (also called data cubes or pivot tables, have been one of my favorite hacks for long time. In 2001 I wrote an article about a simple way of doing single level crosstabs. A few years later, I developed a Perl module that generates multiple levels of data cubes in most any database systems. Since then, I have received countless requests to convert this module to PHP, Python, Java, and I have always declined, for lack of time or abilities.
In the coming years, I tackled the same problem using MySQL Proxy and some SQL hacks. Both attempts were not completely satisfactory. The options offered by the Perl module are simply too hard to replicate to any other system.
When I started using Gearman, I realized that I could use the original Perl module through a Gearman worker, without converting to any other language. The idea is to write a simple worker that accepts some parameters and runs the Perl module to return a crosstab query to the client. The query being the most complicated thing to generate, the architecture could look like the image below.

To take the idea one step further, I used the Gearman UDF for MySQL, which makes the crosstab function available at the SQL level, thus being transparent no matter which programming language the client uses, and without need of using the Gearman API.

In this scenario, what you need to do is just querying the worker (through the UDF), with a simple string of parameters.

mysql> set @q = (select gman_do('crosstab',
'from=all_personnel;op=sum salary;rows=country;cols=gender'));

mysql> prepare q from @q; execute q;
+---------+-------+-------+-------+
| country | m | f | total |
+---------+-------+-------+-------+
| Germany | 16000 | 11000 | 27000 |
| Italy | 6000 | 6000 | 12000 |
| UK | 10500 | NULL | 10500 |
| zzzz | 32500 | 17000 | 49500 |
+---------+-------+-------+-------+

Here's a taste of a 2 levels cube:
set @q = (select gman_do('crosstab','from=all_personnel;op=count salary;rows=country,location;cols=department,gender'));
Query OK, 0 rows affected (0.03 sec)

prepare q from @q; execute q;Query OK, 0 rows affected (0.00 sec)
Statement prepared

+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| country | location | pers#m | pers#f | pers | sales#m | sales#f | sales | dev#m | dev#f | dev | total |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| Germany | Berlin | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 2 |
| Germany | Bonn | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| Germany | Munich | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
| Germany | zzzz | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 0 | 1 | 5 |
| Italy | Rome | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
| Italy | zzzz | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
| UK | London | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
| UK | zzzz | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
| zzzz | zzzz | 3 | 1 | 4 | 2 | 1 | 3 | 1 | 1 | 2 | 9 |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+

It would be nice to actually format the result in a more human readable way, like this one, but it will require some more work.

+---------+----------+--------------+-----------------+-------------+-------+
| country | location | pers |sales | dev | total |
| | +---+---+------+----+----+-------+---+---+-----+-------+
| | | m | f | pers | m | f | sales | m | f | dev | total |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Germany | Berlin | 0 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 2 |
| Germany | Bonn | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 |
| Germany | Munich | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 2 |
| Germany | total | 1 | 1 | 2 | 1 | 1 | 2 | 1 | 0 | 1 | 5 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Italy | Rome | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
| Italy | total | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| UK | London | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
| UK | total | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| total | total | 3 | 1 | 4 | 2 | 1 | 3 | 1 | 1 | 2 | 9 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+

To make the above examples work, what's missing is the worker. You can try the sample crosstab worker from MySQL Forge.

Wednesday, January 13, 2010

MySQL user group meeting in Sydney, January 15th


Sydney MySQL User Group

On January 15th I will be in Sydney, Au, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at the Sydney Mechanics’ School of Arts at 5.30pm.
I will talk about testing complex database systems with MySQL Sandbox.
The meeting is open to all. If you want to attend, please register at the Sydney MySQL User Group meetup page.

Monday, January 11, 2010

MySQL user group meeting in Dubai, January 13th


Dubai MySQL meetup

On January 13th I will be in Dubai, UAE, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at Sun Microsystems offices at 6pm.
I will talk about boosting performance with MySQL 5.1 partitions, covering the recent 5.5. additions.
The meeting is open to all. If you want to attend, please register at the meetup page.

Tuesday, January 05, 2010

Speaking, speaking, speaking: Dubai-Sydney-Wellington



(*)

From January 12th to 27th I will be traveling to the Southern Hemisphere and speaking at two user groups and two conferences.
The schedule (see below) is almost scary. I will be talking about Partitioning (Dubai and Wellington), MySQL Sandbox (Sydney and Wellington), Gearman (Wellington), and some general topics now and then.

The complete schedule and location follows:

(*) I know. The world map is upside down. That is how you would see it if people in the Southern Hemisphere had started drawing maps before the ones in the North.