The Data Charmer

Tuesday, February 16, 2010

Linux MySQL distros meeting in Brussels

When I saw Shlomi's post on why not to use apt-get or yum for MySQL, I thought immediately that his conclusions are quite reasonable. What you get from the Linux distributions is not the same thing that you find in the official MySQL downloads page. Now, whether you value more the completeness of the server or the ease of administration through the distribution installation tools, it's up to you and your business goals. We at the MySQL team have organized a meeting with the Linux distributions with the intent of finding out which differences and problems we may have with each other, and to solve them by improving communication. What follows is a summary of what happened in Brussels during the meeting.

Summary

Linux distributions ship MySQL products (Server, GUI Tools, connectors, Cluster) with different criteria and different grade of maturity, according to their own goals.
Due to lack of communication and policy conflicts, the distros almost always ship outdated versions of MySQL server and MySQL Cluster. The lag between the shipped version and the latest product shipped by MySQL ranges from a few months to several years.
By mutual understanding, the distros will now try to ship recent versions of Cluster (7.x) in a separate package.

Participants

(see some more pictures from the meeting).

Giuseppe Maxia, MySQL Community Team Lead, Italy
Tomas Ulin, MySQL VP Engineering, Sweden
Harmut Holzgraefe, MySQL Support, Germany
Lars Heill, MySQL Build, Norway (Trondheim)
Joro Kodinov, MySQL Engineering / 5.1, Bulgaria
Oden Eriksson, Mandriva, Sweden
Mathias Gug, Canonical/Ubuntu, Canada
Robin H. Johnson, Gentoo Linux, Canada
Michal Hrušecký, Novell/openSUSE, Czech Republic
Geir Høydalsvik, MySQL QA, Norway (Trondheim)
Norbert Tretkowski, Debian Linux, Germany
Kaj Arnö, MySQL VP Community Relations, Germany

Main issues from the distributions:

d1. Security bugs are invisible until MySQL releases a fix. They would like to get visibility of the bug report, to become aware of the problem and eventually help fixing it. We are looking into this matter.

d2. Due to lack of communication, the distros were running the test suite with different parameters. Gentoo packages the server with UTF-8 as default character set, and this causes several tests to fail. Our QA team is looking into it.

d3. Bug databases are different for each distros. They usually solve problems on their own, or send the issue upstream (to the MySQL team at Sun, now Oracle) when it is a legitimate bug.

d4. Debian and Ubuntu don't apply all our patches to the server that they ship. They only apply security bugs and fix for bugs that don't introduce new or changed functionality. This is, IMO, mostly a matter of terminology, since the new functionality is only added as a side effect of fixing a bug. For example, when we fixed Bug#49222: Mark RAND() as unsafe, there is a change in functionality. Now RAND() is logged in ROW format, as it should have been in the first place. It is indeed a new functionality, but as a user I would rather have this bug fix in my server, than adhering to the strict rules of no changes.

d5. GUI tools are still shipped as current although they aren't actively supported, with patches provided by OpenSuse.

d6. While we provide specifications for .rpm packages, we don't do that for .deb ones. Debian/Ubuntu ask if we can include them in our code.

Main issues from the MySQL team:

m1. Cluster packages are outdated. Mainly for miscommunication, some distros are building the cluster binaries with the server package, thus shipping quite old and non-functional cluster binaries. After an explanation on the Cluster roadmap, the distros agreed to ship 7.x binaries from now on. We agreed that we will modify the build scripts in the server to avoid compiling the cluster binaries unintentionally.

m2. MySQL Workbench is not included in the stable releases. There are two reasons: it is not GA yet, and its source includes non-GPL code (for Windows and Mac) that needs to be removed before being used by Debian and derivatives. Moreover, Debian communicates that some DBAs don't like the idea of deploying a design tool for daily database administration.

Conclusions

All in all, I feel that this meeting was a success. We achieved a lot during the proceedings, solving problems ranging from simple communication mismatches to neglected bugs. And meeting in person with the ones who deal actively with MySQL in the Linux distros is quite a rewarding experience.
From a technical standpoint, I hadn't realized that every distribution is shipping a different server. That is quite a challenge for the common users who may need to choose between versions in several sites. However, this meeting has also shown me that all the participants have very high quality standards, and the difference in shipped versions is mostly due to the peculiarity of many shipping calendars.
Thanks to all the participants. We fixed many issues, and we had lots of fun at the same time. We should do that more often!

Friday, February 05, 2010

MySQL Developers Room at FOSDEM 2010

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

I am in Brussels, waiting to attend FOSDEM 2010, one of the biggest open source gatherings in Europe, taking place this weekend in Brussels.
On Sunday, there is a Developers Room for MySQL and Friends, with 14 talks from open source professionals coming from Europe and North America.

The novelty of this round of talks is that thy will be 20 minutes long, rather than 1 hour. This will force all presenters to be more cautious about their timing, and to concentrate their talks on the essential. Even the experienced ones, who have given the same talk several times, will have to make an effort to come to the point in less time. The idea cam from reading Scott Berkun's book, Confessions of a public speaker, where he argues successfully on the usefulness of short lectures. It's going to be interesting!
The hosts of the Developers Room are Ronald Bradford and myself, but nothing of this could have happened without the excellent preparatory work done by Lenz Grimmer, who can't be here to enjoy the results of his organization, because he must stay home, waiting for his second child to come any moment. Thanks, Lenz, and good luck!
In addition to the DevRoom, I will have a lightning talk on an unusual (for my public speaking record) topic: Blaming the unknown: a positive approach to technology. If you happen to be around, come see it. It's fun, I promise you, and also informative, or so I hope.

Wednesday, January 20, 2010

Multi dimensional cubes in MySQL through Gearman

I gave two presentations about Gearman at the Linux.conf.au. As part of the preparation for these talks, I created several sample applications. One of them, about remote replication administration, I will cover in a separate post. The most amazing one, which I cover here, is a quick and painless solution for multiple level crosstabs in MySQL.

Some background is needed. Crosstabs (also called data cubes or pivot tables, have been one of my favorite hacks for long time. In 2001 I wrote an article about a simple way of doing single level crosstabs. A few years later, I developed a Perl module that generates multiple levels of data cubes in most any database systems. Since then, I have received countless requests to convert this module to PHP, Python, Java, and I have always declined, for lack of time or abilities.
In the coming years, I tackled the same problem using MySQL Proxy and some SQL hacks. Both attempts were not completely satisfactory. The options offered by the Perl module are simply too hard to replicate to any other system.
When I started using Gearman, I realized that I could use the original Perl module through a Gearman worker, without converting to any other language. The idea is to write a simple worker that accepts some parameters and runs the Perl module to return a crosstab query to the client. The query being the most complicated thing to generate, the architecture could look like the image below.

To take the idea one step further, I used the Gearman UDF for MySQL, which makes the crosstab function available at the SQL level, thus being transparent no matter which programming language the client uses, and without need of using the Gearman API.

In this scenario, what you need to do is just querying the worker (through the UDF), with a simple string of parameters.


mysql> set @q = (select gman_do('crosstab', 
'from=all_personnel;op=sum salary;rows=country;cols=gender'));

mysql> prepare q from @q; execute q;
+---------+-------+-------+-------+
| country | m     | f     | total |
+---------+-------+-------+-------+
| Germany | 16000 | 11000 | 27000 |
| Italy   |  6000 |  6000 | 12000 |
| UK      | 10500 |  NULL | 10500 |
| zzzz    | 32500 | 17000 | 49500 |
+---------+-------+-------+-------+

Here's a taste of a 2 levels cube:

set @q = (select gman_do('crosstab','from=all_personnel;op=count salary;rows=country,location;cols=department,gender'));
Query OK, 0 rows affected (0.03 sec)

prepare q from @q; execute q;Query OK, 0 rows affected (0.00 sec)
Statement prepared

+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| country | location | pers#m | pers#f | pers | sales#m | sales#f | sales | dev#m | dev#f | dev | total |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| Germany | Berlin   |      0 |      1 |    1 |       0 |       0 |     0 |     1 |     0 |   1 |     2 |
| Germany | Bonn     |      0 |      0 |    0 |       1 |       0 |     1 |     0 |     0 |   0 |     1 |
| Germany | Munich   |      1 |      0 |    1 |       0 |       1 |     1 |     0 |     0 |   0 |     2 |
| Germany | zzzz     |      1 |      1 |    2 |       1 |       1 |     2 |     1 |     0 |   1 |     5 |
| Italy   | Rome     |      1 |      0 |    1 |       0 |       0 |     0 |     0 |     1 |   1 |     2 |
| Italy   | zzzz     |      1 |      0 |    1 |       0 |       0 |     0 |     0 |     1 |   1 |     2 |
| UK      | London   |      1 |      0 |    1 |       1 |       0 |     1 |     0 |     0 |   0 |     2 |
| UK      | zzzz     |      1 |      0 |    1 |       1 |       0 |     1 |     0 |     0 |   0 |     2 |
| zzzz    | zzzz     |      3 |      1 |    4 |       2 |       1 |     3 |     1 |     1 |   2 |     9 |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+

It would be nice to actually format the result in a more human readable way, like this one, but it will require some more work.


+---------+----------+--------------+-----------------+-------------+-------+
| country | location | pers         |sales            | dev         | total |
|         |          +---+---+------+----+----+-------+---+---+-----+-------+
|         |          | m | f | pers |  m |  f | sales | m | f | dev | total |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Germany | Berlin   | 0 | 1 |    1 |  0 |  0 |     0 | 1 | 0 |   1 |     2 | 
| Germany | Bonn     | 0 | 0 |    0 |  1 |  0 |     1 | 0 | 0 |   0 |     1 | 
| Germany | Munich   | 1 | 0 |    1 |  0 |  1 |     1 | 0 | 0 |   0 |     2 | 
| Germany | total    | 1 | 1 |    2 |  1 |  1 |     2 | 1 | 0 |   1 |     5 | 
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Italy   | Rome     | 1 | 0 |    1 |  0 |  0 |     0 | 0 | 1 |   1 |     2 | 
| Italy   | total    | 1 | 0 |    1 |  0 |  0 |     0 | 0 | 1 |   1 |     2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| UK      | London   | 1 | 0 |    1 |  1 |  0 |     1 | 0 | 0 |   0 |     2 | 
| UK      | total    | 1 | 0 |    1 |  1 |  0 |     1 | 0 | 0 |   0 |     2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+ 
| total   | total    | 3 | 1 |    4 |  2 |  1 |     3 | 1 | 1 |   2 |     9 | 
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+

To make the above examples work, what's missing is the worker. You can try the sample crosstab worker from MySQL Forge.

Wednesday, January 13, 2010

MySQL user group meeting in Sydney, January 15th

On January 15th I will be in Sydney, Au, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at the Sydney Mechanics’ School of Arts at 5.30pm.
I will talk about testing complex database systems with MySQL Sandbox.
The meeting is open to all. If you want to attend, please register at the Sydney MySQL User Group meetup page.

Monday, January 11, 2010

MySQL user group meeting in Dubai, January 13th

On January 13th I will be in Dubai, UAE, on my way to Wellington, New Zealand.
I will be at the MySQL User Group, hosted at Sun Microsystems offices at 6pm.
I will talk about boosting performance with MySQL 5.1 partitions, covering the recent 5.5. additions.
The meeting is open to all. If you want to attend, please register at the meetup page.

Tuesday, January 05, 2010

Speaking, speaking, speaking: Dubai-Sydney-Wellington

(*)

From January 12th to 27th I will be traveling to the Southern Hemisphere and speaking at two user groups and two conferences.
The schedule (see below) is almost scary. I will be talking about Partitioning (Dubai and Wellington), MySQL Sandbox (Sydney and Wellington), Gearman (Wellington), and some general topics now and then.

The complete schedule and location follows:

January 13th Dubai MySQL User Group
January 15th Sydney MySQL User Group
January 18th Open Programming Languages miniconf
January 19th Data Storage and Retrieval miniconf
January 20th - 22nd LCA BoF
January 23th DrupalSouth
January 24thDrupalSouth

(*) I know. The world map is upside down. That is how you would see it if people in the Southern Hemisphere had started drawing maps before the ones in the North.

Tuesday, December 29, 2009

Filtering mysqldump output

Several people have suggested a more flexible approach at mysqldump output in matter of user privileges.
When you dump the data structure for views, triggers, and stored routines, you also dump the permissions related to such objects, with the DEFINER clause.
It would be nice to have such DEFINER clauses removed, or even replaced with the appropriate users in the new database.

The mysqldump filter was created with this need in mind. It allows you to remove all DEFINER clauses and eventually replacing them with a better one.
For example:


mysqldump --no-data sakila | dump_filter --delete > sakila_simple.sql
mysqldump --no-data sakila | dump_filter --replace='newuser@`10.%`' > sakila_secure.sql

The first example removes all references to DEFINER, while the second one replaces every definer with a new user name.

Update Since MySQL Forge is not available anymore, here's the code:


#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case );

my $replace = undef;
my $delete  = undef;
my $help    = 0;

GetOptions (
   'replace|r=s' => \$replace,
   'delete|d'    => \$delete,
   'help|h'      => \$help,
) or help('unrecognized options');

help() if $help;

if ($delete and $replace) {
    help( 'you must choose EITHER "delete" OR "replace". Not both');
}

$delete = 1 unless $replace;

while (my $line = ) {
    if ($delete) {
        $line =~ s{(/\*!\d+ )\s*definer\s*=\s*\S+}{$1}i;
    }
    elsif ($replace) {
        $line =~ s{(/\*!\d+\s+definer\s*=\s*)(\S+)}{$1 $replace}i;
    } 
    print $line;
}

sub help {
    my ($msg) = @_;
    if ($msg) {
        print "*** $msg\n";
    }
    print "dump_filter - mysqldump filter \n",
          "(C) Giuseppe Maxia, 2009\n",
          "removes/changes DEFINER clauses from MySQL dumps\n",
          "USAGE: dump_filter [options]\n",
          "    -d|--delete    removes the DEFINER clauses\n",
          "    -r|--replace=s replaces every DEFINER clause with the \n",
          "                   new value provided\n",
          "    -h|--help      This text\n";
    exit(1);
}


__END__
notes: MySQLdump filter.
Removes or replaces the DEFINER clauses from a dump.
It is necessary when you want to migrate a database to a new server with different users.

Thursday, December 24, 2009

Holiday gift - A deep look at MySQL 5.5 partitioning enhancements

Half a day into my vacation, I managed to finish an article on a topic that has been intriguing me for a while.
Since several colleagues were baffled by the semantics of the new enhancements of MySQL 5.5 partitions, after talking at length with the creator and the author of the manual pages, I produced this article: A deep look at MySQL 5.5 partitioning enhancements.
Happy holidays!

UPDATE This matter was more tricky than it appeared at first sight. As Bug#49861 shows, several MySQL engineers were initially fooled by the multiple column partitions. Also I wrote something wrong in the article, and I updated the text to explain more accurately the behavior of the partitioning engine.

Sunday, December 20, 2009

MySQL Conference 2010 - The call for participation is open

The MySQL Conference 2010, with Sun Microsystems as founding sponsor, has opened its Call for participation.
There is already an impressive lineup of tutorials, and I don't say that only because I am on that list. You will find the usual suspects (Replication, Cluster, Certification) and several new ones: Partitioning (covering 5.5), Drizzle replication plugins and core development, Dual master setup, Scaling Applications, Diagnosing and fixing performance, Inspecting variables, command line magic.

The names next to the above topics are all well known: Sheeri K. Cabral, Brian Aker, Jay Pipes, Toru Maesaka, Baron Schwartz, Andrew Hutchings, Andrew Morgan , Geert Vanderkelen, Arjen Lenz, Morgan Tocker, Kai Voigt, Alan Kasindorf, Padraig O'Sullivan, Mats Kindahl, Lars Thalmann, and yours truly.
If you want to join the ranks of these already confirmed folks, it's now time to gather all your skills and make a proposal for a talk.
The Call for Participation opens today, and it's open till the end of January. Not much time, but not a bad deal either. You have certainly enough time to pull an excellent proposal. If you don't remember the rules, you can read again what I wrote about the past conference. The review committee is different, but the current one cares about quality as much as the previous ones.
And consider how much better is it this year for innovative speakers: due to the delay in the conference announcement (on the reasons of which I am not going to speculate), you are in a unique position, since you can propose talks based on technology that has been released in the past two months. There is a whole lot of talks to be created around MySQL 5.5 and other fresh releases.
Don't be lazy. Start writing your proposal now, and you may be in the conference schedule by February!

Tuesday, December 15, 2009

Getting started with MySQL 5.5

Some time go, we announced a new release model for MySQL. As all new things, it had some initial hiccups (with MySQL 5.4 we were still getting acquainted with the new model), but now it seems to be in full swing.
By the time you read these lines, MySQL 5.5 will be available. If the mirrors aren't seeded yet, the impatient can compile and use the new version from the launchpad source tree..

Overview

What's this new release anyway? I'll leave it to Kaj's blog to give you the full description. Suffice it to say that this release is the second milestone of the current installment. It is of beta quality, and it will mature to RC quality. There will be yet another milestone before we release a GA in mid 2010.
One thing that this milestone shows is that there are no dead weights. If a feature doesn't make the deadline, i.e. it doesn't reach beta quality by the scheduled date, it will be dropped, and eventually rescued at the next milestone.
With the introduction of the milestone model, we have also increased our internal QA, especially thanks to the Random Query Generator, which finds bugs in early stages of the code faster than any other method. ⁽¹⁾

Built-in InnoDB plugin

The InnoDB plugin 1.0.5 is included in the distribution, and, unlike MySQL 5.1, it's built-in. There is no need to load and register the plugin. The performance enhancements developed for MySQL 5.4 are now available together with the other enhancements available with the InnoDB plugin. This was already available in the previous milestone, but it's worth mentioning it now, because not many people are aware of that.

Semi-synchronous replication

Of all the new features, this one is probably the most relevant. It is based on a patch made by Google to the InnoDB engine, and adapted by MySQL developers to make it engine-independent.
In short, it's a safety device that establishes some internal communication between master and slaves, and makes sure that at least one slave has received the changes being committed. That is, before committing, the master waits until at least one slave has acknowledged that it has received the portion of binary log necessary to reproduce the transaction.
UPDATE As Harrison Fisk rightly notes, there is a mistake in my description. The acknowledgment of the transaction being sent to the slave only happens after the master's commit to the binary log.
Some caveats apply:

It's called semi-synchronous replication, because it doesn't necessarily apply to all the slaves. Although you can manually check if the replication has worked for all the slaves, it's enough for the master to make sure that at least one slave has got the goods.
Received by a slave doesn't mean "executed". The slave acknowledges when it has got the binary log, even if the SQL thread is busy or stopped.
If there is no slave that can acknowledge receipt (e.g. slaves are all down or stopped), then the master reverts to the normal asynchronous operation.

To use this feature, you need to install two plugins: one for the master and one for each slave. No need to compile anything, though. They are provided with the binaries. All you need to do is load the appropriate plugin for each server.

master >  INSTALL PLUGIN rpl_semi_sync_master SONAME 'libsemisync_master.so';

slave1 > INSTALL PLUGIN rpl_semi_sync_slave SONAME 'libsemisync_slave.so';

Additionally, there are a few variables that you must set, either in the options file or online.

master > SET GLOBAL rpl_semi_sync_master_enabled=1;
slave1 > SET GLOBAL rpl_semi_sync_slave_enabled=1;

Now that the system is ready, let's see how to use it.
Before doing anything, we ask for the value of two important status variables:

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 0     |
+-----------------------------+-------+

The first one is the number of failed synchronized transactions, the second one is the number of successful ones. Since nothing has happened so far, they are both zero.

create table t1 (i int not null primary key) engine=innodb;
Query OK, 0 rows affected (0.13 sec)

 SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 1     |
+-----------------------------+-------+

The first operation (a table creation) was successfully transferred to a slave. Let's do one more.

set autocommit=0;
Query OK, 0 rows affected (0.00 sec)

insert into t1 values (1);
Query OK, 1 row affected (0.00 sec)

COMMIT;
Query OK, 0 rows affected (0.00 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 2     |
+-----------------------------+-------+

Also this one was successful.
Now, let's try something sneaky. On each slave, we execute "STOP SLAVE SQL_THREAD". Normal replication would not work, but semi-synchronous replication will go on.


insert into t1 values (2);
Query OK, 1 row affected (0.01 sec)

SHOW STATUS LIKE 'Rpl_semi_sync%tx';
+-----------------------------+-------+
| Variable_name               | Value |
+-----------------------------+-------+
| Rpl_semi_sync_master_no_tx  | 0     |
| Rpl_semi_sync_master_yes_tx | 3     |
+-----------------------------+-------+
2 rows in set (0.00 sec)

The semi-synch replication has worked. However, if we query both master and slaves, only the master has the new record. The slaves have it only in their relay logs, which you can easily ascertain with mysqlbinlog.

Enhanced partitioning syntax

About one year ago, I briefly announced that this feature was in the making. With some interface improvement, it is now part of the regular partitioning. It's an extension of partitioning BY RANGE. As you know, you can only partition on one column value, and you can only partition on INTEGER columns. Both these restrictions were lifted in 5.5, with a syntax change that makes the code more readable and the overall feature more usable.
You can now partition by date, datetime, varchar, and char columns, not just integers, and you can use more than one column in your list. The most immediate usage of this extension is the ability of using dates without resorting to functions that convert the dates into integers. For example:

CREATE TABLE t2 
(dt date,a int, b int, c int)  
PARTITION BY RANGE COLUMNS (dt)  
(
PARTITION p0 VALUES LESS THAN ('2007-01-01'),
PARTITION p1 VALUES LESS THAN ('2008-01-01'),
PARTITION p2 VALUES LESS THAN ('2009-01-01'),
PARTITION p3 VALUES LESS THAN (MAXVALUE)
);

The COLUMNS keyword does the trick. The manual has more examples.

The partition helper has been updated to handle this new feature and generate partitions accordingly.

SIGNAL and RESIGNAL

If you have used stored routines extensively, you will certainly have asked yourself "why isn't there any way of raising an exception?" In the SQL standard, exception handling is implemented using the SIGNAL and RESIGNAL keywords, which were notably missing in MySQL 5.0 and 5.1 stored routines.
There have been many clever hacks by several community members to emulate the missing SIGNAL, but none were quite satisfactory. After long waiting here we have SIGNAl and RESIGNAL, which make stored routines programming much more robust and easier to debug. An authoritative example on how to use the new syntax is available in Roland Bouman's blog.

There is more. For the complete list of features, have a look at the official manual.
Happy hacking!

UPDATE Added more partitions to the example, as suggested by Jon.

⁽¹⁾ For the more technologically savvy, here's how Philip Stoev, one of my distinguished QA colleagues, describes the enhancements:

Historically, most of the MySQL tests have been manually created, however a modern database is so complex that it is impossible to test manually even a tiny percentage of the available functionality. Therefore for Betony [codename for MySQL 5.5], and the upcoming Celosia [5.6], the majority of our testing effort was concentrated around stochastic testing, using random data and millions of random queries to validate the behavior of the server across a wide range of scenarios and workloads.
For each new feature, and some existing ones, we automatically generated tests that attempt to cover all relevant SQL constructs, including the interaction between the feature being tested and existing code within the server. For features that have concurrency implications, we ran the random queries as a stress test or along with concurrent DDL statements. For areas such as the partitioning, we used the random queries to functionally validate the new code, by comparing the result from each query to a reference source, such as a previous version of the server.

Monday, December 07, 2009

MySQL user groups in Dubai and Sydney, on my way to NZ

In January 2010 I will attend Linux.Conf.Au, which this year is held in Wellington, New Zealand.
It's a long way from Europe to New Zealand, and so I will take a few stops.
On January 13 I will be in Dubai, UAE. If you are around, I would love to organize a MySQL meeting. I haven heard back from the local user group and it seems that a meeting will take place. Stay tuned for more.
On January 15th I will be in Sydney. The organizers are already at work. We will definitely have an user group meeting. I am open to suggestions about the topics.

From Sydney, I will continue to Wellington, where I will attend LCA2009 and then DrupalSouth before coming back to my usual time zone.

Friday, December 04, 2009

Gearman: distributed computing and Codebits pictures

The first Codebits> day lasted until long past midnight. So the attendees were a bit sleepy today, but they were brave and got up early enough for my session.
The presentation covered the basics of Gearman, some advanced magic to install remote MySQL servers, and more magic to enable MySQL users to shoot themselves in the foot repeatedly by combining a gearman/MySQL UDF and some clever scripts.
As usual, the slides are available on slideshare.
Some pictures from codebits 2009 are on Flickr.

Worshiping Technology.

Cool technology to worship.

José, the mastermind	Josette, the culture provider
Gonçalo, the friendly face who met me at the airport	&nbs; Pedro, magnificent guide of town and technology.

MySQL schema maintenance

At CodeBits I had my first session about MySQL schema maintenance. I covered the basic command line possibilities before coming to the recommended tool, MySQL Workbench.
The slides are available at slideshare.

Interesting questions: ([updated] with answers from the development team

[Q] Are there plans to administer MySQL Cluster with Workbench?
[A] Not that we know of.

[Q] Can Workbench deal with user permission maintenance across servers? (especially in cases where development and production users can't have the same privileges)
[A] YES. It's in the roadmap

[Q] Can MySQL Workbench help editing stored routines? Apparently, you can't change the routine code with ALTER PROCEDURE/FUNCTION, but you need to drop it, recreate it, and eventually change privileges.
[A] Not that we know of, although something can be done with Proxy to ease this problem.

[Q] Is the code for the plugin documented for contributors?
[A] Not yet, but it's going to be soon.

Sunday, November 29, 2009

Poor man's schema comparison

Comparing database schemas is one of those DBA tasks that occur all the time.
If you can afford the luxury of having a GUI, then MySQL Workbench is the tool for you. It will help you compare structures and eventually synchronizing the differences.
If the only thing at your disposal is a command line interface, without the possibility of installing anything (a common occurrence when you are consulting), then this Poor man's schema comparison tool may come handy.

It's a Perl script that doesn't require module installations beyond the most common ones, and produces a simple output, fit for further examination with the tools that you have at your disposal when working at the command line.
By default, it invokes vimdiff, to let you know visually what the differences are. If you change the command to diff, you get a simple output that can tell you at a glance what differences you are dealing with.
For example, after executing

schema_diff mysql://root:mysql@127.0.0.1:3306/crosstab \
mysql://msandbox:msandbox@127.0.0.1:5139/crosstab

I got the difference (a column that I introduced in the first database but not in the second one) clearly visible on screen.
vimdiff

I will be talking about this and other resources for the DBA during my session on schema maintenance at CodeBits in Lisbon (December 3-5, 2009).

Codebits 2009, coders conference and competition in Lisbon

Codebits is approaching. Form December 3rd to 5th, this gathering of 600 developers for a conference, which is also and foremost a competition, will occupy the mind of the best coders in Europe.
I will be a speaker, with two sessions:

Schema maintenance in MySQL, focusing mainly on MySQL Workbench, but with an introduction to the other choices.
Introduction to Gearman - a distributed, language and OS independent system for dynamic programming.

Also Lenz will be there, and quite busy. He will also have two sessions:

Why you should be using a distributed version control system (DVCS) for your project, with advice that will benefit every developer.
MySQL High Availability Solutions - Best practices and commonly used HA setups for MySQL

The event is hardly like any other conference. It will be a momentous show, with a part that start like a conference but goes on as a competition.
If you like coding, you must show up!

Thursday, November 12, 2009

Gearman for MySQL

If you haven't yet heard about Gearman, it's time to have a look at it. Its distributed client server architecture are a perfect match for today's cloud oriented applications.
When talking about Gearman, much stress is often given on its scalability features, such as map/reduce and distributed loads. But Gearman has also a distinctive advantage in the feature department, because of its design that I like to describe as cooperation of the fittest.

While a traditional application is written entirely in a given language, or it must find ways of integrating multiple languages parts, Gearman encourages easy cooperation between parts written in different languages, eliminating the need for library translation, and allowing the best implementation of a feature to be used by another part of the application.
Today, November 12, at 14:00 UTC, there will be a MySQL University presentation on Gearman for MySQL.
One of the components created by Gearman enthusiasts is a MySQL Gearman UDF. Using this extension, you can easily define functions in any language you are familiar with, and then call that function from MySQL. Some examples shown in the presentation include shell access, regular expressions enhancements, storing results incrementally in a file, using language-specific library extensions, and your imagination can do the rest.
Using this sample worker code, you can then call the UDF function from MySQL with something like thge following:


select gman_do('reverse','abcd') as test;
+------+
| test |
+------+
| dcba |
+------+

SELECT gman_do('shell',
concat(' ls -lh ', 
  (select variable_value from information_schema.global_variables 
  where variable_name = "datadir" )))\G 

total 40976
-rw-rw----  1 gmax  staff   5.0M Nov 11 13:34 ib_logfile0
-rw-rw----  1 gmax  staff   5.0M Nov 11 13:34 ib_logfile1
-rw-rw----  1 gmax  staff    10M Nov 11 13:34 ibdata1
-rw-rw----  1 gmax  staff   1.2K Nov 11 13:34 msandbox.err
drwx------  2 gmax  staff   2.4K Nov 11 13:34 mysql
-rw-rw----  1 gmax  staff     6B Nov 11 13:34 mysql_sandbox5140.pid
drwx------  2 gmax  staff    68B Nov 11 13:34 test


select gman_do('eval','2 * 3') ;
+-------------------------+
| gman_do('eval','2 * 3') |
+-------------------------+
| 6                       |
+-------------------------+


select gman_do('eval',
  concat('$_="',host,'";tr/a-z/b-za/; $_')) 
as test from mysql.user;
+-------------+
| test        |
+-------------+
| %           |
| mpdbmiptu   |
+-------------+

Thursday, October 15, 2009

Spider and vertical partition engines with new goodies

The Spider storage engine should be already known to the community. Its version 2.5 has recently been released, with new features, the most important of which is that you can execute remote SQL statements in the backend servers. The method is quite simple. Together with Spider, you also get an UDF that executes SQL code in a remote server. You send a query with parameters saying how to connect to the server, and check the result (1 for success, 0 for failure). If the SQL involves a SELECT, the result can be sent to a temporary table. Simple and effective.

In addition to the Spider engine, Kentoku SHIBA has also created the vertical partitioning engine. Instead of splitting tables by record, you split them by columns. You can define a table with column A and column B, with primary key K, and another table with column C and column D, with primary key K. The vertical partition engine allows you to define a table with columns K, A, B, C, D, which looks to the user like a regular column. The backend tables can be of any engine.
There is a MySQL University session about the Spider and VP engines on November 26th at 15:00 CEST. Free attendance!
The slides are online: Sharding for the masses

Wednesday, October 07, 2009

Introduction to Gearman at the Italian Research Council

I was invited to contribute some technological views at the Italian National Research Center, during the Internet Governance Forum.
My contribution was ahigh level introduction to Gearman, which sparked a debate about the impact of the cloud on the future of open source. Indeed, cloud computing technologies have the potential of harming open source adoption. If this is a threat and how much it can affect the future of open source depends on the business model behind the cloud.

More interesting topics were discussed both during the scheduled sessions and in open gathering. During dinner, for example, I got some disturbing statistics on database teaching in Italian universities. It's very common to require MS Access and .NET as supporting technologies for most of the IT related exams, with little or no room for open alternatives like MySQL and PostgreSQL.
Seeing a newsstand inside the compound of the NRC makes me think that progress slow in the institution that is supposed to be the guide of the Italian technology. The same thought crossed my mind seeing that the conference hall, a beautiful 200 seat hall well equipped for multimedia has only one power socket, but it was off limits for users.

Thursday, September 24, 2009

Jeremy's article on MySQL Sandbox in Linux Magazine

Jeremy Zawodny of Craiglist has written a kind article about MySQL Sandbox.
The article, MySQL Sandbox: Treat MySQL Instances like Virtual Machines, is a practical test of MySQL Sandbox with usage examples and warm appreciation.
Thanks, Jeremy!

The article was published in July but I noticed it only today. I guess I should pay more attention to my favorite topics when I travel.

Tuesday, September 15, 2009

Partitioning with non integer values using triggers

Looking at Bug#47310, which is a feature request that I hear frequently when I talk about partitions, I wrote a comment, suggesting triggers to work around the limitation.
The reason for the limitation is that allowing arbitrary functions for partitioning was too complex and it was provoking crashes and other unpleasant side effects (see the discussion under bug#18198).
But if you use a trigger, the resulting column is a plain integer, and many of the side effects disappear. The drawback is that you need to add a column to your table, and you need to use that column when searching for data. With that in mind, you can implement the workaround quite easily.


USE test;
DROP TABLE IF EXISTS users;

CREATE TABLE users (
        user_id int(10) NOT NULL,
        username varchar(25) DEFAULT NULL,
        dummy INT not null,
        PRIMARY KEY (user_id, dummy),
        UNIQUE KEY username(username,dummy)
) ;

CREATE TRIGGER users_bi
BEFORE INSERT ON users
FOR EACH ROW
SET NEW.dummy = ASCII(LOWER(LEFT(NEW.username,1))); 

ALTER TABLE users PARTITION BY RANGE (dummy) (
        PARTITION p0 VALUES LESS THAN  (96),  #being f
        PARTITION p1 VALUES LESS THAN (109),  #being m
        PARTITION p2 VALUES LESS THAN (115),  #being s
        PARTITION p3 VALUES LESS THAN (122)   #being z
); 

INSERT INTO users (user_id, username)
VALUES (1,'Joe'), (2,'Sam'),(3,'Abe'),(4,'Rich');

EXPLAIN PARTITIONS SELECT * FROM users 
where username = 'Abe'; 
# This simple query doesn't use partition pruning. 
# This is to be expected.

EXPLAIN PARTITIONS SELECT * FROM users 
where dummy = ASCII('a') and username = 'Abe'; 
# Here, the partition pruning kicks in, at the price of an extra
# condition in the query.