Monday, December 20, 2010

Looking for a hack - Passing comment-like info through the binary log

I am facing an interesting problem. I need to somehow mark a statement in such a way that the comment is preserved through the binary log.
I don't have control over how the statement is generated or which client software is used. For the sake of example, let's say that I need to mark a CREATE PROCEDURE statement in such a way that, if I extract the query from the binary log and apply it to another server, the information is still available.

Background

Normally, I would use a comment. The first thing I would think of is
CREATE PROCEDURE p1(i int) select "hello" /* This is my text */
But most client libraries will strip it.
There was a clever trick by Roland Bouman that allowed users to bypass this limitation: you could use a qualified comment such as /*!999999 This is my text */. Unfortunately, it only works in MySQL 5.0.x, while MySQL 5.1 strips everything, even when the comment contains a legitimate keyword.

create procedure p9 (j int) insert /*!50000 INTO */ t1 values (j) ;
Query OK, 0 rows affected (0.00 sec)

 show create procedure p9\G
*************************** 1. row ***************************
           Procedure: p9
            sql_mode: 
    Create Procedure: CREATE DEFINER=`msandbox`@`%` PROCEDURE `p9`(j int)
insert  INTO  t1 values (j)
character_set_client: latin1
collation_connection: latin1_swedish_ci
  Database Collation: latin1_swedish_ci
1 row in set (0.00 sec)

Other tricks

Something else that I have tried: I can inject a query before or after the one that I need to monitor.
create table if not exists comments (t varchar(100)) engine=blackhole;
 
update comments set t='the next statement is what I need';
create procedure p1(i int) insert into t1 values (j);
update comments set t='the previous statement is what I need';
This approach may introduce some overhead, though.
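To verify that the markers survive, one could grep the decoded binary log; a minimal check (the binlog file name is just an example):

mysqlbinlog mysql-bin.000001 | grep -C 2 'what I need'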

Or I can add a comment clause in the statement.
create procedure p1(i int) comment="this is what I need" insert into t1 values (j);
This approach requires parsing the SQL, and dealing with artistic indentation and usage of other options in the query. And if I need to deal with commands that don't support the "comment" option, I am back to square one.

Advice wanted

So far, the only method that works almost always is the blackhole trick (1).
I would like to know if there is any method of introducing a piece of information related to a given statement, in such a way that the comment survives after one of the following:
  • The binary log is converted to queries and passed to a MySQL client that applies the stream of queries to another server.
  • The binary log is associated with another master, and then passed to a slave through regular replication.
In both cases, I need to find the original information in the last server's binary log.

As a related matter, I know that MySQL, in regular replication, passes some information across binary logs, and that information is the server-id. If I set up an intermediate server as a relay slave, the server-id of the original master is associated with the query recorded in the binary log of every slave. I don't know if I can use this information for my purposes, but I would like to know how the replication process maintains the server-id across servers.
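For what it's worth, the originating server-id is visible in each event header when the binary log is decoded; a quick way to inspect it (again, the binlog file name is just an example):

mysqlbinlog mysql-bin.000001 | grep 'server id'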

Maybe it's too late for me and I can't see an obvious solution. I will appreciate any suggestion. Thanks in advance.

(1) If the BLACKHOLE engine is disabled, the method either fails or introduces unacceptable overhead.

Thursday, December 16, 2010

Some hidden goods in MySQL 5.5

The announcement of MySQL 5.5 released as GA has outlined the improvements in this version, which indeed has enough good new features to excite almost any user.
There are two additions, though, that were lost in the noise of the bigger features, and I would like to spend a few words for each of them.
The first addition is something that users of stored routines have been waiting for since MySQL 5.0. No, it is not SIGNAL and its close associate RESIGNAL, which have been publicized enough. I am talking about stored routine parameters, for which there is now a dedicated table in the information_schema.
Let's see an example, with a simple procedure that uses three parameters.

drop procedure if exists add_to_date ;
create procedure add_to_date(in d date, in i int, out nd date)
deterministic
    set nd = d + interval i day;
This works as expected in both 5.1 and 5.5. (Never mind that it's redundant. I know it. It's only for the sake of keeping the example short).

 call add_to_date('2010-12-15',10,@new_date);
Query OK, 0 rows affected (0.00 sec)

 select @new_date;
+------------+
| @new_date  |
+------------+
| 2010-12-25 |
+------------+
1 row in set (0.00 sec)
The difference starts to show when you want to deal with this procedure programmatically. If you need to find out which parameters are expected by this procedure, your only option in MySQL 5.1 is parsing the result of SHOW CREATE PROCEDURE add_to_date. Not terribly difficult in any scripting language, but a hassle in SQL.
In MySQL 5.5, instead, you can easily get the routine parameters with a simple query:

 select parameter_name, parameter_mode, data_type from information_schema.parameters where specific_schema='test' and specific_name='add_to_date' order by ordinal_position;
+----------------+----------------+-----------+
| parameter_name | parameter_mode | data_type |
+----------------+----------------+-----------+
| d              | IN             | date      |
| i              | IN             | int       |
| nd             | OUT            | date      |
+----------------+----------------+-----------+
3 rows in set (0.00 sec)

Speaking of the information_schema, there are more goodies that were not emphasized enough. The InnoDB engine that you find in the server is the evolution of the InnoDB plugin that ships with MySQL 5.1, only now it is built in. What many people forget to mention is that the plugin (and thus the current InnoDB engine in 5.5) comes with its own InnoDB-specific instrumentation tables in the information_schema.

show tables like 'innodb%';
+----------------------------------------+
| Tables_in_information_schema (innodb%) |
+----------------------------------------+
| INNODB_CMP_RESET                       |
| INNODB_TRX                             |
| INNODB_CMPMEM_RESET                    |
| INNODB_LOCK_WAITS                      |
| INNODB_CMPMEM                          |
| INNODB_CMP                             |
| INNODB_LOCKS                           |
+----------------------------------------+
7 rows in set (0.00 sec)
This is the same set of tables that you may have seen if you have worked with the InnoDB plugin in 5.1. In short, you can get a lot of the information that you used to look for in the output of SHOW ENGINE INNODB STATUS. For more details, see what the InnoDB plugin manual says on this topic.
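For example, a query along these lines (a sketch based on the plugin's documented columns) pairs each waiting transaction with the one that is blocking it:

select r.trx_mysql_thread_id as waiting_thread,
       r.trx_query           as waiting_query,
       b.trx_mysql_thread_id as blocking_thread,
       b.trx_query           as blocking_query
  from information_schema.innodb_lock_waits w
  join information_schema.innodb_trx r on r.trx_id = w.requesting_trx_id
  join information_schema.innodb_trx b on b.trx_id = w.blocking_trx_id;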
I don't know if these tables can fully replace SHOW ENGINE INNODB STATUS. Perhaps someone can comment on this and provide more information?

Thursday, December 09, 2010

Speaking at the O'Reilly MySQL Conference - April 2011

I will present two talks at the MySQL Conference next April.
One is a three-hour tutorial on Advanced MySQL Replication Techniques, and the other is a regular session on The Art of Sandboxing: Reducing Complex Systems to Manageable Boxes.
The first topic is not new to me, but the contents are going to be fresh. There has been so much going on in the replication field that the talk I presented on this subject in 2007 looks like ancient history.
The second topic is completely new. I have often presented the results of my sandboxing efforts, but I have never thought of explaining the techniques themselves. Now that I have gained some experience at reducing different complex systems to sandboxes, I want to share the knowledge, to promote more work in this field.

Monday, December 06, 2010

Excluding databases from mysqldump

A question that came up during the MySQL track at the UKOUG conference in Birmingham was "Can I exclude only a few databases from mysqldump? Let's say that I have 50 databases, and I want to dump all of them, except a few."
As many know, mysqldump has an option to ignore specific tables. So if you have 1,000 tables in a database, you can tell mysqldump to dump all the tables except a few.
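For reference, the per-table exclusion looks like this (the database and table names here are made up):

mysqldump mydb --ignore-table=mydb.big_log --ignore-table=mydb.tmp_data > mydb.sql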
There is no corresponding option to exclude one or more databases.
However, if you know your command line tools, the solution is easy:
First, we get the list of all databases:
mysql -B -N -e 'show databases'
information_schema
employees
five
four
mysql
one
performance_schema
six
test
three
two
-B forces batch mode (no ASCII box drawn around the data), while -N suppresses the column headers.
Now, let's say that we want to exclude databases four, five, and six. And since we want to avoid unpleasant side effects, also information_schema and performance_schema.
Thus, we pipe the previous data through a filter. I use Perl, but sed or grep could get the job done.
mysql -B -N -e 'show databases' | \
  perl -ne 'print unless /\b(?:four|five|six|_schema)\b/' 
employees
mysql
one
test
three
two
Now that we have the list of databases that we need, we can tell mysqldump to back up the databases in that list. All we need is to convert the vertical list into a horizontal one, using xargs.

mysql -B -N -e 'show databases' | \
  perl -ne 'print unless /\b(?:four|five|six|_schema)\b/' | \
  xargs echo mysqldump -B 
mysqldump -B employees mysql one test three two
That's it. The last line is the resulting command. Once you are sure that it is what you want, remove the "echo" after xargs, and the command will be executed.
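Putting it all together, the final pipeline would then be (the output file name is just an example):

mysql -B -N -e 'show databases' | \
  perl -ne 'print unless /\b(?:four|five|six|_schema)\b/' | \
  xargs mysqldump -B > all_but_excluded.sql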

Update: Thanks to Shantanu, who pointed out that the regexp did not filter properly. So I added the boundary checks (\b) to make the matches precise.

Friday, December 03, 2010

My picks for PGDay-EU 2010

On Sunday I will be in Stuttgart with the double purpose of attending the annual European PostgreSQL conference and the technical meeting of my company, which will be held after the normal proceedings of PGDay-EU.
For the first time in several years I am attending a conference where I am not a speaker. In my previous job I did not have much opportunity to attend PostgreSQL meetings, and I welcome this opportunity. The schedule is quite interesting, and I have made my personal picks:

Who's afraid of MySQL forks?

There is much talk about MySQL forks and how they are going to replace MySQL, or take over the MySQL user base, or become more powerful/profitable/popular/you-name-it than MySQL itself.
Let's clear some air on this topic. There is more about forks than meets the eye, especially if you think about a few obvious facts.
What's a fork? According to Wikipedia
a project fork happens when developers take a legal copy of source code from one software package and start independent development on it, creating a distinct piece of software.
By this definition, when someone who doesn't work at the MySQL project distributes a package that is based on MySQL code but differs from the original, it's a fork.
Why am I approaching the issue from this angle? Because, apart from Windows users, who mostly download MySQL from the official site, the majority of users get MySQL through a Linux distribution or some other project. And most of the time such packages are different from the ones built by the MySQL team. There is nothing wrong with that. The differences are sometimes minimal packaging changes done to adapt MySQL to the specific distribution, and sometimes they are patches cherry-picked into an old version that needs to be maintained, so that the package ends up unlike any other MySQL version that you may find in the wild. Even if the version number is the same, depending on the distribution and the age of the server, the code beneath could be wildly different from the official releases.
Thus, it turns out that many users, possibly the majority, are using a MySQL fork, albeit a very minor one.
But when people talk about forks, they often refer to three main projects:
  • The Percona distribution. This is a collection of a few distinct patches in the server, coupled with a fork of the InnoDB plugin, named XtraDB, and an independent tool for backup (XtraBackup). This fork has a solid business background. Every patch has been developed to meet user requests, and the engineers at Percona maintain them appropriately.
  • Then we have the MariaDB fork, which is a series of changes to the MySQL core, motivated by the desire of the developers to build a rich set of feature enhancements while being backward compatible to the main distribution. The business model is thus a fast track of new features and bug fixes to customers.
  • And then there is Drizzle, which has even less business traction than MariaDB, but a very well defined goal of creating a lightweight database by re-engineering a bare bones stripped down version of MySQL that is now very distant from its origins.
What I said in the above descriptions is just a synopsis of what these three forks are. In recent mythology, it is fabled that, if MySQL ceases to exist (because it goes bankrupt, or Oracle kills it, or a major accident happens to the project, whatever), users can replace MySQL with one of the forks and live happily ever after.
Not so fast. There is something that few people take into account when listening to this oft-repeated tale.
What most observers miss is that the forks' original code (with the exception of Drizzle) is very marginal. The bulk of each distribution is still the code produced by the MySQL team, which is merged at every minor release and integrated with the patches produced by Percona and MariaDB. So, while technically they are forks of MySQL, they can't live independently of the official MySQL distribution. Neither Percona nor MariaDB has the manpower to maintain the server by handling the huge number of bugs that the MySQL team fixes every month.
There is also a matter of skill set. Percona has talented InnoDB experts, while MariaDB has mostly core server experts (and some are among the top ones, I may add). They could complement each other, although it seems that cooperation between the two projects is not as good as it used to be. (Could be my personal impression.)
The bottom line, though, is whether both projects would be able to survive should the main project become unavailable. I am not suggesting that Oracle wants to make MySQL scarce. On the contrary, all the information at my disposal suggests that Oracle will keep MySQL publicly available for a long time.
This state of affairs seems to indicate that Drizzle is, instead, a true fork that does not depend on MySQL's health. To some extent, this is true. However, the main storage engine in Drizzle is InnoDB. Therefore, at least today, Drizzle is as dependent on Oracle as Percona and MariaDB are.
What would happen tomorrow, if the disaster depicted by doomsday advocates comes true and MySQL actually disappears? I don't honestly know, but I would love to have a public commitment from the major players, about what they are prepared to do in terms of maintaining that huge chunk of code that today they take from Oracle releases on a monthly basis.
This is all food for thought for MySQL users.

About adoption of the forks today, I have seen five types of arguments in favor of a MySQL fork:
  1. I need the feature provided by Percona or MariaDB, or I need a quick bug fix that I can't get from the slow roadmap at Oracle. I trust that this handful of people are able to maintain that little code that differs from MySQL and matters to me. So I don't care if they don't have 100 developers on the task.
  2. Given Oracle's track record in other Open Source projects, I don't trust them to deliver MySQL according to FOSS principles, so let's go for true Open Source alternatives.
  3. Most MySQL developers have now left Oracle, and so the forks have more chances of being higher quality.
  4. Cool! MariaDB/Percona has a bunch of features that MySQL lacks. It must be better. Let's use it.
  5. I like new technology. Let's plunge into it!
Argument #1 is a solid business backed reason for adopting some software. The risk is often well calculated, especially if the evaluation can be backed by performance and functional tests.
Argument #2 is frivolous, as it mixes subjective feelings into business matters. And so is argument #4. Yet, these two types of advocacy are quite popular and spread much faster than the more reasonable approach seen at #1.
Argument #3 is debatable. The MySQL developers at Oracle easily outnumber those of all the forks. The idea that the departure of a few core developers can alter the system in such a way that the whole project crumbles has already been disproved by the facts: MySQL 5.5 is an excellent release, with enthusiastic appreciation from power users. While I agree that top MySQL talents work at the forks, I consider the MySQL team to still be in excellent shape.
Argument #5 is reasonable, if it is followed by cool judgment and backed by facts. I am one who is always ready to try new solutions, and love experimenting with cool technology. But adoption is different from proof of concept. I am happy to see that Drizzle can replace MySQL in some applications, but would I trust it in its present beta stage? Certainly not. So, I am happy to test, but I trust my valuable data to more stable solutions.

What's in it for you, the end user? My personal advice is: don't adopt blindly because of some enthusiastic advertising. Test the product thoroughly, and if it fits your needs, by all means go for it. But if you don't have a specific reason, I recommend staying with the official branch, because, despite the change in affiliation, there is still a well-experienced team behind it.

Sunday, November 28, 2010

Dispelling some unintentional MySQL FUD

There are three types of FUD. The first and most genuine is (#1) the intentional spreading of falsehood, mostly to gain some marketing advantage over a competing product. While I despise this practice, I understand it.
Then there is (#2) FUD spread by ignorance, when the originators are so blindly enraged by their hatred for a product that they don't care about getting the facts straight.
And finally, there is a third kind, not less dangerous, which is (#3) the spreading of FUD with good intentions, when the authors believe that they have the facts straight and they want to help.
I have recently come across two examples of unintentional FUD. For different reasons, my comments on these public cases did not get through, and so I have to say something about them here in my blog.

MySQL is not ACID compliant

This surprising piece of news came from the blog of a company that calls itself the remote DBA experts.
The claim is this: if I insert a record in a table and then issue a ROLLBACK command, the record is not rolled back.
Anyone who has a minimal knowledge of MySQL knows about InnoDB tables (luckily for the poster, InnoDB is the default in MySQL 5.5.6, which he was testing) and autocommit.
Reading through the example, one sees that the poster did not know about this. In MySQL, autocommit is ON by default, so if you want to roll back a record, you need to disable autocommit first (or start an explicit transaction). This behavior is not optimal, and it can be debated, but if you read the docs, you don't claim something that is simply the result of your lack of knowledge. MySQL has shortcomings, but being unable to roll back a record is not one of them. Hence, this is FUD type #2.
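For the record, the correct test looks more or less like this (a minimal sketch, assuming an InnoDB table t1 already exists):

set autocommit = 0;        -- or: start transaction;
insert into t1 values (1);
rollback;                  -- the insert is undone
select count(*) from t1;   -- the inserted row does not appear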
Why am I writing all this here and not as a comment on that blog? Because I did post a comment, on November 23rd, but as of today it has not been approved yet. The same is true for comments posted by other, more knowledgeable people.

MySQL licenses. When it's free and when you need to pay for one.

This article is well intentioned. MySQL Licenses: The Do's and Don'ts of Open Source, or What's All the Fuss About? is a well-thought-out piece, with practical examples, meant to help users decide what to do about MySQL licensing, i.e. when they need to pay and when they don't. Unfortunately, the article contains some unintentional confusion, and therefore leaves the readers with more wrong ideas than they had before.
I left a long comment on that blog, but for some unfathomable reason it was reduced to a tiny piece, and thus the need for explaining the matter here again.
The poster says this:
I make commercial software, which needs to have MySQL installed. My customers can use my commercial software, for which they do need to buy a license, in combination with the MySQL database engine, for which they don't need to pay. Because the MySQL engine is not embedded in my commercial software and I don't redistribute MySQL together with my software, I don't need a commercial license for MySQL and neither do my customers.
I am afraid that this wishful information is not correct. The GPL FAQ states it clearly:
If a library is released under the GPL (not the LGPL), does that mean that any program which uses it has to be under the GPL or a GPL-compatible license?
Yes, because the program as it is actually run includes the library.

Another quote:
However... as long as I have no desire to sell the embedded MySQL source code commercially, I can let the GPL license apply.
This is not true either. The GPL does not regulate commercial transactions; it only deals with the distribution of software. If I distribute a public-domain but GPL-incompatible piece of software linked to a GPL application or library, I am violating the GPL, even if I don't charge anything.

Another source of disinformation is "If you decide to pay for a MySQL license, you don't actually pay for the software."
This is also incorrect. Oracle sells two kinds of things with MySQL. One is a subscription to services (MySQL Enterprise). If you buy this, you are not getting a license (unless you ask for it explicitly) but an agreement about services for a given period.
The other thing that Oracle sells is licenses. They can do it because they own the source code, and they can decide to release it either as GPL (which is what you download from the MySQL site) or with a commercial license. If you ask for a license, you will most definitely get one. You can also get a license together with a subscription, if you are so inclined, but that doesn't mean that you aren't buying a license.
The important thing to understand, to put the matter in perspective, is that the above information about licensing was already true before 2008, when MySQL was owned by MySQL AB, and it is still true today. Oracle, despite all the preemptive accusations of ill intent, has not changed the rules of the game.

A world of ebooks

I am a bibliophile or, to say it in plain English, a book lover. I have been collecting books since I was in first grade. I read books at high speed, which is both a blessing and a curse. A blessing, because I can squeeze useful information out of a book very quickly, and that's useful for my job and for some of my hobbies. A curse, because when I travel one book is usually not enough to keep me busy for the whole trip, and I need to carry or buy more, with negative effects on the weight of my luggage and on my back. Ten years ago I had a brief but intense experience with electronic books on a Palm handheld device. It didn't last long, though. The quality of ebooks and readers in that period was less than optimal, and I let the matter rest for a while.
In the meantime, I kept collecting electronic books, mostly PDF editions of technical books that I keep on my laptop for quick reference. Reading them from cover to cover, though, is not a pleasant experience on a laptop. Ditto for reading fiction or essays. The laptop screen is not comfortable for such an exercise.
Then, last year, I bought an ebook reader.
That changed the whole business. Reading ebooks became very similar to reading paper books. The size of the screen and the ability to increase the font size make reading a pleasure. As for my back and luggage problems, they are solved hands down. The weight of the device is the same whether I carry one ebook or one hundred.
Suddenly, the dozens of ebooks that I had kept idle on my laptop sprang to life, and I was able to read them like paper books: easily, comfortably, and with pleasure.
I started buying more ebooks, both fiction and technical. The latter are especially welcome. Whenever I travel to conferences, I am tempted to buy some useful book, and then I regret it when it burdens my backpack during the trip home and fights for room on my overcrowded bookshelves. No more of this. Now, when I visit a book booth at a conference, I simply take note of the interesting titles, and then I buy the ebooks directly from the publisher's site. If there is no ebook, I can easily convince myself that the book is not really needed.
A few months ago there was a new development. My ebook reader's screen was faulty: it showed a few unwanted lines at the bottom and the top of the screen, making it difficult to read menus. No big deal. I sent it to the manufacturer, which replaced the screen for free. The only trouble was that the replacement took three months! During that period, I tried reading ebooks (to which I was by then addicted) on my Android phone, using a wonderful application named Aldiko. The user friendliness of this app more than compensated for the smaller screen size, and I was able to read technical and fiction books with little problem. But I was missing the big screen. So the delay of the return shipment was partially responsible for the lowering of my defenses when I entered an Apple store and couldn't leave without a new iPad.
I felt guilty for a while, but the guilt disappeared in a matter of hours, when I loaded all my ebooks onto the iPad and saw what a difference a bigger, colorful screen makes. Compared to the six inches of my ebook reader, the iPad is huge, and reading is even easier and more pleasurable. I was hooked.
Since then, my personal library of ebooks has grown rapidly. I have bought 90 (yes, ninety) books from O'Reilly, including many that I had already bought on paper and am now giving away to friends and libraries.
I need to spend a few words of praise for O'Reilly. In the jungle of book publishing, O'Reilly is the best and most user-friendly publisher available. The quality of its books is excellent, its catalog vast and modern, its service impeccable. There are other publishers that offer comparable quality (e.g. the Pragmatic Bookshelf or Manning) but not the same rich catalog, or a similarly vast catalog (e.g. Packt Publishing) but not the same quality.
If I have to note any negative point about O'Reilly, it is that there is no wish list in their shop. So, for now, I am restricting my wishes to my list on Amazon.

Sunday, November 21, 2010

How MySQL Workbench breaks itself

Once upon a time, there was a policy in MySQL not to add new features after the beta stage.
To my surprise, MySQL Workbench 5.2.30 introduces a new feature, the query formatter. I gave it a try. The results are not extremely encouraging. Granted, it's a plugin and not a feature in the core application, but nonetheless one would expect something more stable in a GA release, especially since the plugin features are displayed in the main menu and, unless you have read the announcement, you can't easily tell the core from the plugins.
This is what I have got in just a few minutes:

Bug #58356: beautify function fails on CREATE TABLE
Bug #58357: beutify function erases statement on CREATE INDEX
Bug #58358: query formatter fails on selected query
Bug #58359: query formatter indentation fails on partially selected query
Bug #58360: query formatter converts non-keywords to uppercase
Bug #58361: query formatter mangles query when CASE operator is used

MySQL Workbench is a great product. I would like it to be more solid. New features, even as plugins, should be released more carefully than this one was.

Update: Four days after my submission, the WB team fixed all six of them. Kudos!

Friday, November 19, 2010

How to create a private cloud in your laptop

Everybody is moving to cloud architectures, although not all agree on what cloud computing is. In my limited understanding, and for the purpose of my work, cloud computing is the quick and versatile availability of virtual machines.
Now, if my purpose was deploying these machines, a private cloud in one host (namely, my laptop) would not make sense. But to create a flexible testing environment, it works very well.
Users of virtual machines software such as VMWare or VirtualBox may ask what's new here. You can create many virtual machines and use them within the same host.
True, but creating a new virtual machine in one minute, without duplication of resources, is not so easy. That is what this article covers. More specifically, it covers how to clone virtual machines efficiently with VMWare Fusion on a Mac OS X laptop.
I have been a VMWare user for more than 10 years, mostly on Linux, where I used VMWare Workstation. One of the features that I liked in that application was the ability to clone virtual machines, where the clones share the disk with the original VM. VMWare Fusion does not have a cloning feature, but there are ways of achieving the same result.
If you copy a virtual machine and then open it with Fusion, it will ask you if you have copied it, and after your answer it will use the new virtual machine independently from the first one. The only problem is that you will occupy twice as much disk space. So, in addition to the space that will eventually get eaten up on your limited laptop, the duplication process is slow (copying 10 GB does not happen instantly). The method shown here will instead allow you to clone a virtual machine in less than one minute.

Part I - How to clone a virtual machine

I have got this recipe from HOWTO: Manual Linked Cloning in Fusion, a year old tutorial that I have integrated with updated images.

Do it once - Create a virtual machine with everything you need.

This is a once-only step. Install a virtual machine as usual. In my case, I used Ubuntu Server 10.10. When I say with everything you need, I mean making sure that everything you think will be necessary is installed in this base VM. If you don't, you may end up installing the same thing in every cloned virtual machine.
What I did was update the software, then install the build-essential package and the VMWare tools, and then enable the "partner" repository to install the original Sun Java packages instead of the default OpenJDK. Next, I downloaded the source code for MySQL 5.5 and made sure that I had all the software necessary to compile and build the database.
Finally, I updated the CPAN with my favorite modules, stopped the virtual machine, and I was ready for the next step.
What I had in my hands was a working virtual machine. In order to make it clonable, I went through some basic steps.

Step 1. Copy the VM disks

I created a new directory under the same path where the normal virtual machines are. I called it ub_srv_10_base, and I copied the .vmdk files into this new directory.

Step 2. Modify the base disk to refer to fixed paths

In each virtual machine, the disk file that has the same name as the VM (without numerals) is a sort of index of the real files containing the data. This file needs to be edited for further usage. Originally, it looked like this:
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=f88ac433
parentCID=ffffffff
isNativeSnapshot="no"
createType="twoGbMaxExtentSparse"

# Extent description
RW 4192256 SPARSE "ubuntu_server_10-s001.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s002.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s003.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s004.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s005.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s006.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s007.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s008.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s009.vmdk"
RW 4192256 SPARSE "ubuntu_server_10-s010.vmdk"
RW 20480 SPARSE   "ubuntu_server_10-s011.vmdk"

# The Disk Data Base 
#DDB

ddb.toolsVersion = "8323"
ddb.virtualHWVersion = "7"
ddb.longContentID = "d14a2f23de35287969b8eaebf88ac433"
ddb.uuid = "60 00 C2 98 72 3b 6e 76-ff 05 b7 ff 8a 07 6e 92"
ddb.geometry.cylinders = "2610"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"
To make it usable, I added a relative path to the file names and commented out the UUID.
# Disk DescriptorFile
version=1
encoding="UTF-8"
CID=f88ac433
parentCID=ffffffff
isNativeSnapshot="no"
createType="twoGbMaxExtentSparse"

# Extent description
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s001.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s002.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s003.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s004.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s005.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s006.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s007.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s008.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s009.vmdk"
RW 4192256 SPARSE "../ub_srv_10_base/ubuntu_server_10-s010.vmdk"
RW 20480 SPARSE   "../ub_srv_10_base/ubuntu_server_10-s011.vmdk"

# The Disk Data Base 
#DDB

ddb.toolsVersion = "8323"
ddb.virtualHWVersion = "7"
ddb.longContentID = "d14a2f23de35287969b8eaebf88ac433"
# ddb.uuid = "60 00 C2 98 72 3b 6e 76-ff 05 b7 ff 8a 07 6e 92"
ddb.geometry.cylinders = "2610"
ddb.geometry.heads = "255"
ddb.geometry.sectors = "63"
ddb.adapterType = "lsilogic"

Now the file is ready to be used by other virtual machines. Notice that the files in this directory do not make a virtual machine. Only the storage part is here.

Step 3. Make the disks read-only

We need to make sure that the base disks are not modified. So, we remove the "w" attribute from all the files in the base directory.

$ chmod a-w ub_srv_10_base/*s0*

Step 4. Remove the original virtual machine

Make a copy if you want. Actually, having a backup of the base disk directory is a very good idea: compress it and save it to a safe location. But remove the virtual machine from Fusion. This will avoid confusion later on.

Do it many times - Create a clone

This process looks frightfully long, especially if you count the images below, but in reality it's very quick and painless. Once you get familiar with it, you will be cloning virtual machines in no time at all.
Back in VMWare Fusion, select "create a new Virtual Machine".
Make sure that you don't have the Ubuntu CD in your real DVD player. Click on "continue without disk".

Ignore the top options, and select "create a custom virtual machine".

Select the same operating system that was used for your base VM.

When the summary shows up, choose "Customize settings".

Save the VM with a sensible name. Since you will be creating many of the same kind, choose a pattern that you can recognize. In this case, I used the base name with an additional letter at the end to identify the server.

The disk is your main concern. The default settings want to create a 20 GB disk. Simply remove it.

When asked if you want to get rid of the disk completely, say yes, "move it to the trash".

Now, very important, before you go to the next step: use the command line, or a GUI, to copy the index .vmdk file from the base directory to the new virtual machine. I call it nd.vmdk (nd = new disk), but you can call it whatever you want. Make sure that this file is not write protected.
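A minimal sketch of this copy from the command line (the file and directory names below follow this example's naming; adjust them to your setup):

$ cp ub_srv_10_base/ubuntu_server_10.vmdk ub_srv_10_10a.vmwarevm/nd.vmdk
$ chmod u+w ub_srv_10_10a.vmwarevm/nd.vmdk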
Add another disk. Here's the first tricky part. By default, Fusion wants to recreate the same disk that you have just removed. But notice that there is a drop-down menu on the right side of the panel. Click on "choose existing disk".

Here you select the file that you have copied from the base directory. And now another tricky point: make sure that you click on "Share this virtual disk ...", otherwise VMWare will make a copy of your original disk files.

You can now change other parameters in the virtual machine, such as the amount of RAM and how many processors you want to use for it. I recommend unchecking the "connected" box next to the CD-ROM, to avoid a warning when more than one cloned VM is running at the same time. You can always enable the CD later if you need it.

Now, don't switch on your cloned virtual machine just yet. If you do, you get an incomprehensible message, where the VM complains about not being able to access the new disk file, while in reality it can't access the disks referred to by that index file. This is where the precaution of write-protecting the files comes in handy. If you hadn't done that, the VM would access (and eventually modify) the virtual disk files, and possibly corrupt them. Instead, you need one more step before having a functional clone.

You need to create a snapshot. Once you have done that, the VM will write to the snapshot files all the deltas between your read-only disks and your VM's final status.

You can call the snapshot whatever you like.

Finally, you can run the virtual machine. If other virtual machines are running, it may warn you that some devices might not be available. Ignore this warning unless you know for sure that there is a unique resource that should not be shared (usually, there isn't).

Your virtual machine is ready to run. If you need to create three identical servers to simulate a cluster, and the original VM has 4GB of occupied storage, the operation won't cost you 16 GB, but just a few dozen MB:
$ du -sh ub_*/
 21M ub_srv_10_10a.vmwarevm/
 22M ub_srv_10_10b.vmwarevm/
 21M ub_srv_10_10c.vmwarevm/
3.9G ub_srv_10_base/

Part II - Use your virtual machines from the command line

Although VMWare Fusion is handy for creating virtual machines and for using GUI-based operating systems, it is less than desirable when you have several virtual machines with only a text interface, and you want to use them from the command line, the same way you would in most real-life administration or QA operations.

No need for further hacking in this case. You can manage your virtual machines from the command line, using a utility called vmrun, which is located under /Library/Application Support/VMware Fusion/.
You can read the full manual in a lengthy PDF document that is available online ( Using vmrun to Control Virtual Machines), but here's the short story.
vmrun can start and stop a virtual machine. You just need to use it when the VMWare Fusion application is closed. For example:
$ vmrun start $HOME/vmware/vm/ub_srv_10_10a.vmwarevm/ub_srv_10_10a.vmx nogui
2010-11-19 14:58:49 no printer configured or none available
2010-11-19 14:58:49 adaptor daemon booted
2010-11-19 14:58:49 connector "vmlocal" booted

As part of my preparation of the virtual machine, I created a simple script that runs ifconfig and writes the virtual machine's IP address to a file in the user's home directory.
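The script itself is not shown here; a minimal sketch of what it could look like (the interface name and the output path are assumptions, to be adjusted to your guest):

#!/bin/sh
# get_ip - write this VM's IPv4 address to a file in the home directory
/sbin/ifconfig eth0 | awk '/inet addr:/ { sub("addr:","",$2); print $2 }' > $HOME/myip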
Using this information, I can then run the following commands:
$ vmrun -gu qa -gp SECRETPWD runProgramInGuest \
   $HOME/vmware/vm/ub_srv_10_10a.vmwarevm/ub_srv_10_10a.vmx \
   /home/qa/bin/get_ip

$ vmrun -gu qa -gp SECRETPWD copyFileFromGuestToHost \
   $HOME/vmware/vm/ub_srv_10_10a.vmwarevm/ub_srv_10_10a.vmx \
   /home/qa/myip ./server_a_ip

$ cat server_a_ip 
192.168.235.144

This is just a simple example of what you can do. Once you have the IP address (which the DHCP server could change), you can connect to your virtual machine via ssh and do what you need. When you have finished, you can switch off the VM, again without needing the GUI:
$ vmrun stop ~/vmware/vm/ub_srv_10_10a.vmwarevm/ub_srv_10_10a.vmx
2010-11-19 15:11:25 adaptor daemon shut down
2010-11-19 15:11:25 connector "vmlocal" shut down

Happy (cloud) hacking!

Friday, November 12, 2010

Filtering binary logs with MySQL Sandbox and replication

A few days ago, a friend of mine asked me if I knew of a way of filtering a bunch of binary logs, to extract only the statements related to a single table. The task involved filtering a few hundred binary log files.

It's a tricky problem. Even with my experience with regular expressions, I knew that using a script to extract statements related to a single table was going to be a nasty business.
However, I know of an entity that can filter by table name efficiently, and that's the MySQL replication system. So I suggested replicating to a sandbox with a replicate-wild-do-table option to get the job done.
My friend was skeptical and did not want to go that way. I was busy writing an article for an Italian magazine and did not follow up immediately. But today, with the article safely in the editor's hands, I did a quick test, and guess what? It works!
Here is a step-by-step procedure to do it. I started with a server built with MySQL Sandbox, using MySQL 5.5.7. I used the employees test database to create a large enough binlog, and soon I had a database containing 160 MB of data and a binary log of about the same size.
Then I decided that I wanted to filter the binlog, to get only statements about the employees table. Thus, I issued this command:

$ make_sandbox 5.5.7 --sandbox_port=6000 \
  --sandbox_directory=trans_repl \
  -c log-slave-updates \
  -c replicate-wild-do-table=employees.employees \
  -c log-bin=mysql-bin \
  -c server-id=101
The "-c" option transfers its argument to the sandbox configuration file.
At the end of this operation, I had a second server, of the same version as the one I had filled with the employees database. This server is ready to filter replicated streams, accepting only commands that affect the table 'employees' within the database 'employees'.

The second step was to create an empty copy of the database in the second server, with the InnoDB tables converted to BLACKHOLE (to avoid wasting unnecessary space).

Inside the first sandbox, I did this:

$ ./my sqldump -B --no-data employees \
  | perl -pe 's/=innodb/=blackhole/i' \
  | ~/sandboxes/trans_repl/use
Combining the flexibility of the sandbox with some command line skills, the operation requires just one command.
Before starting the replication, I needed to avoid re-creating the tables, or my blackhole trick would have been useless. So I looked at the binary log, and found where the CREATE TABLE statements ended:

$ ./my sqlbinlog ./data/mysql-bin.000001 |less
BEGIN
/*!*/;
# at 3057
#101112 9:48:45 server id 1 end_log_pos 3364 Query thread_id=1 exec_time=0 error_code=0
SET TIMESTAMP=1289551725/*!*/;
INSERT INTO `departments` VALUES ('d001','Marketing'),('d002','Finance'),('d003','Human Resources'),('d004','Production'),('d005','Development'),('d006','Quality Management'),('d007','Sales'),('d008','Research'),('d009','Customer Service')/*!*/;

Armed with this knowledge, I logged into the second server and did the following:

FLUSH BINARY LOGS; # to get a binlog with only the statements that I need

CHANGE MASTER TO
master_host='127.0.0.1',
master_port=5570,
master_user='msandbox',
master_password='msandbox',
master_log_file='mysql-bin.000001',
master_log_pos=3057;    # this is the position after all the CREATE TABLE
                        # statements in the master
START SLAVE;
After a few seconds, I issued a "SHOW SLAVE STATUS". All was OK.
I flushed the logs again and inspected the second binary log. As expected, it contained only the statements related to the employees table.
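If the goal is a plain SQL file containing only those statements, the filtered binlog can now be dumped from the second sandbox; the binlog file name below is an assumption, since it depends on how many times the logs were flushed:

$ ./my sqlbinlog ./data/mysql-bin.000002 > employees_only.sql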
Total cost of the operation: 5 minutes. Way quicker than writing this report!

Thursday, November 11, 2010

Speaking at the UK Oracle User Group conference - Birmingham

Back to the conference circuit after some rest.
On December 1st I will be speaking, under my new affiliation with Continuent, in the MySQL track at the UKOUG conference. My topic is MySQL - Features for the Enterprise, and it will basically cover the main features of MySQL 5.5.
This conference is the largest Oracle-related event in Europe, and it is organized by users for other users. This year, for the first time, the conference hosts a dedicated MySQL track.
It is a sort of epidemic. Most of the important Oracle events now have some MySQL content, or a full track. I see this as a great opportunity for both classic MySQL and Oracle users to meet each other and find common business ground for the future.

Thursday, November 04, 2010

Testing MySQL 5.5 semi-synchronous replication

A few days ago I saw an article about Semi-Synchronous Replication in MySQL 5.5. It asks questions, and doesn't give answers beyond gut feeling. So I thought that I would do some practical testing of this new feature.
Before we go that way, though, let's revisit the theory.

How semi-synchronous replication works


Figure 1. A transaction with regular replication
With regular replication, you send a transaction to the master (1). When the COMMIT is received, the master executes it (2), and if successful it logs the transaction to the binary log (3). Then the master answers the client request (4) with a successful result. In the meantime, the slaves replicate the record (5).
What happens if the master crashes after point #4 and before a slave has had a chance of getting the data at point #5?
The client will have a result for that transaction, but the data is lost, because it never reached any slave.


Figure 2. A transaction with semi-synchronous replication

Let's see the same scenario with semi-synchronous replication. Everything is the same until point #3. Then things change. The master does not return to the client. Instead, it alerts the slave that a transaction is available (4). The slave gets it and stores it in the relay log (5). Without further action, the slave tells the master that the transaction was received (6), and only then does the master return the result to the client (7).
What is the benefit? If the master crashes, the transaction is lost, but the client does not get the false information that the transaction was stored. Instead, if the master crashes before it returns the result to the client, the client gets an error, and knows that the transaction needs to be reworked when a new master is back.

Semi-synchronous replication in practice

Now, the practicalities.
How do you tell if semi-synchronous replication is working? If you leave the default timeout of 10 seconds, you have an immediate clue that something is wrong when a query takes too long to return. Further investigation is possible by looking at the GLOBAL STATUS variables.
Rpl_semi_sync_master_status tells you if the master is ready for semi-synchronous replication.
Rpl_semi_sync_master_yes_tx is the number of transactions that were successfully delivered using semi-synchronous replication.
Rpl_semi_sync_master_no_tx is the number of failed attempts at delivering a transaction via semi-synchronous replication. When that happens, Rpl_semi_sync_master_status becomes "OFF", and you need to investigate.
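A quick way to see all of them at once:

SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync%';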

The important thing to understand about this feature is that semi-synchronous replication does not guarantee that your transaction is executed on the slave. It only tells you that the data has been transferred to the slave's relay log. It can still happen that the transaction fails to execute on the slave (which could be due to either a bug in your application or a bug in MySQL replication). But this is not a cluster. Don't expect a two-phase commit.

Testing semi-synchronous replication

If you want to test this feature without suffering too much, you can use a binary tarball and MySQL Sandbox. Once you have installed MySQL Sandbox and downloaded the server tarball, you can install a test replication system with
make_replication_sandbox --how_many_slaves=4 /path/to/mysql-5.5.6-yourOS.tar.gz
This will create a system with 1 master and 4 slaves.
The Sandbox has a shortcut to install the plugin quickly and painlessly:
sbtool -o plugin --plugin=semisynch -s $HOME/sandboxes/rsandbox_5_5_6
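Without the sandbox shortcut, the manual installation would be roughly the following; a sketch using the standard plugin and variable names on a Linux build (the .so suffix differs on other platforms):

# on the master
INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
SET GLOBAL rpl_semi_sync_master_enabled = 1;
SET GLOBAL rpl_semi_sync_master_timeout = 10000;  # milliseconds: the 10-second default
# on each slave
INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
SET GLOBAL rpl_semi_sync_slave_enabled = 1;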
Now you will have the semi-synchronous plugin installed in the master and all the slaves. For our tests, we will make a shell script, an easy task thanks to the sandbox utilities.

#!/bin/sh

echo "disabling semi-synch replication in all slaves except 1"
./s1 -e 'set global rpl_semi_sync_slave_enabled=1'
./s1 -e 'slave stop io_thread; slave start'

for E in 2 3 4
do
    ./s$E -e 'set global rpl_semi_sync_slave_enabled=0'
    ./s$E -e 'slave stop io_thread; slave start'
done

#
# this query will show the main variables that tell
# if semi-synch replication is working
#
Q1='select variable_name, variable_value'
Q2='from information_schema.global_status'
Q3='where variable_name in'
Q4='("RPL_SEMI_SYNC_MASTER_YES_TX", "RPL_SEMI_SYNC_MASTER_NO_TX")'
I_S_Q="$Q1 $Q2 $Q3 $Q4"

echo ""
echo "creating a table. it should replicate through the semi-synch"
./m -vvv -e 'create table test.t1( i int)'
./m -e "$I_S_Q"

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (1)'
./m -e "$I_S_Q"

echo ""
echo "disabling semi-synch replication in slave 1"
./s1 -e 'set global rpl_semi_sync_slave_enabled=0'
./s1 -e 'slave stop io_thread; slave start'

echo ""
echo "enabling semi-synch replication in slave 3"
./s3 -e 'set global rpl_semi_sync_slave_enabled=1'
./s3 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (2)'
./m -e "$I_S_Q"

echo ""
echo "disabling semi-synch replication in slave 3"
./s3 -e 'set global rpl_semi_sync_slave_enabled=0'
./s3 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'NO' should increase"
./m -vvv -e 'insert into test.t1 values (3)'
./m -e "$I_S_Q"

echo ""
echo "enabling semi-synch replication in slave 2"
./s2 -e 'set global rpl_semi_sync_slave_enabled=1'
./s2 -e 'slave stop io_thread; slave start'

echo ""
echo "inserting a record. The number of 'YES' should increase"
./m -e 'insert into test.t1 values (4)'
./m -e "$I_S_Q"
This script will first disable semi-synchronous replication in all the slaves except one. Then it will create a table, and check for the telling status variables.
This should work quickly and without problems. Then it will disable the plugin on the only slave that was active, and enable another slave instead.
Inserting a record on the master will work again quickly, as the newly enabled slave will get the record immediately.
Then the slave gets disabled, and we can witness what happens. The query takes a bit longer than 10 seconds, and the status variable tells us that semi-synchronous replication has failed.
We finally enable yet another slave, and when we try a further insertion, we can see that the semi-synchronous replication has resumed.

Very important:
To enable or disable semi-synchronous replication on a slave, it is not enough to set the appropriate variable. You also need to restart the slave's I/O thread, by issuing a STOP SLAVE IO_THREAD followed by a START SLAVE command.

Here is a sample run:
disabling semi-synch replication in all slaves except 1

creating a table. it should replicate through the semi-synch
--------------
create table test.t1( i int)
--------------

Query OK, 0 rows affected (0.87 sec)

Bye
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 1              |
+-----------------------------+----------------+

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 2              |
+-----------------------------+----------------+

disabling semi-synch replication in slave 1

enabling semi-synch replication in slave 3

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 0              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 3              |
+-----------------------------+----------------+

disabling semi-synch replication in slave 3

inserting a record. The number of 'NO' should increase
--------------
insert into test.t1 values (3)
--------------

Query OK, 1 row affected (10.12 sec)

Bye
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 1              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 3              |
+-----------------------------+----------------+

enabling semi-synch replication in slave 2

inserting a record. The number of 'YES' should increase
+-----------------------------+----------------+
| variable_name               | variable_value |
+-----------------------------+----------------+
| RPL_SEMI_SYNC_MASTER_NO_TX  | 1              |
| RPL_SEMI_SYNC_MASTER_YES_TX | 4              |
+-----------------------------+----------------+
Using the above steps, you should be able to use semi-synchronous replication and do some basic monitoring to make sure that it works as expected.

Tuesday, November 02, 2010

QA at Continuent. A serendipitous job.

A few days ago, I left the MySQL team at Oracle, after more than 4 years of work dedicated mostly to the MySQL community.
Someone will probably remember that, when I joined MySQL in 2006, I started my work in the QA team. It was no coincidence. My previous work as a consultant was very much focused on database development quality, even when my customers had approached me for different reasons.
Let's be frank. I am in a minority. It's not common to find someone who is passionate about QA. I am aware of being a rare bird who likes testing and bug hunting, and doing all the little steps that, taken together, improve the overall quality of a software system.
I had been thinking about my passion lately, and I often caught myself wondering why, in my job at MySQL, I had let things slip so far away from my ambition of tackling hard technical problems.
I was more or less in this frame of mind when I met, by chance, an old acquaintance, Robert Hodges, CEO of Continuent and a fine technologist whom I respect and admire. And I was astonished to hear him ask me if I knew a good candidate for a position of QA manager at Continuent.
You can guess the rest of the story. I told him that I indeed had a candidate, and the next day we went to dinner together, where I had the most hackish and refreshing talk that I have had in years with Ed Archibald, the CTO (oh, and the steaks were great too!).
A few days after that, Robert and Ed organized a demo of Continuent technology for me, and I was hooked.

There was, indeed, an uncommon problem to solve. I had received offers for very good positions at three other companies, and one more had asked me if I wanted to consider working for them. As all the offering parties are friends, this fact put me in a difficult position. Every offer was intriguing and exciting, and in the end I had to make a choice, leaving one happy and three disappointed friends in the field. After careful consideration, I chose Continuent, where I will start as Director of Quality Assurance on November 15th.
Fortunately for me, the three other friends are real friends, and they took the refusal graciously. They knew from the beginning that they had competition, and they knew that they could win or lose for reasons that they could not fully control. And indeed the main reason for choosing Continuent is the challenge of developing a beautiful technology that will eventually take the world by storm. (Or so the general plan goes. It's too early to make claims, as I haven't started getting acquainted with the technology yet, but it feels good.)

What now? I have some tentative plans that I need to consolidate in the near future:
  • Taking some vacation. I am already one day into this first part;
  • Getting familiar with the technology that I will help improve;
  • Getting acquainted with my new colleagues, for which I will have a splendid occasion at the company meeting that will happen soon in Stuttgart;
  • Learning some new technology, both in development and in a more strict QA field;
  • Setting up my new laptop, which will hopefully come before I start the new job;
  • Talking at conferences. I have already committed to talking at the UKOUG in Birmingham, on December 1st, and more will come.
  • Resuming my blogging at full speed. I have a mental drawer full of ideas, and I only need to transfer them to the keyboard.
And probably I will also start playing with the new technologies that I have been reading about in the last months but I haven't had time to pursue. Either way, I feel refreshed and ready for the next battle.
Thanks, Continuent, for this great opportunity!

Monday, November 01, 2010

Book review: MySQL 5.1 plugin development

MySQL 5.1 Plugin Development,
by Sergei Golubchik and Andrew Hutchings.
Packt Publishing, 2010.
Executive summary: Highly recommended. If you want to develop MySQL extensions, buy this book. It's a must, written by two expert professionals who probably know more than anyone else on this matter. The book is full of practical examples explained with the theoretical information necessary to make it stick.

This book fills a gap in the world of MySQL documentation. Although the MySQL docs are extensive and thorough, to the point that sometimes you wish the writers were less verbose and stuck more to the topic, when it comes to the plugin architecture there are still some unanswered questions. I guess that the ones who have implemented MySQL extensions so far have read the code more than the documentation.
But, back to the point. The void is filled now, and in such a wonderful way! I have to disclose that I was the reviewer of this book, and while this fact puts me in a conflicting position when it comes to a review, it has also given me an inner reason for praise, beyond the experience of a regular reader. The people who have worked with me will know that I am hard to please, and so my review was peppered with suggestions for improvements that, I admit, made the authors' life quite difficult. I mostly complained that some of the examples proposed for each chapter were not useful and interesting enough. In short, there was not enough incentive for the reader to start coding immediately. I am glad to report that the authors were very responsive and, rather than being annoyed by my demands, worked enthusiastically on the examples until the book became a true gem in this category. This book will surprise you with its practical approach, as it guides you through the examples until you end up with a working extension.
The first chapter is probably the part that will save everyone who wants to build a plugin many hours of attempts. In that chapter the authors explain how to build the different types of plugins on several operating systems. If you have ever tried to build a plugin, you will appreciate this chapter.
The book's body covers every type of plugin that you can create in MySQL, from the classic UDF to the more recent storage engine. For each chapter, you will have code ready to compile and try on your own, and the line-by-line explanation of what that code does, and why it was written the way it is.
Whenever the code is less than intuitive, the authors take care of it with additional theoretical and practical explanation, making sure that the reader is fully informed on the topic, and can complete the task effectively.
Summing up, this is not a book for the idle reader. It's practical help for the doers, the ones who want to get the job done and need guidance through the maze of the MySQL plugin protocol.

Update. There is also a review at Slashdot.

Saturday, October 09, 2010

My wish list for OpenSQLCamp, Boston 2010

OpenSQL Camp Boston 2010 in Cambridge, MA
On Friday, October 15th, the third OpenSQLCamp in North America will start at 6pm with a social event and then continue with more technical stuff over the weekend.
Compared to the overcrowded Oracle Open World with its 45,000 attendees, this event may look like a desert. But the quality of the attendance will more than compensate for the missing thousands.
The event will be held in unconference style. There is a list of sessions, where I have proposed two topics: one for which I would like to see someone step up and help define the topic of Data integrity: comparing the SQL and NoSQL approach, and one where I plan to tell a cute technical story about MySQL replication and the quest for a global transaction ID.

Thursday, September 30, 2010

A funny recipe.

According to a recent book about MySQL, this is the recipe for converting an IP address into an integer, with the purpose of using it as a server ID.
  1. Take for example 10.0.159.22
  2. Using a calculator (!!!), convert each of the four numbers to hexadecimal (you get 0a.00.9f.16)
  3. Then glue the four hexadecimal numbers together, after removing the dots, and, again with the calculator, convert them to decimal (0a009f16 in hexadecimal = 167812886 in decimal)
  4. Use this number (167812886) as your server ID in the options file.
Brilliant, eh?

Had the authors searched the MySQL manual for "IP address", they would have found the INET_ATON function, which can be used like this:
select inet_aton('10.0.159.22');
+--------------------------+
| inet_aton('10.0.159.22') |
+--------------------------+
|                167812886 |
+--------------------------+
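As a side note (my own quick check, not part of the book's recipe): INET_ATON simply treats the four octets as base-256 digits, so the same value can be obtained with plain arithmetic:

select 10*256*256*256 + 0*256*256 + 159*256 + 22;
-- returns 167812886, the same value as inet_aton('10.0.159.22')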

Update.
Of course, if you want to ignore INET_ATON and avoid using a calculator, you can still leverage the server to do something like this:

delimiter //
drop function if exists redundant_inet_aton //
create function redundant_inet_aton(IP varchar(16))
returns bigint
deterministic
begin
    return conv( 
      concat(
        lpad(hex(substring_index(IP, '.', 1)+0),2,'0'),
        lpad(hex(substring_index(
                    substring_index(IP, '.', 2),
                 '.', -1) +0 ),2,'0'), 
        lpad(hex(substring_index(
                    substring_index(IP, '.', 3),
                 '.', -1) +0 ),2,'0'),
        lpad(hex(substring_index(IP, '.', -1) +0),2,'0') 
      ), 
        16,10);
end //
#
# Probably good for an obfuscated code contest
#
delimiter ;

#
# testing the new function
#
select IP, redundant_inet_aton(IP), inet_aton(IP)
    from (
               SELECT '10.0.159.22' AS IP 
        UNION  SELECT '192.168.2.6' 
        UNION  SELECT '127.0.0.1' 
        UNION  SELECT '10.176.1.15' 
        UNION  SELECT '173.194.36.104' 
    ) as t;
+----------------+-------------------------+---------------+
| IP             | redundant_inet_aton(IP) | inet_aton(IP) |
+----------------+-------------------------+---------------+
| 10.0.159.22    |               167812886 |     167812886 |
| 192.168.2.6    |              3232236038 |    3232236038 |
| 127.0.0.1      |              2130706433 |    2130706433 |
| 10.176.1.15    |               179306767 |     179306767 |
| 173.194.36.104 |              2915181672 |    2915181672 |
+----------------+-------------------------+---------------+

Thursday, September 23, 2010

Book review : SQL Antipatterns

SQL Antipatterns, by Bill Karwin


I remember that when I finished reading The Lord Of The Rings, I felt a pang of disappointment. "What? Already finished? What am I going to read now? What can give me the same pleasure and sense of accomplishment that these wonderful pages have given me?"
That's how I felt when I came to the last page of SQL Antipatterns. And, no, Bill Karwin doesn't tell imaginary tales from a fictitious world. This book is full of very real and very practical advice, but all the material is presented with such grace and verve that I could not put it down until the very end. I read it cover to cover in just a few hours, and I savored every page.

What is this Antipatterns, anyway? The title may deceive a casual bookshop browser into believing that it's about some philosophical database theory. Digging further, you realize that it's a collection of best practices in database handling, but told from the side of the bad practice. Every chapter starts with the story of a disaster, followed by an explanation of why it happened, and how to recognize the symptoms. Along the way, you learn what people do to fix the problem when they don't understand where exactly the trouble comes from. And then comes the best practice section, where you are told of several ways of fixing the problem for good.
It's a pleasure from the start. When I read the introductory stories, I recognized mistakes that I have witnessed in my past experience and some that I made myself. It was a fascinating sequence of "look at this mess!", "Here is what you shouldn't do", "It happens when ...", and "this is what you should do".
The more I read, the more I wanted to read. And after a few chapters, a pattern appeared among the Antipatterns. The stories are not a casual collection of database mistakes. There is a thread. Most of the stories tell you what happens when you violate the relational theory principles, and the best practice solutions focus on how to apply relational principles in real life.
The relationship between the horror stories and the relational database theory becomes apparent when you get to the appendix. In other books, you can often skip the appendixes as a reminder of something that you know already. Not here. If you skip this one, you will miss a sort of Ellery Queen ending, where the detective puts together all the pieces of the mystery for the amazed audience.

I feel I would do a disservice to Bill if I revealed more about the book. I may have already spoiled some of the surprise by disclosing the inner structure of the book. But certainly I won't give away any of the juicy stories that make this book an invaluable tool for every database professional.
I can only say this. When I read a technical book, I usually find something wrong, or debatable, or some technical solution that I could improve upon. Not in this book. I completely agree with the theory and the practice that is suggested from start to finish. My QA ego suffers at the idea of not having found anything wrong, but this slight disappointment is more than compensated by the pleasure of reading such a beautiful book. Well done, Bill! I am already queuing to buy your next one!

Monday, July 12, 2010

OpenSQLCamp EU 2010 - Vote your favorite sessions

The deadline for the OpenSQLCamp CfP is over. Now it's time to submit your votes.
As we did last year, the procedure is public and transparent. After seeing the list of submitted sessions, you can then vote via mailing list, or via Twitter.
But, please, hurry! We need to finalize the schedule at the end of this week.
Thanks!

Monday, July 05, 2010

More on the open core : the pragmatic view

Open to the core
I have joined the ranks of those who have a public opinion on the open core debate.
Roberto Galoppini has graciously accepted to host a post on this topic in his Commercial Open Source Software blog.
Please read it directly from there:
Open to the core - The pragmatic freedom
Enjoy!

Speaking in the US - San Francisco User Group - Community Summit - OSCON

Giuseppe in US
On July 15th and 16th I will be in San Francisco for a few meetings, and it will be my pleasure to meet the San Francisco MySQL User Group, where I will talk about MySQL Sandbox.
From there, I will continue to Portland, to attend the Community Leadership Forum and, of course OSCON, where I will have three talks: two in the main event, and one at the Intel booth.

OSCON 2010

Tuesday, June 29, 2010

OpenSQLCamp EU 2010 - Last days to submit a talk

There is still time and hope if you want to present at the OpenSQLCamp 2010, European edition. The Call for Participation is open until July 11th.
Thanks to those who have already submitted proposals.
A few caveats: if your proposal does not include a description, it will not be accepted. Therefore, wannabe speakers, please check your proposals and make them as good as you can!
We want talks about all open source databases, not only MySQL.
Open source database fans, wake up, and submit a proposal!

Monday, June 28, 2010

MySQL Sandbox embraces Python and meets Cluster

If you have tried Quick start guides: MySQL cluster in 10 minutes, you may have realized that it is really quick and easy.
However, it leaves some typing to be done.
Users of MySQL Sandbox have a horror of repetitive typing, and this got me thinking. "Could I integrate MySQL Sandbox and Cluster?"
The answer was: "Sure."
But then I started thinking of all the minor and major changes that I wanted to make to the Sandbox and had delayed for too long. What I need is a radical refactoring.
And then I remembered that it has been almost two years since I learned a new programming language and that perhaps I could expand my horizons and the Sandbox architecture at once.
Thus, thanks to an irresistible offer from O'Reilly about ebooks, last week I bought both Learning Python, Fourth Edition and Programming Python, Third Edition.
During the week end I produced my first tool: a Python script that installs and starts a small cluster, following the instructions given in the MySQL Cluster quick start guides. The script unpacks the cluster tarball, installs a server sandbox from it, then starts the cluster nodes and the MySQL server, and then it monitors the cluster until all the nodes are connected before finally testing the server.
Here is a sample run:

$ make_cluster.py ~/Downloads/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64.tar.gz

++ tar -xzf ~/Downloads/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64.tar.gz

++ low_level_make_sandbox --basedir=$HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64 --sandbox_directory=mcluster --install_version=5.1 --sandbox_port=5144 --no_ver_after_name --no_run --force --my_clause=log-error=msandbox.err --my_clause=ndbcluster 
    The MySQL Sandbox,  version 3.0.12
    (C) 2006-2010 Giuseppe Maxia
installing with the following parameters:
upper_directory                = $HOME/sandboxes
sandbox_directory              = mcluster
sandbox_port                   = 5144
check_port                     = 
no_check_port                  = 
datadir_from                   = script
install_version                = 5.1
basedir                        = $HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64
tmpdir                         = 
my_file                        = 
operating_system_user          = gmax
db_user                        = msandbox
db_password                    = msandbox
my_clause                      = log-error=msandbox.err ; ndbcluster
prompt_prefix                  = mysql
prompt_body                    =  [\h] {\u} (\d) > 
force                          = 1
no_ver_after_name              = 1
verbose                        = 
load_grants                    = 1
no_load_grants                 = 
no_run                         = 1
no_show                        = 
loading grants
.. sandbox server started
stopping server
Your sandbox server was installed in $HOME/sandboxes/mcluster

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgmd -f $HOME/sandboxes/mcluster/my_cluster/conf/config.ini --initial --configdir=$HOME/sandboxes/mcluster/my_cluster/conf/
2010-06-28 21:29:57 [MgmtSrvr] INFO     -- NDB Cluster Management Server. mysql-5.1.44 ndb-7.1.4b
2010-06-28 21:29:57 [MgmtSrvr] INFO     -- Reading cluster configuration from '$HOME/sandboxes/mcluster/my_cluster/conf/config.ini'

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndbd -c localhost:1186
2010-06-28 21:29:57 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndbd -c localhost:1186
2010-06-28 21:29:57 [ndbd] INFO     -- Configuration fetched from 'localhost:1186', generation: 1

++ mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 (not connected, accepting connect from localhost)
id=4 (not connected, accepting connect from localhost)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


++ $HOME/sandboxes/mcluster/clear

++ $HOME/sandboxes/mcluster/start
... sandbox server started
Please wait. Giving the cluster time to catch up
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


It may take up to 2 minutes to initialize ... ( 0 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)


It may take up to 2 minutes to initialize ... ( 5 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, starting, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @localhost  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 (not connected, accepting connect from any host)

[...]

It may take up to 2 minutes to initialize ... ( 100 )
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)] 2 node(s)
id=3 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0, Master)
id=4 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4)

[mysqld(API)] 1 node(s)
id=50 @127.0.0.1  (mysql-5.1.44 ndb-7.1.4)


++ $HOME/sandboxes/mcluster/use -vvv -e "create table test.t1(i int not null primary key)engine=ndb"
--------------
create table test.t1(i int not null primary key)engine=ndb
--------------
Query OK, 0 rows affected (0.45 sec)

++ $HOME/sandboxes/mcluster/use -vvv -e "show create table test.t1\G"
--------------
show create table test.t1
--------------

*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `i` int(11) NOT NULL,
  PRIMARY KEY (`i`)
) ENGINE=ndbcluster DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

to shut down the cluster, type:
$HOME/python/mysql-cluster-gpl-7.1.4b-osx10.6-x86_64/bin/ndb_mgm -e shutdown
It works! And this is a good start to make me feel confident with Python, which I will use to develop MySQL Sandbox version 4. This cluster snippet will probably be rewritten from scratch once the new architecture is in place. For now, it was enough to get a feeling for the language.
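As a footnote (my own addition, not part of the script's output): if you want a quick confirmation from the SQL side that the sandboxed server really has the NDB engine enabled, a query against information_schema does the job:

select engine, support from information_schema.engines where engine = 'ndbcluster';
-- support should report YES (or DEFAULT) when the cluster binaries are in use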

Friday, June 25, 2010

Quick start guides: MySQL cluster in 10 minutes

MySQL Cluster quick start guide
Scared of MySQL Cluster?
Don't be. You may want to try the quick start guides that are available on the Cluster downloads page.
These guides are step-by-step instructions for creating a simple cluster on a single host.
Following the instructions, you will be able to set up all the cluster components in 10 minutes or less. This will be far from production ready, of course, but it shows that Cluster is not rocket science, and anyone can get started with it with a minimal time investment.
I tried the Linux instructions on my Mac, and they worked without any changes. Things may be different when you deploy a real set of servers on separate hosts, but it's a good start.
If I compare this guide with my first experience with MySQL Cluster in 2004, usability and stability have increased enormously.

Saturday, June 19, 2010

Welcome googleCL

I am writing this blog post with Vim, my favorite editor, instead of using the online editor offered by Blogger. And I am uploading it to my Blogger account using GoogleCL, a tool that lets you use Google services from the command line.
I am a command line geek, and as soon as I saw the announcement, I installed it on my laptop. The mere fact that you are reading this blog post shows that it works.

GoogleCL is an apparently simple application. If you install it on a Mac using MacPorts, you realize how many dependencies it has and how much complexity it hides under the hood.
With an easy-to-understand syntax, it puts your blog, pictures, calendar, contacts, videos, and online documents at your fingertips.
For example, let's query my blog for partitioning:

$ google blogger --blog="The Data Charmer" --title=partitioning list "title,url"

Hmm. No results. The manual doesn't help much, but something happened during this query. The first thing is that I was asked to authorize the script to access my blog, which was done by activating a key that I got on the command line. So far, so good. The second thing was a message informing me that a default configuration file had been created in my home directory. Looking at that file, I saw an option saying "regex = True". Aha! So the title supports regular expressions. Let's try:

$ google blogger --blog="The Data Charmer" --title=".*partitioning" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

OK. This gives me everything with the word "partitioning" in the title. But I know that some titles are missing. Comparing with the results that I get online, I see that the titles where "partitioning" is capitalized are not reported. So the search is case sensitive. What I need to do is to tell the regular expression that I want a case insensitive search. Fortunately, I know how to speak regular expressions. Let's try again.

$ google blogger --blog="The Data Charmer" --title="(?i).*partitioning.*" list "title"
Holiday gift - A deep look at MySQL 5.5 partitioning enhancements
Partitioning with non integer values using triggers
Tutorial on Partitioning at the MySQL Users Conference 2009
The partition helper - Improving usability with MySQL 5.1 partitioning
A quick usability hack with partitioning
MySQL 5.1 Improving ARCHIVE performance with partitioning

Now I feel confident enough to make some changes to my online content.
To create this blog post, I used several of GoogleCL's capabilities. After I created an image, I uploaded it to my Picasa album with this command:

$ google picasa post -n "Blogger Pictures" -t googlecl ~/Desktop/google_cl.png

Then I asked Picasa to give me the URL of the image:

$ google picasa list -n "Blogger Pictures" --query googlecl title,url_direct
google_cl.png,https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEigTW09cp-kY2qB2l0zbiCMfM3LTN-ZbjlrhKH0wlxFgOqodQ_Mf5D8aSlq_w2Hv4A2aJiHgqbQR3J8gNiUmmWgTAAPO4AHvyHDHUx6GbCZBlIZQL7QYO6JBxepsk9XuDx8q3Q8/

And then I inserted that URL in this blog post. Finally, I uploaded the blog post with this command:

google blogger --blog="The Data Charmer" --draft --title "Welcome googleCL" --tags="google,mysql,partitioning,command line,blogging" post ~/blog/welcome_googlecl.html


(Now writing online) And after I checked that the post looked the way I wanted, I hit the "PUBLISH POST" button.
Welcome, GoogleCL!