Tuesday, September 30, 2008

A cool idea - Revision engine


Yesterday I saw an announcement on the MySQL internals@ list about a new storage engine being released. DDengine has created a revision engine, a sort of embedded proxy inside MySQL that keeps track of the changes you make to your data.
The idea is clever. You write to your table, update and delete without any concern, and the revision engine stores your changes in the background.
I wanted to try the engine on my laptop (Mac OS X), but there was no binary available for this architecture. I contacted the authors and received prompt assistance until I had the product installed. Unfortunately, it crashes immediately. Well, that's to be expected of a version 0.1.
I then tried on Linux, and there too I received very quick assistance from Peter Benjamin Volk, Project Head at DDengine.
It now works on Linux, although it's very basic at the moment.
The documentation says that there are two ways of using the revision engine. One with embedded changes, which involves a silent alteration of your table structure to include the revision info, and one with two tables, one of which is created behind the scenes.
For example:
create table t1 (
id int not null primary key,
c char(10)
) engine=revision comment="InnoDB:DOUBLE";
show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t1_revision    |
+----------------+

desc t1;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id    | int(11)  | NO   | PRI | NULL    |       |
| c     | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+

desc t1_revision;
+--------------------+---------------------+------+-----+
| Field              | Type                | Null | Key |
+--------------------+---------------------+------+-----+
| id                 | int(11)             | NO   | PRI |
| c                  | char(10)            | YES  |     |
| revision_id        | int(10) unsigned    | NO   | PRI |
| revision_timestamp | timestamp           | NO   |     |
| revision_deleted   | tinyint(3) unsigned | NO   |     |
+--------------------+---------------------+------+-----+
The "InnoDB:DOUBLE" comment tells the engine to use a second table for the revision info.
Operations on the table are transparent.
insert into t1 (id,c) values (1, 'aaa'), (2, 'bbb');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

insert into t1 (id, c) values (3, 'ccc'), (4, 'ddd');
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0

select * from t1;
+----+------+
| id | c    |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
|  4 | ddd  |
+----+------+
4 rows in set (0.00 sec)

select * from t1_revision;
Empty set (0.00 sec)
So far, nothing unusual. Let's try some changes.
update t1 set c ='changed' where id = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1 Changed: 1 Warnings: 0

select * from t1;
+----+---------+
| id | c       |
+----+---------+
|  1 | aaa     |
|  2 | bbb     |
|  3 | changed |
|  4 | ddd     |
+----+---------+
4 rows in set (0.00 sec)

show variables like '%revision%';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| revision_select_mode | current |
+----------------------+---------+
1 row in set (0.00 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  3 | ccc  |           1 | 2008-09-30 05:45:49 |                0 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

delete from t1 where id = 2;
Query OK, 1 row affected (0.01 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  3 | ccc  |           1 | 2008-09-30 05:45:49 |                0 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

set revision_select_mode = 'deleted';
Query OK, 0 rows affected (0.01 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  2 | bbb  |           1 | 2008-09-30 05:47:14 |                1 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

select * from t1;
+----+---------+
| id | c       |
+----+---------+
|  1 | aaa     |
|  3 | changed |
|  4 | ddd     |
+----+---------+
3 rows in set (0.00 sec)
The session variable revision_select_mode determines which revision data is shown.
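To go back to the default view, you can set the variable to 'current' again, the value reported by SHOW VARIABLES above. A quick sketch (assuming 'current' and 'deleted' are the supported values, as the examples here suggest):

```sql
-- switch back to the default mode
set revision_select_mode = 'current';
-- t1_revision again shows prior versions of updated rows
select * from t1_revision;
```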
At the moment, there is no revert mechanism (or if there is, it's undocumented), but the idea is cool, and I think that this engine will become very useful.

Monday, September 29, 2008

Will you use row-based replication by default?


MySQL 5.1

MySQL 5.1 introduces row-based replication, a way of replicating data that fixes many inconsistencies of statement-based replication, the standard method used by MySQL so far.

The good: row-based replication solves some problems when replicating the result of non-deterministic functions, such as UUID() or NOW().
The bad: row-based replication may break existing applications that count on the quirks of statement-based replication to execute conditionally (updates based on @@server_id, for example), and it may perform badly on updates applied to very large tables.

Currently, MySQL 5.1.28 has the mixed mode enabled by default. In this mode, binary logging switches to row-based format when there is a risk of breaking replication. Some cases are still left out, though, like Bug#39701 (load_file()).
Mixed mode is supposed to be the best of both worlds. You can run fast updates on large tables without risking inconsistent results from non-deterministic functions. The recent addition to mysqlbinlog allows easy decoding of the ugly-looking row-based entries.
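If you want to check where a server stands, or to override the default, the standard binlog_format variable applies. A minimal sketch on a 5.1 server (the global change requires the SUPER privilege):

```sql
-- check the current logging format (MIXED by default in 5.1.28)
SHOW GLOBAL VARIABLES LIKE 'binlog_format';
-- revert to statement-based logging for the whole server
SET GLOBAL binlog_format = 'STATEMENT';
-- or change the format for the current session only
SET SESSION binlog_format = 'ROW';
```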
Would you switch to mixed mode or stay in statement mode? There are requests to revert the default to statement-based binlog format, to maintain compatibility with old applications.
Personally, I would go for the mixed mode. But it depends on your applications and your personal experience. What's your take?

Friday, September 26, 2008

Drizzling MySQL


Sakila Drizzled

Have you ever used subqueries with MySQL? They were introduced in version 4.1, and have been neglected since then. The performance of subqueries in MySQL 4.1 and 5.x is really a sad story.
For example, using the Employees test database, you may try this query:
select
title, from_date, to_date
from titles
where emp_no in
(select emp_no
from employees
where first_name = 'Mary'
and last_name = 'Sluis'
)

The result, in MySQL 5.1.28 is
+-------+------------+------------+
| title | from_date  | to_date    |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (7.88 sec)
Ugly, huh?
But let's try the same query in MySQL 6.0.8.
select title, from_date, to_date
from titles
where emp_no in (
select emp_no
from employees
where first_name = 'Mary'
and last_name = 'Sluis');
+-------+------------+------------+
| title | from_date  | to_date    |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (0.13 sec)
That's a completely different story! What's happening?
The Optimizer Team has been hard at work, and in version 6.0 we will see a huge improvement. Let's try to explain the result:
explain extended select title, from_date, to_date
from titles
where emp_no in (
select emp_no from employees
where first_name = 'Mary'
and last_name = 'Sluis')\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: employees
         type: ALL
possible_keys: PRIMARY
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 300024
     filtered: 100.00
        Extra: Using where
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: titles
         type: ref
possible_keys: PRIMARY,emp_no
          key: emp_no
      key_len: 4
          ref: employees.employees.emp_no
         rows: 1
     filtered: 100.00
        Extra:
2 rows in set, 1 warning (0.00 sec)

show warnings\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `employees`.`titles`.`title` AS
`title`,`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`employees`
join `employees`.`titles`
where ((`employees`.`titles`.`emp_no` = `employees`.`employees`.`emp_no`)
and (`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary'))
The optimizer has converted the subquery into a join, with a query plan that is 20 times better than the one used in MySQL 5.1. Try the explain extended command in MySQL 5.1.28, and the resulting query is
select `employees`.`titles`.`title` AS `title`,
`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`titles`
where (`employees`.`titles`.`emp_no`,
(((`employees`.`titles`.`emp_no`)
in employees on PRIMARY
where ((`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary')
and ((`employees`.`titles`.`emp_no`) = `employees`.`employees`.`emp_no`)))))
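If you are stuck on 5.1 in the meantime, you can usually get a plan like the 6.0 one by rewriting the subquery as an explicit join by hand, along these lines:

```sql
select t.title, t.from_date, t.to_date
from titles t
inner join employees e on e.emp_no = t.emp_no
where e.first_name = 'Mary'
  and e.last_name = 'Sluis';
```

Since emp_no is the primary key of employees, the join cannot produce duplicate rows, so no DISTINCT is needed.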
So far, the comparison shows that some subqueries in 6.0 are evaluated much better than in previous versions. But what's the "drizzled" part from the title?
There are two issues, actually.
First, as you may have heard, Drizzle is a lightweight version of MySQL, based on version 6.0 code. If you import the same data and run the previous query in Drizzle, you get the same good performance that I got in 6.0.
Second, it would be nice to have subquery optimization in production. Unfortunately, neither Drizzle nor MySQL 6.0 is GA; both are alpha. Drizzle has not reached the feature-freeze stage yet, and MySQL 6.0 must wait for Falcon, Maria, and the online backup to be ready before being considered stable. So today I made a little experiment: I drizzled MySQL 6.0 by disabling Maria and Falcon, getting binaries that are less than half the size and skipping a lot of unfixed bugs. If you are interested, I compiled using this script.
#! /bin/sh

path=`dirname $0`
. "$path/SETUP.sh"

extra_flags=" $fast_cflags $max_cflags -g"
extra_configs=" --with-plugins=innobase,archive,blackhole,csv,partition "
extra_configs=" $extra_configs --without-falcon "
extra_configs=" $extra_configs --without-maria-storage-engine "

. "$path/FINISH.sh"
You may ask: what's the use of this stripping-down operation?
It's just a small test. I have a private agenda, and I will do something even more dramatic in the coming weeks. In the meantime, please think about this. There is a great feature in 6.0, which could be very nice to use in production. What would you do?
Backport it to MySQL 5.1? Strip down 6.0 leaving only subquery optimization, à la drizzle? I have some ideas, but I would like to hear from you. What would you do?

Tuesday, September 16, 2008

Advanced Bazaar for MySQL developers


Bazaar Sakila

Guilhem Bichot has written an excellent article on Advanced Bazaar for MySQL Developers. In addition to showing the most common Bazaar operations for developers, Guilhem shows how to create a new feature and submit it to MySQL for review.

The article is a practical introduction to Bazaar's advanced features. After the basics, magnificently covered by Daniel Fischer a few weeks ago, this article explains how to perform high-level development operations with Bazaar.
Bravo Guilhem!

Monday, September 15, 2008

Checking and repairing partitions in MySQL 5.1

Talking with users at OSCON 2008, I found that the biggest fear about the new partitioning feature was table corruption. Until recently, this fear was substantiated by bug#20129, which reported the inability to repair corrupted partitions. The initial "fix" for this bug was to disable the offending commands.
After a while, especially when the Support department complained loudly, the bug was reopened and a proper fix implemented. Now you can maintain a partition using ALTER TABLE ... CHECK/ANALYZE/OPTIMIZE/REPAIR PARTITION.
I did some experiments, using MySQL Sandbox and the employees test database, which supports partitioning.
You need to download the latest MySQL 5.1 release (5.1.28), where the patch is available.
To load the employees database with partitions, edit employees_partitioned.sql to choose MyISAM as the default engine and then run
$ $HOME/sandboxes/msb_5_1_28/use -t < employees_partitioned.sql
Once we are satisfied that the data is safe, we can start messing with it. To simulate a corruption in MyISAM, the easiest way is to copy the index file of one partition over the index file of a different one.
$ cd $HOME/sandboxes/msb_5_1_28/
$ ./stop
$ cp data/employees/titles#P#p08.MYI data/employees/titles#P#p09.MYI
$ ./start
Now at least one partition is wrong.
ALTER TABLE titles CHECK PARTITION p09;
+------------------+-------+----------+---------------------------------------------------+
| Table            | Op    | Msg_type | Msg_text                                          |
+------------------+-------+----------+---------------------------------------------------+
| employees.titles | check | warning  | Size of datafile is: 768416 Should be: 680072     |
| employees.titles | check | error    | Record-count is not ok; is 28286 Should be: 25274 |
| employees.titles | check | warning  | Found 28286 key parts. Should be: 25274           |
| employees.titles | check | error    | Partition p09 returned error                      |
| employees.titles | check | error    | Corrupt                                           |
+------------------+-------+----------+---------------------------------------------------+

ALTER TABLE titles repair PARTITION p09;
+------------------+--------+----------+--------------------------------------------+
| Table            | Op     | Msg_type | Msg_text                                   |
+------------------+--------+----------+--------------------------------------------+
| employees.titles | repair | warning  | Number of rows changed from 25274 to 28286 |
| employees.titles | repair | status   | OK                                         |
+------------------+--------+----------+--------------------------------------------+
Run the test suite again, and you'll see that the feature works as advertised. However, what happens if we have more than one corrupted partition? Let's find out.
$ ./stop
$ for P in 02 05 10 11 16 ; \
do /bin/cp data/employees/titles#P#p01.MYI \
data/employees/titles#P#p$P.MYI; \
done
$ ./start
Now there are five corrupted partitions. Checking them one by one would be tedious. One side effect of fixing bug#20129 is that you can now use CHECK TABLE on partitioned tables (the manual states otherwise as of today, September 15, 2008, but it should be fixed soon).
check table titles;
+------------------+-------+----------+-----------------------------------------------+
| Table            | Op    | Msg_type | Msg_text                                      |
+------------------+-------+----------+-----------------------------------------------+
| employees.titles | check | warning  | Size of datafile is: 478924 Should be: 0      |
| employees.titles | check | error    | Record-count is not ok; is 18293 Should be: 0 |
| employees.titles | check | warning  | Found 18293 key parts. Should be: 0           |
| employees.titles | check | error    | Partition p02 returned error                  |
| employees.titles | check | error    | Corrupt                                       |
+------------------+-------+----------+-----------------------------------------------+
Interestingly, CHECK TABLE reports only the first corruption. Let's fix it and try again.
alter table titles repair partition p02;
+------------------+--------+----------+----------------------------------------+
| Table            | Op     | Msg_type | Msg_text                               |
+------------------+--------+----------+----------------------------------------+
| employees.titles | repair | warning  | Number of rows changed from 0 to 18293 |
| employees.titles | repair | status   | OK                                     |
+------------------+--------+----------+----------------------------------------+
2 rows in set (0.07 sec)

check table titles;
+------------------+-------+----------+-----------------------------------------------+
| Table            | Op    | Msg_type | Msg_text                                      |
+------------------+-------+----------+-----------------------------------------------+
| employees.titles | check | warning  | Size of datafile is: 526268 Should be: 0      |
| employees.titles | check | error    | Record-count is not ok; is 20107 Should be: 0 |
| employees.titles | check | warning  | Found 20107 key parts. Should be: 0           |
| employees.titles | check | error    | Partition p05 returned error                  |
| employees.titles | check | error    | Corrupt                                       |
+------------------+-------+----------+-----------------------------------------------+
Hmm. This could be a painfully long process if I have an unknown number of corrupted partitions and many partitions to check. Fortunately, the same side effect that made CHECK TABLE possible also allows REPAIR TABLE. Let's do it.
repair table titles;
+------------------+--------+----------+----------------------------------------+
| Table            | Op     | Msg_type | Msg_text                               |
+------------------+--------+----------+----------------------------------------+
| employees.titles | repair | warning  | Number of rows changed from 0 to 20107 |
| employees.titles | repair | warning  | Number of rows changed from 0 to 30784 |
| employees.titles | repair | warning  | Number of rows changed from 0 to 33367 |
| employees.titles | repair | warning  | Number of rows changed from 0 to 33345 |
| employees.titles | repair | status   | OK                                     |
+------------------+--------+----------+----------------------------------------+
The four remaining corrupted partitions were fixed with one single command. Again, use the test suite to make sure that your data is OK.
Thanks, MySQL developers. Good job!

Sunday, September 14, 2008

Launchpad improvements




Launchpad, the development framework created by Canonical, is under constant development.
If you have never used it, have a look at Jay's getting started and code management articles about it. If you know it already, you may be pleased to know that Launchpad has a continuous edge beta-testing program that anyone can try.

If you are getting curious, then let me tell you that there are actually three testing environments:
  • edge, which is the beta-testing environment. You get into it by adding edge before launchpad.net in every URL. If you subscribe to the beta-testing program and are accepted, every time you connect to Launchpad you are redirected to the edge environment, which uses the latest beta interface and the production database. Every change you make with the edge interface is permanent.
  • staging, which is a non-permanent testing environment. It uses the beta interface and a copy of the production database, which is replaced at regular intervals. Here you can try daring operations without risking your valuable production data.
  • dogfood, which has rules similar to staging, but with an even higher level of experimentation. Its contents may change without notice.
Using the edge environment, you will notice an enhanced interface (a detail here) and some hidden improvements.

One of the recent improvements is a bug that I asked the Canonical developers to fix, because it was affecting the usability of Launchpad as a development framework. Bug#66344 is about the graphical display of a blueprint dependency tree. In the old application, when you had more than a dozen dependencies, you got a cramped display like this one.

Absolutely unreadable. But if you get the same page through edge, you get this one:

The resolution of this bug is very important for my team's campaign to promote Launchpad. At the MySQL developers meeting, which starts in Riga in a few days, we will show the goodies of Launchpad to anyone willing to listen, and since blueprint management is a vital component of every development effort, this recent improvement will make things look better for Launchpad beginners. Thanks to Joey Stanford and Christian Reis for speeding up matters with this bug!
Happy coding!

Friday, September 12, 2008

Decoding binlog entries with row-based replication


rowing

If you have tried using row-based replication, you may have noticed two things: (1) it fixes many inconsistencies of statement-based replication, but (2) the binlog is unfit for humans. Inspecting it after a problem occurs won't provide any useful information.
The entries look like line noise, and when you don't see the result you were expecting, you can't even tell what was actually executed.

For example, after executing this code:
create table t1 (id int, c char(10), d date);
insert into t1 values (1, 'abc', '2008-01-01');
insert into t1 values (2, 'def', '2008-08-19');
insert into t1 values (3, 'ghi', current_date());
select * from t1;
+------+------+------------+
| id   | c    | d          |
+------+------+------------+
|    1 | abc  | 2008-01-01 |
|    2 | def  | 2008-08-19 |
|    3 | ghi  | 2008-09-12 |
+------+------+------------+

The binlog, as shown by mysqlbinlog, gives output like

# at 182
#080912 17:15:07 server id 1 end_log_pos 289 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1221232507/*!*/;
create table t1 (id int, c char(10), d date)
/*!*/;
# at 289
#080912 17:15:07 server id 1 end_log_pos 357 Query thread_id=5 exec_time=0 error_code=0
SET TIMESTAMP=1221232507/*!*/;
BEGIN
/*!*/;
# at 357
# at 402
#080912 17:15:07 server id 1 end_log_pos 402 Table_map: `test`.`t1` mapped to number 21
#080912 17:15:07 server id 1 end_log_pos 443 Write_rows: table id 21 flags: STMT_END_F

BINLOG '
e4fKSBMBAAAALQAAAJIBAAAAABUAAAAAAAAABHRlc3QAAnQxAAMD/goC/goH
e4fKSBcBAAAAKQAAALsBAAAQABUAAAAAAAEAA//4AQAAAANhYmMhsA8=
'/*!*/;
This is more difficult to read than ancient Etruscan. If you are a DBA, you curse and look for help.
But now it has changed.
Starting with MySQL 5.1.28 (available since September 13th), mysqlbinlog with the "--verbose" option adds human-readable comments after each statement.

BINLOG '
e4fKSBMBAAAALQAAAJIBAAAAABUAAAAAAAAABHRlc3QAAnQxAAMD/goC/goH
e4fKSBcBAAAAKQAAALsBAAAQABUAAAAAAAEAA//4AQAAAANhYmMhsA8=
'/*!*/;
### INSERT INTO test.t1
### SET
### @1=1
### @2='abc'
### @3='2008:01:01'
# at 443
The manual has more information on this topic.
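For reference, this is roughly how I invoke it (the binlog file name here is just an example; yours will differ):

```shell
# decode row events into the pseudo-SQL comments shown above
mysqlbinlog --verbose master-bin.000001
# giving the option twice (-vv) also annotates each value with its column type
mysqlbinlog -vv master-bin.000001
```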
It is a much-needed addition. Our support department insisted on having this feature developed before GA, and with good reason. Assisting customers who have replication problems without being able to decode the binlog output is quite a hard task.
Jan Kneschke, the author of MySQL Proxy, showed the way with this proof of concept, and now the feature is out. Good job, all the developers involved!

The employees test database in the official MySQL docs, and Drizzle ready!


employees database schema

The Employees test database, one of my pet projects, has been included in the official MySQL documentation. Thanks to M.C. Brown for the effort.
And since I mentioned this subject, there are two additions:
  • the schema now supports partitioning. Instead of using employees.sql, load the data via employees_partitioned.sql
  • the schema has been altered to be used with Drizzle. The test script works with Drizzle if you compile the MD5 plugin and load it at startup.

Wednesday, September 10, 2008

MySQL virtual columns

Virtual columns
Wouldn't it be cool to enhance the MySQL table creation syntax to do this?
CREATE TABLE t1 (
my_date DATE,
my_month VIRTUAL INT AS (DATE_FORMAT(my_date,'%Y%m'))
);
Then, when you insert '2008-08-23' into my_date, my_month is automatically set to 200808.
Yes, it would be cool, and it has been done. Andrey Zhakov, a community contributor so far unknown to me, has extended the server code to implement virtual columns (WL#411) and functional indexes (WL#1075).
Andrey has published the code on Launchpad and detailed documentation in MySQL Forge (A Virtual Column Preview and the reference manual).
I still know very little about Andrey, but I am going to find out more soon. For now, I know that his code works as advertised. You can create virtual columns that are calculated at run time, and optionally stored in the table. Compared to views, this approach is much faster, because the calculation happens only during inserts and updates, not during selects. Compared to triggers, virtual columns do the work of INSERT and UPDATE triggers combined. And not only that.
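For comparison, the view-based alternative mentioned above would look like this (the view name is mine); here the expression is evaluated at every SELECT instead of once per write:

```sql
CREATE TABLE t1 (my_date DATE);
-- computed on every read, not stored in the table
CREATE VIEW t1_with_month AS
SELECT my_date, DATE_FORMAT(my_date, '%Y%m') AS my_month
FROM t1;
```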
As you know, I am an inquisitive person, and I don't accept a technical claim blindly. I need to test it with my own data before believing it. And so I did.
I built the code from the bazaar repository, and I modified a table from my employees test database.
Take for example the salaries table, and suppose you want to enhance it to include the difference between from_date and to_date. We're going to do it in three different ways. In the simple way, we add an INTEGER column, and we run an UPDATE statement after inserting 2.8 million records.
mysql> CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
ddiff int,
PRIMARY KEY (emp_no, from_date)
);

$ time ../use simple < load_salaries.dump

real 0m30.884s
user 0m3.129s
sys 0m0.367s

$ time ../use simple -e "update salaries set ddiff=datediff(to_date,from_date)"

real 0m17.774s
user 0m0.005s
sys 0m0.008s

Cost of the whole operation (insert + update) was about 48 seconds.
The second way is using a trigger. The same table structure, with an insert trigger.
mysql> CREATE TRIGGER bi_salaries
before insert on salaries
for each row
set new.ddiff=datediff(new.to_date,new.from_date);

$ time ../use with_trigger < load_salaries.dump

real 1m18.370s
user 0m3.134s
sys 0m0.372s

Uh oh! Quite an overhead, I would say. The trigger takes 30 seconds more than combining INSERT and UPDATE.
The third way is with the virtual column.
mysql> CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
ddiff virtual int as (datediff(to_date, from_date)) stored,
PRIMARY KEY (emp_no, from_date)
);

$ time ../use with_virtual < load_salaries.dump

real 0m31.807s
user 0m3.137s
sys 0m0.369s

Wow! I was expecting some improvement in performance, but this is really cool!
Almost three times faster than triggers!
We need to prove that the data is correct before rejoicing. So I ran this query for each table
mysql> select count(*), sum(ddiff), avg(ddiff) from salaries;
+----------+--------------+-------------+
| count(*) | sum(ddiff)   | avg(ddiff)  |
+----------+--------------+-------------+
|  2844047 | 702296791062 | 246935.7191 |
+----------+--------------+-------------+

And the result was the same for all of them. Virtual columns rock!

Monday, September 08, 2008

How to get your proposal accepted to the MySQL Users Conference 2009

Sakila Speaker
The call for papers for the MySQL Users Conference and Expo 2009 is open. Proposals are accepted until October 22, 2008.
This post will tell you how to get your proposal accepted.
First: READ the following posts. I mean it!
  • Baron Schwartz's advice on how to write a great proposal. If you follow these guidelines, you can't be wrong.
  • Colin's list of 10. If you still had doubts after reading Baron's post, this one will clear your mind.
Sorry if we sound harsh, but we are responsible for the quality of the MySQL UC sessions, and the only way to ensure quality is to be picky in our choices. Here are a few more rules of the game for the next conference.
Make sure you know the subject you propose.
We may ask you to prove your claims. We had a few surprises during the last conference, with a couple of sessions lasting 20 minutes because the speaker had exhausted the topic. We are going to be stricter in our acceptance. We will accept some proposals conditionally. If that happens, we will inform the authors that we need more material before final acceptance. If the material is not provided, we will replace the session with another proposal.
Always assume that someone else has proposed the same topic.
The competition is fierce. We usually reject two proposals for each one we accept. If you want your proposal to be selected, don't be lazy. Write a comprehensible abstract and a good bio. Work hard on it. For each subject, there are probably two or three more people who have spent days polishing their proposal. We will be only too happy to choose the best one! If you put together a half baked proposal in 5 minutes, you won't get in. Period.
A good abstract is not too short.
If your abstract is just a few lines promising the wonders that everyone knows you can deliver, that won't do. You may be a well known big shot in the field, but we won't default to "accepted" if we recognize your name (see what Colin says about rock stars). If that's the case, prove your status by writing a superb proposal.
A good abstract is not too long.
If you have written an article on a given topic and you simply cut and paste 13,000 words in the abstract box, it will only show us that you can't summarize, and your session would be unbearably boring.
Don't even try to advertise your company in a proposal.
The review committee is full of open source enthusiasts and technology lovers. We smell a stealthy marketing message a mile away. If you want to boost your company business, don't propose a session, but look at the conference page for sponsorship opportunities.
Read the above posts by Baron and Colin again.
Seriously. If you want that pass for the conference and a podium to greatness, you need to work hard. How hard? Well, consider that, according to Damian Conway, you need to prepare from 10 to 50 hours for each hour of presentation. Some of this work must show in your abstract. If you read the above posts and follow the advice you are given, some of the hard work will show for sure.
All done? Sure? Have you read all the above twice? Good. Then, SUBMIT A PROPOSAL!

Tuesday, September 02, 2008

Thanks GNU!


I met GNU for the first time fifteen years ago. I was working as a consultant in support of a criminal investigation, and as part of my duties I had to analyze a database hosted on a SCO server.
As often happens, the database was proprietary, and it did not include any facility to analyze data. I needed to build an application to explore the data thoroughly.
Most of my working libraries were written in C, which I had used on other operating systems. So I contacted SCO and asked to buy a C compiler. I was told that it would cost quite a lot (I was prepared for that, although not for the price they quoted; but since I was going to expense it, I did not much care), and that it would take one month to get the software. I did not have one month at my disposal. I needed to nail down the evidence for the investigation immediately. So I started asking around. I was doing some side work for my town's university, and I mentioned the difficulty of getting the right software.
Then, a friend said: "You may use Linux." And this was the first time I heard the name. And I also heard about the GNU C compiler that looked like the right tool for the job.
So, instead of buying a software package, I bought a tape streamer, and transferred all the SCO server contents into tapes. Then I had fun installing my first Linux server (version 0.99c) and loaded all the SCO material into a free disk.
In less than two days, using gcc, I had in place all the analysis tools that I needed, and soon I delivered all the goods that my employer wanted.
That was one of the most satisfactory jobs in my career, and it had the side effect of leaving me with a server full of free software that did exactly what I wanted.
Since then, my relationship with GNU and Linux has evolved continuously, and by now I can say that GNU software has contributed largely to my professional growth, accomplishment, and satisfaction, from my consulting days to my employment with MySQL and Sun.
GNU is older than fifteen. It is in fact celebrating its 25th birthday today. It has been a bumpy ride: GNU has received much support and stirred much debate. It is almost ubiquitous, but it is not as respected and recognized as it should be.
For my part, I am happy with all that the GNU project has contributed to the evolution of the IT industry. Thanks, GNU. My professional life would be much different without free software!

Monday, September 01, 2008

Introducing the MySQL community-driven Replication Monitoring Tools




If you are using MySQL replication, you know how hard it is to monitor it properly.
You have a wide choice of commercial and free tools, all of which check the health of your replication system from the outside.
A few years ago, I wrote an article advocating a self-monitoring and self-healing replication system, using new features in MySQL 5.1. At the time, some technology pieces were missing to make this project feasible. Now the pieces exist, and you can create your own self-monitoring replication system.

Hartmut rules!

It started during FrOSCon, when I discussed with Hartmut Holzgraefe the practical application of a plugin that he wrote some time ago. One of the missing pieces for internal replication monitoring is the ability to read replication values into variables. Currently, you can see the replication status with SHOW MASTER STATUS and SHOW SLAVE STATUS. Unfortunately, you can't manipulate the values from these commands inside a stored routine, and thus there is no way of measuring the health of a replication system without the help of external programming languages.
This means that, even if you manage to detect a master failure, you can't issue a CHANGE MASTER TO statement with the right parameters.
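For reference, this is the kind of statement that a failover procedure needs to build. The host and coordinates below are invented for illustration, and they are exactly the values that a stored routine cannot obtain on its own:

change master to
    master_host='192.168.100.1',
    master_port=3306,
    master_log_file='mysql-bin.000002',
    master_log_pos=1234;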
During our FrOSCon talks, Hartmut released a small Launchpad project, the MySQL replication status INFORMATION_SCHEMA plugin, which implements two INFORMATION_SCHEMA tables for MASTER and SLAVE STATUS. Using these extensions, you can get single values inside a stored routine.
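For example, with the plugin installed, the master coordinates become the result of a plain query, which a stored routine can read into variables. I am guessing the column names here from the corresponding SHOW MASTER STATUS fields; the actual plugin tables may name them differently:

select file, position
  into @binlog_file, @binlog_pos
  from information_schema.master_status;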

The Replication Monitor is born

From this starting point, the Replication Monitoring Tools become possible.
The project is young, and it only contains a proof-of-concept monitoring tool (see below), but given enough time and help, it can be expanded into a full-fledged system.
The roadmap has a long list of possible features, some of which are quite ambitious.

The project is open. I need ideas, manpower, and testers to implement all the intended features. This is a project from the community to the community.
The principle that you have seen several times when downloading the MySQL server fully applies: if you have time, you can save money, helping yourself and the community at the same time.

Replication Monitor in action

The proof of concept implements a table on the master, where the slaves write their status, by means of federated tables.
Each slave uses a federated table to the master's INFORMATION_SCHEMA.MASTER_STATUS to compare its status with the master's, and a second federated table to write its status to the master table. The result is that you can read the global status of the replication system in the master, and in each slave (because the table is replicated).
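A minimal sketch of this setup could look as follows. The table and column names match the output below, while the database name, credentials, and connection string are invented for the example:

-- on the master
create table slaves_status (
    server_id    int unsigned not null primary key,
    binlog_file  varchar(255),
    binlog_pos   bigint unsigned,
    slave_status varchar(50)
) engine=MyISAM;

-- on each slave: a federated table pointing at the master's copy
create table slaves_status_on_master (
    server_id    int unsigned not null primary key,
    binlog_file  varchar(255),
    binlog_pos   bigint unsigned,
    slave_status varchar(50)
) engine=FEDERATED
connection='mysql://monitor:monitorpwd@master-host:3306/test/slaves_status';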
master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |      82321 | IO+,SQL+,P+,E+ |
|       102 | mysql-bin.000001 |      82530 | IO+,SQL+,P+,E+ |
|       103 | mysql-bin.000001 |      82739 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000001 |      82948 | IO+,SQL+,P+,E+ |
+-----------+------------------+------------+----------------+
The status reads as follows:
IO+ means that the slave IO thread is working (the opposite is io-).
SQL+ means that the SQL thread is working.
P+ means that the slave is reading from the latest binlog and that its position is equal to or greater than the one shown by the master.
E+ means that the slave has executed all the statements received from the master.
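The flags themselves can be computed with plain SQL. Here is a sketch for the first two, assuming that the plugin's SLAVE_STATUS table exposes columns named after the SHOW SLAVE STATUS fields (the real names may differ):

select if(slave_io_running  = 'Yes', 'IO+',  'io-')  as io_flag,
       if(slave_sql_running = 'Yes', 'SQL+', 'sql-') as sql_flag
  from information_schema.slave_status;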
Let's try a small experiment. We stop slave 2 and see what happens.
slave2> stop slave;

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |      91517 | IO+,SQL+,P+,E+ |
|       102 | mysql-bin.000001 |      91517 | io-,sql-,p-    |
|       103 | mysql-bin.000001 |      91932 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000001 |      92141 | IO+,SQL+,P+,E+ |
+-----------+------------------+------------+----------------+
The monitor shows that slave 2 is not working. If we restart the slave, the situation is restored.
slave2> start slave;

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |     114894 | IO+,SQL+,P+,E+ |
|       102 | mysql-bin.000001 |     115104 | IO+,SQL+,P+,E+ |
|       103 | mysql-bin.000001 |     115314 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000001 |     115524 | IO+,SQL+,P+,E+ |
+-----------+------------------+------------+----------------+
We can do the same experiment while loading some heavy data, like the employee sample database.
master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |     134158 | IO+,SQL+,P+,E+ |
|       102 | mysql-bin.000001 |     134975 | IO+,SQL+,p-    |
|       103 | mysql-bin.000001 |     133777 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000001 |    1165155 | IO+,SQL+,P+,e- |
+-----------+------------------+------------+----------------+

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |    7343962 | IO+,SQL+,P+,e- |
|       102 | mysql-bin.000001 |    8374099 | IO+,SQL+,P+,e- |
|       103 | mysql-bin.000001 |    7343751 | IO+,SQL+,P+,e- |
|       104 | mysql-bin.000001 |    8374310 | IO+,SQL+,P+,e- |
+-----------+------------------+------------+----------------+
Very rapidly (the experiment happens in a MySQL Sandbox, so all servers share the same disk and CPU), the slaves fall behind in execution.
If we stop a slave while loading, the situation is even more explicit.
slave3> stop slave;

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000001 |   45764491 | IO+,SQL+,P+,e- |
|       102 | mysql-bin.000001 |   45764703 | IO+,SQL+,P+,e- |
|       103 | mysql-bin.000001 |   42685103 | io-,sql-,p-    |
|       104 | mysql-bin.000001 |   42685103 | IO+,SQL+,P+,e- |
+-----------+------------------+------------+----------------+
Let's introduce a new element of difference, and flush logs while still loading data.
master> flush logs;

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000002 |    2044673 | IO+,SQL+,P+,e- |
|       102 | mysql-bin.000002 |    3066965 | IO+,SQL+,P+,e- |
|       103 | mysql-bin.000001 |   42685103 | io-,sql-,p-    |
|       104 | mysql-bin.000002 |    3067176 | IO+,SQL+,P+,e- |
+-----------+------------------+------------+----------------+
Now the stopped slave is really far behind. Let's see what happens when we put it back online.
slave3> start slave;

master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000002 |   27604369 | IO+,SQL+,P+,e- |
|       102 | mysql-bin.000002 |   27603945 | IO+,SQL+,P+,e- |
|       103 | mysql-bin.000002 |   27604157 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000002 |   25558385 | IO+,SQL+,P+,e- |
+-----------+------------------+------------+----------------+
The restored slave was the first one to catch up in execution. This would probably not happen in a situation with separate hosts for each slave, but it is an interesting way to test our system.
And finally, after a few seconds more, all slaves have caught up with the master, with both data fetching and execution.
master> select * from slaves_status ;
+-----------+------------------+------------+----------------+
| server_id | binlog_file      | binlog_pos | slave_status   |
+-----------+------------------+------------+----------------+
|       101 | mysql-bin.000002 |  104641288 | IO+,SQL+,P+,E+ |
|       102 | mysql-bin.000002 |  104641501 | IO+,SQL+,P+,E+ |
|       103 | mysql-bin.000002 |  104641714 | IO+,SQL+,P+,E+ |
|       104 | mysql-bin.000002 |  104641927 | IO+,SQL+,P+,E+ |
+-----------+------------------+------------+----------------+


Getting started


There is a step-by-step set of instructions in the MySQL Forge Wiki that explains how to build and install the necessary plugin.
After that, you should get the code from the bazaar repository and make the changes in the option files of master and slaves (check the sql/master_scripts and sql/slave_scripts directories). Then, you should load the initializing script in the master and in each slave, and you are in business.
Notice that this first version relies on a MySQL Sandbox running with MySQL 5.1.28, compiled from source. Further versions will lift this limitation, but for now, if you want to try it out, you need to follow my steps closely.
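To give an idea of what the option file changes involve, here is a guess at the minimal settings; the authoritative versions are in the sql/master_scripts and sql/slave_scripts directories:

# master option file
[mysqld]
server-id = 1
log-bin   = mysql-bin

# slave option file (server-id must be unique per slave)
[mysqld]
server-id = 101
federated            # the FEDERATED engine is disabled by default in 5.1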

Next steps

What's left? Ah, yes. A totally internal monitoring system is not very helpful on its own. If the DBA needs to run a query to know what's happening, then the system is almost worthless.
This is another challenge, because the MySQL server architecture does not allow connections to operating system services, such as mail applications. There is already a solution to this problem; we only need to integrate it with the rest of the replication monitoring system. More challenges are outlined in the roadmap.
If you think this is interesting, check the roadmap, and comment on it. Even better, if you want to implement parts of it, join the sandbox developers group and start coding!