Friday, September 26, 2008

Drizzling MySQL


Sakila Drizzled

Have you ever used subqueries with MySQL? It's an addition introduced in version 4.1, and since then neglected. The performance of subqueries in MySQL 4.1 and 5.x is really a sad story.
For example, using the Employees test database, you may try this query:
select
title, from_date, to_date
from titles
where emp_no in
(select emp_no
from employees
where first_name = 'Mary'
and last_name = 'Sluis'
)

The result, in MySQL 5.1.28 is
+-------+------------+------------+
| title | from_date | to_date |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (7.88 sec)
Ugly, uh?
But let's try the same query in MySQL 6.0.8.
select title, from_date, to_date
from titles
where emp_no in (
select emp_no
from employees
where first_name = 'Mary'
and last_name = 'Sluis');
+-------+------------+------------+
| title | from_date | to_date |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (0.13 sec)
That's a completely different story! What's happening?
The Optimizer Team has been hard at work, and in version 6.0 we will see a huge improvement. Let's try to explain the result:
explain extended select title, from_date, to_date
from titles
where emp_no in (
select emp_no from employees
where first_name = 'Mary'
and last_name = 'Sluis')\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: employees
type: ALL
possible_keys: PRIMARY
key: NULL
key_len: NULL
ref: NULL
rows: 300024
filtered: 100.00
Extra: Using where
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: titles
type: ref
possible_keys: PRIMARY,emp_no
key: emp_no
key_len: 4
ref: employees.employees.emp_no
rows: 1
filtered: 100.00
Extra:
2 rows in set, 1 warning (0.00 sec)

show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `employees`.`titles`.`title` AS
`title`,`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`employees`
join `employees`.`titles`
where ((`employees`.`titles`.`emp_no` = `employees`.`employees`.`emp_no`)
and (`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary'))
The optimizer has converted the subquery into a join, with a query plan that is 20 times better than the one used in MySQL 5.1. Try the explain extended command in MySQL 5.1.28, and the resulting query is
select `employees`.`titles`.`title` AS `title`,
`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`titles`
where (`employees`.`titles`.`emp_no`,
(((`employees`.`titles`.`emp_no`)
in employees on PRIMARY
where ((`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary')
and ((`employees`.`titles`.`emp_no`) = `employees`.`employees`.`emp_no`)))))
So far, the comparison shows that some subqueries in 6.0 are evaluated much better than in previous versions. But what's the "drizzled" part from the title?
There are two issues, actually.
First, as you may have heard, Drizzle is a lightweight version of MySQL, based on version 6.0 code. If you import the same data and run the previous query in Drizzle, you get the same good performance that I got in 6.0.
Second, it would be nice to have subquery optimization in production. Unfortunately, neither Drizzle nor MySQL 6.0 are GA. They are both alpha. Drizzle has not reached the feature freeze stage yet, and MySQL 6.0 should wait for Falcon, Maria, and the online backup to be ready before being considered stable. So today I made a little experiment. I drizzled MySQL 6.0, by disabling Maria and Falcon, getting binaries that are less than half in size, and skipping a lot of unfixed bugs. If you are interested, I compiled using this script.
#! /bin/sh

path=`dirname $0`
. "$path/SETUP.sh"

extra_flags=" $fast_cflags $max_cflags -g"
extra_configs=" --with-plugins=innobase,archive,blackhole,csv,partition "
extra_configs=" $extra_configs --without-falcon "
extra_configs=" $extra_configs --without-maria-storage-engine "

. "$path/FINISH.sh"
You may ask, what's the usefulness of this stripping down operation?
It's just a small test. I have a private agenda, and I will do something even more dramatic in the coming weeks. In the meantime, please think about this. There is a great feature in 6.0, which could be very nice to use in production. What would you do?
Backport it to MySQL 5.1? Strip down 6.0 leaving only subquery optimization, à la drizzle? I have some ideas, but I would like to hear from you. What would you do?

10 comments:

Mark Callaghan said...

Why must 6.0 wait for a storage engine? Storage engines are pluggable and can be released when they are ready.

I drizzle MySQL all the time. Users might not like it but it makes the server very stable. I disabled subqueries, views, triggers, stored procedures, explicit XA transactions, LOAD DATA and create temp table on a master, delayed insert statements and the query cache.

Arjen Lentz said...

Thank goodness, finally. For people used to writing subqueries, this will be a godsend (I'm sure Igor & co appreciate ;-)

So how is it with correlated subqueries and other constructs that can't be rewritten into joins? Have they received some optimiser-attention also?

Giuseppe Maxia said...

Mark,
Falcon and Maria are not pluggable engines.
Not like PBXT. They are both integrated in the server.
Giuseppe

Ronald Bradford said...

Hi Giuseppe,

I was asked just today regarding the performance of sub-queries in MySQL.

My feedback, and now your input with detailed timings, perfectly confirmed my comments.

Regards

Ronald

Anonymous said...

Why arent they pluggable? I guess for Maria it makes sense in the way that Maria is supposed to replace MyISAM for internal tables. But Falcon? It should be using the same infrastructure the other engines are using? Or did you mean to say that the common infrastructure that Falcon is using is not yet stable?

Baron said...

Ditto, I thought Falcon was going to be pluggable. If it's not, then 6.0 has to wait for Falcon to be STABLE, not just "ready", before it can be a decent server -- that's tweaking too many knobs at once. It takes years for something like a new storage engine to really be stable and proven. The server and Falcon should not be tightly coupled.

Anonymous said...

Just for an informal comparison, here are results of the same query on Kickfire using MySQL 5.1:

mysql> select SQL_NO_CACHE
-> title, from_date, to_date
-> from titles
-> where emp_no in
-> (select emp_no
-> from employees
-> where first_name = 'Mary'
-> and last_name = 'Sluis'
-> );
+-------+------------+------------+
| title | from_date | to_date |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (1.24 sec)

mysql> select SQL_NO_CACHE title, from_date, to_date from titles where emp_no in (select emp_no from employees where first_name = 'Mary' and last_name = 'Sluis' );
+-------+------------+------------+
| title | from_date | to_date |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (0.36 sec)

Because we can externally optimize and execute queries, our subquery performance is much better than other storage engines on MySQL 5.1

Anonymous said...

And note, I was using a debug build :)

Jim said...

All storage engines must call back into the server and access internal server data structures to operate. In this sense, no storage engine is actually pluggable. Each must be compiled specifically for the each server (meaning exact build).

That said, Falcon is as pluggable as a storage engine can be. And no, it is not integrated into the server.

The bottom line, however, is that 6.0 is defined as 5.1 plus Falcon, backup, and whatever part of Maria is ready. Shipping 6.0 without Falcon, backup, and Maria is like changing the packaging on 5.1 -- a marketing only release.

Giuseppe Maxia said...

Jim,
Can Falcon be released as a plugin, and be loaded at run time in 5.1 or 6.0, the way SolidDB or PBXT can?

If not, then, it is not pluggable the way I intend it to.

Giuseppe