The Data Charmer: 2008

Thursday, December 18, 2008

Tutorial on Partitioning at the MySQL Users Conference 2009

Another approval has arrived. I will be co-presenting a tutorial at the MySQL Users Conference 2009.
Partitioning in MySQL 5.1. My co-speaker is Sarah Sproehnle, a well known MySQL trainer. The topic is one of the flagship features of MySQL 5.1.

The tutorial will cover the basics, several practical scenarios, tips and tricks, and some sneak peeks of the latest enhancements being developed.

Wednesday, December 17, 2008

Backing up MySQL partitioned tables is nothing special

In case you were wondering, after seeing this self-defined tutorial, backing up MySQL 5.1 partitioned tables does not require any particular additional technology.
This tutorial presents MySQL 5.1 in a short paragraph, and then starts talking about a commercial backup application, as if it were the only way of backing up partitioned tables.
The title, for sure, is misleading. Backing Up MySQL Partitioned Tables. Reading such a title, readers expect either a list of possible solutions, or being presented with the only existing way of dealing with this task. Instead, the only instructions that you get is how to use the above mentioned tool, which is definitely not the only way of getting the job done. It is also doubly misleading because, since partitioning is a new technology, the casual reader may think "oh? What's peculiar with partitioned tables? do I need to take extra care to back them up?. Instead, there's nothing special about partitioned tables. Depending on the backup method, handling of partitioned tables is completely transparent for the user.
I don't doubt that this tool can do the job, but there are native tools with MySQL that can accomplish the task just as well.
There are several ways of backing up a partitioned table. But the first one that springs to mind is mysqldump.
The backup is as simple as:

mysqldump --opt db_name table_name > file.dump

The restore is equally simple:

mysql db_name < file.dump

Now, to be fair, I know one case where the backup of partitioned tables is problematic, but the one presented in the FUD tutorial was not it. The only possible trouble happens when you define partitions with different data directories and want to restore the table to a server that has a different directory structure. I had this problem two years ago, and I solved it by splitting the dump to keep the DDL and the DML in separate scripts. I wonder if there is a method that can detect and fix this problem on the spot.

Update: Here's the original article I am talking about.

howto_forge article about partition backup

Wednesday, December 10, 2008

Guest Post: Philip Stoev. If you love it break it. Getting started with the RQG

I am glad to host in these pages a post by Philip Stoev, a remarkable QA engineer, creative, resourceful, and a notorious troublemaker.
I met Philip by email in 2007, when I was exploring his Perl modules, which I used for one of my most rewarding articles. A few months later, when we met in person during the MySQL Developers Meeting, Philip was hired as a QA engineers, with my warmest recommendations.
This post is about the Random Query Generator, which is also the topic of a MySQL University session. Thanks, Philip!

Want to test and break MySQL without having to cook up your own test framework?

Sun Microsystems have open-sourced the Random Query Generator, a new tool for generating tables with random data and then executing a random query workload against them. Both the data and the queries are fully configurable.

For example, assume that you want to evaluate how well partitioning will work for your workload:

Step1. Describe your tables.

Create a configuration file named partitioning.zz, instructing the tool to create a partitioned table containing a few varchar() fields. The format of the file is described in the Forge wiki. The configuration file will contain the following.


$tables = {
     rows => [65535],
  pk => [ 'integer not null auto_increment' ],
     partitions => [ 'PARTITION BY KEY (pk) PARTITIONS 2' ],
};

$fields = {
     types => [ 'varchar', 'char' ],
     indexes => [undef ]
};

$data = {
  strings => [ 'english' ]
}

This basically says "create a table with 64K rows, two char fields, 2 partitions, auto_increment primary key and fill it with English words."

Step2. Describe your workload.

The configuration file describing the workload takes the form of an SQL grammar. Its format is described in the Forge wiki. For example, let's check how updates on a primary key are doing. Let's name the configuration partitioning.yy and put the following in:

query:
  UPDATE _table SET _field_no_pk = _english WHERE `pk` = _smallint_unsigned;

This says "UPDATE a random non-PK field from the table to a random english word for a random value of the PK".

This is just the very tip of the iceberg. Arbitrarily complex SQL queries can be expressed with the appropriate SQL grammar.

Step3. Run the test!

Fetch the Random Query Generator from Launchpad:

bzr branch lp:~randgen/randgen/main

and then run the test:

perl runall.pl \
--basedir=/path/to/mysql-distribution \
--gendata=partitioning.zz \
--grammar=partitioning.yy \
--queries=100000 \
--threads=10 \
--duraton=360

This will start a server by using the specified MySQL location and will then run random queries in 10 threads, executing up to 100000 queries per thread for no more than 360 seconds.

While the test is running, monitor the performance of your server (which will run on port 19306). Check the CPU usage and use SHOW STATUS and mysqladmin status to see how things are going. To see performance without partitioning, edit the partitioning.zz file and remove the "partitions" line and run the test again.

You are also most welcome to join the MySQL University session on the RQG, to be held tomorrow, December 11 at 14:00 UTC / 8am CST (Central) / 9am EST (Eastern) / 14:00 BST / 15:00 CET / 17:00 MDT (Moscow).

More information is available here:

Tuesday, December 09, 2008

Speaking about the Event Scheduler at the Users Conference 2009

The approvals are coming. I will be speaking at the MySQL Users Conference 2009.
Using the event scheduler. The friendly behind-the-scenes helper. My co-speaker will be none other than Andrey Hristov, the designer and implementer of the Event scheduler, one of the main features of MySQL 5.1.

Monday, December 01, 2008

The partition helper - Improving usability with MySQL 5.1 partitioning

I talked several times about partitioning usability. In my many tests of partitioning I found myself in need of generating list of partitions for a given range.
I made the Partition Helper (see code at the end of this post) to scratch this particular itch, i.e. making partitions reasonably quickly and without thinking too much.

The Partition Helper is a Perl script that converts some simple options into lots of partitions.
The detailed reference manual lists all the options and many examples.
Here, I would like to show just an example.
Say, you want to create partitions in table blog_posts by monthly intervals on column published. Given that you have data for the last two years, you use the partition helper like this:

./partitions_helper --table=blog_posts \
--column=published \
--interval=month \
--start=2007-01-01 \
--end=2009-01-01
ALTER TABLE blog_posts
PARTITION by range (to_date(published))
(
partition p001 VALUES LESS THAN (to_days('2007-01-01'))
, partition p002 VALUES LESS THAN (to_days('2007-02-01'))
, partition p003 VALUES LESS THAN (to_days('2007-03-01'))
, partition p004 VALUES LESS THAN (to_days('2007-04-01'))
, partition p005 VALUES LESS THAN (to_days('2007-05-01'))
, partition p006 VALUES LESS THAN (to_days('2007-06-01'))
, partition p007 VALUES LESS THAN (to_days('2007-07-01'))
, partition p008 VALUES LESS THAN (to_days('2007-08-01'))
, partition p009 VALUES LESS THAN (to_days('2007-09-01'))
, partition p010 VALUES LESS THAN (to_days('2007-10-01'))
, partition p011 VALUES LESS THAN (to_days('2007-11-01'))
, partition p012 VALUES LESS THAN (to_days('2007-12-01'))
, partition p013 VALUES LESS THAN (to_days('2008-01-01'))
, partition p014 VALUES LESS THAN (to_days('2008-02-01'))
, partition p015 VALUES LESS THAN (to_days('2008-03-01'))
, partition p016 VALUES LESS THAN (to_days('2008-04-01'))
, partition p017 VALUES LESS THAN (to_days('2008-05-01'))
, partition p018 VALUES LESS THAN (to_days('2008-06-01'))
, partition p019 VALUES LESS THAN (to_days('2008-07-01'))
, partition p020 VALUES LESS THAN (to_days('2008-08-01'))
, partition p021 VALUES LESS THAN (to_days('2008-09-01'))
, partition p022 VALUES LESS THAN (to_days('2008-10-01'))
, partition p023 VALUES LESS THAN (to_days('2008-11-01'))
, partition p024 VALUES LESS THAN (to_days('2008-12-01'))
, partition p025 VALUES LESS THAN (to_days('2009-01-01'))
);

The above command can be also expressed as

./partitions_helper --table=blog_posts --column=published --interval=month \
--start=2007-01-01 --partitions=25

Or you can shorten it, if you remember the small options:

./partitions_helper -t blog_posts -c published -i month -s 2007-01-01 -p 25

More options are available. You can see them using

./partitions_helper --help

Enjoy!

Update Here's the code to the Partition helper, published here after MySQL Forge was taken down.

#!/usr/bin/perl
#    The MySQL Partitions helper
#    Copyright (C) 2008, 2009 Giuseppe Maxia
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; version 2 of the License
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

#
# This program creates a ALTER TABLE statement to add or reorganize 
# date based partitions for MySQL 5.1 or later
#

use strict;
use warnings;
# use diagnostics;
use English qw( ‐no_match_vars ) ;
use Getopt::Long qw(:config no_ignore_case );
use Data::Dumper;

my $VERSION = '1.0.4';

#
# Parse options are fully qualified options with descriptive help,
# parse string for the command line, and sort order for the help
#
my %parse_options = (
    table           =>  {
                            value   => '',
                            parse   => 't|table=s',
                            help    => [
                                        'The table being partitioned',
                                        '(no default)'
                                       ],
                            so      =>  20,
                        },
    column          =>  {
                            value   => '',
                            parse   => 'c|column=s',
                            help    => [
                                        'The partitioning column',
                                        '(no default)',
                                       ],
                            so      =>  30,
                        },
    interval        =>  {
                            value   => 'month',
                            parse   => 'i|interval=s',
                            help    => [
                                        'The interval between partitions',
                                        'Accepted: "year", "month", "week", "day", or a number',
                                        '(default: month) ',
                                       ],
                            so      =>  40,
                        },
    partitions      =>  {
                            value   => 0,
                            parse   => 'p|partitions=i',
                            help    => [
                                        'How many partitions to create',
                                        '(default: 0) ',
                                       ],
                            so      =>  50,
                        },
    first_partition =>  {
                            value   => 1,
                            parse   => 'first_partition=i',
                            help    => [
                                        'the first partition to create',
                                        '(default: 1) ',
                                       ],
                            so      =>  60,
                        },
    reorganize      =>  {
                            value   => '',
                            parse   => 'reorganize=s',
                            help    => [
                                        'the partition(s) to reorganize',
                                        '(default: none) '
                                       ],
                            so      =>  70,
                        },
    
    start           =>  {
                            value   => '2001-01-01',
                            parse   => 's|start=s',
                            help    => [
                                        'the minimum partitioning value',
                                        '(default: 1 for numbers, 2001-01-01 for dates) '
                                       ],
                            so      =>  80,
                        },
    end             =>  {
                            value   => '',
                            parse   => 'e|end=s',
                            help    => [
                                        'The maximum partitioning value',
                                        'Used unless --partitions is used',
                                        '(no default) ',
                                       ],
                            so      =>  90,
                        },
    function             =>  {
                            value   => '',
                            parse   => 'f|function=s',
                            help    => [
                                        'The partitioning function to use in the "range" declaration',
                                        '(default: to_days, unless --list is used) ',
                                       ],
                            so      =>  100,
                        },
    list             =>  {
                            value   => 0,
                            parse   => 'l|list',
                            help    => [
                                        'Use the COLUMNS feature (versions >= 5.5)',
                                        '(default: no) ',
                                       ],
                            so      =>  110,
                        },
     maxvalue       =>  {
                            value   => 0,
                            parse   => 'x|maxvalue',
                            help    => [
                                        'Adds MAXVALUE as last partition',
                                        '(default: disabled) ',
                                       ],
                            so      =>  115,
                        },
    prefix           =>  {
                            value   => 'p',
                            parse   => 'prefix=s',
                            help    => [
                                        'prefix for partition name',
                                        '(default: p) ',
                                       ],
                            so      =>  120,
                        },
    explain          =>  {
                            value   => 0,
                            parse   => 'explain',
                            help    => [
                                        'show the current option values',
                                        '(default: no) ',
                                       ],
                            so      =>  130,
                        },

    version             =>  {
                            value   => 0,
                            parse   => 'version',
                            help    => [
                                        'display the version',
                                        '(default: none) ',
                                       ],
                            so      =>  400,
                        },
    help             =>  {
                            value   => 0,
                            parse   => 'h|help',
                            help    => [
                                        'display the help page',
                                        '(default: no) ',
                                       ],
                            so      =>  500,
                        },
);

# 
# convert parse options to simple options
#
my %options = map { $_ ,  $parse_options{$_}{'value'}}  keys %parse_options;

# 
# get the options from the command line
#
GetOptions (
    map { $parse_options{$_}{parse}, \$options{$_} }        
        grep { $parse_options{$_}{parse}}  keys %parse_options 
) or get_help();

get_help() if $options{help};

if ($options{version}) {
    print credits();
    exit 0;
}


# print Dumper(\%options) ; exit;

my %valid_intervals = (
    day   => 1,
    week  => 1,
    month => 1,
    year  => 1,
);

#
# start and end dates
#
my ($syear, $smonth, $sday) = ();
my ($eyear, $emonth, $eday) = ();

#
# deals with placeholder features
#

for my $op ( qw(operation) ) {
    if ($options{$op}) {
        die "option <$op> is not implemented yet\n";
    }
}

# 
# check that a table and column are given
#
unless ($options{table}) {
    die "table name required\n";
}

unless ($options{column} or $options{reorganize} ) {
    die "column name required\n";
}

#
# accept only one of --end or --partitions
#
if ($options{end} && $options{partitions}) {
    die "too many quantifiers. Use EITHER '--partitions' OR '--end' \n";
}

#
# check that we parsed a valid interval
#
if ( $options{interval} =~ /^\d+$/) {
    unless ($options{start} =~ /^\d+$/) {
        $options{start} = 1;
    }
    if ($options{end}) {
        unless ($options{end} =~ /^\d+$/) {
            die "the end value must be a number\n";
        } 
        if ($options{end} < $options{interval}) {
            die "the end value must be bigger than the interval\n";
        }
        if ($options{end} <= $options{start}) {
            die "the end value must be bigger than the start\n";
        }
        $options{partitions} = int( ($options{end} +1 - $options{start}) / $options{interval});
    }
}
else {
    unless (exists $valid_intervals{ $options{interval} } ) {
        die "invalid interval specified: $options{interval}\n";
    }
    #
    # for year,  month, or week the function must be to_days
    #
    unless ($options{list}) {
        $options{function} = 'to_days' unless $options{function};
    }

    #
    # check the start date
    #
    if ( $options{start} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
        ($syear, $smonth, $sday) = ($1, $2, $3);
        $smonth +=0;
        $sday +=0;
        #print "start $syear $smonth $sday\n";
    }
    else {
        die "invalid date $options{start}\n";
    }
    #
    # check the end date
    #
    if ($options{end}) {
        if ( $options{end} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
            ($eyear, $emonth, $eday) = ($1, $2, $3);
            $emonth +=0;
            $eday +=0;
            # print "end $eyear $emonth $eday\n";
        }
        else {
            die "invalid date $options{end}\n";
        }
        if ($options{interval} eq 'year') {
            $options{partitions} = $eyear - $syear +1;
        }
        elsif ($options{interval} eq 'month') {
            my $months =   (12 - $smonth) 
                 + ( ($eyear - $syear -1) * 12 )
                 + $emonth + 1;
            # print $months,$/;
            $options{partitions} = $months;
        }
        elsif ($options{interval} eq 'week') {
            $options{partitions} = weeks_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
        elsif ($options{interval} eq 'day') {
            $options{partitions} = days_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
    }       
}

#
# there must be a valid number of partitions
#

unless ($options{partitions} && ($options{partitions} =~ /^\d+$/) ) {
    die "number of partitions required. Use EITHER '--partitions' OR '--end'\n";
}

if ($options{partitions} > 1024) {
    die "too many partitions ($options{partitions}). Maximum allowed is 1024\n";
} 
else {
    print "# partitions: $options{partitions}\n";
}

#
# don't accept a function if COLUMS is being used
#
if ( $options{function} && $options{list} ) {
    die "you must specify either list or function\n";
}

if ($options{explain}) {
    show_options();
}

# print Dumper(\%options) ; exit;

# -----------------------------------------
# start building the ALTER TABLE statement
# -----------------------------------------

print "ALTER TABLE $options{table} \n";
if ($options{reorganize} ) {
    print "REORGANIZE PARTITIONS $options{reorganize} INTO \n";
    $options{prefix} = 'pr';
}
else {
    print "PARTITION by range " ;

    if ($options{function}) {
        print "($options{function}(";
    }
    elsif ($options{list}) {
        print "columns(" 
    }
    else {
        print "("
    }

    print "$options{column}";

    if ($options{function}) {
        print ")";
    }

    print ")\n";
}

print "(\n";

make_partitions( $options{interval});

print ");\n";


# -----------------------------------------
# functions
# -----------------------------------------

sub make_partitions {
    my ($interval) = @_;
    my $partitions_done=0;
    my $p_year  = $syear;
    my $p_month = $smonth;
    my $p_day = $sday;
    my $func_start = 'to_days('; #$options{function};
    my $func_end = ")";
    if ($options{list}) {
        $func_start = "";
        $func_end = "";
    }
    for my $part ($options{first_partition} .. 
            $options{first_partition} + $options{partitions} -1 ) {
        if ($partitions_done) {
            print ", ";
        }
        else {
            print "  "
        }
        $partitions_done++;

        if ($interval =~ /^\d+$/) {
            printf "partition %s%03d VALUES LESS THAN (%d)\n", 
                $options{prefix},
                $partitions_done,
                ($options{start} + (($partitions_done - 1) * $interval)) + $interval;
        }
        else {
            printf "partition %s%03d VALUES LESS THAN (%s'%4d-%02d-%02d'%s)\n", 
                $options{prefix},
                $partitions_done,
                $func_start,
                $p_year,
                $p_month,
                $p_day,
                $func_end;
            if ($interval eq 'day') {
                ($p_year,$p_month,$p_day) = next_day($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'week') {
                ($p_year,$p_month,$p_day) = next_week($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'month') {
                ($p_year,$p_month) = next_month($p_year, $p_month);
            }
            elsif ($interval eq 'year') {
                ($p_year,$p_month) = next_year($p_year, $p_month);
            }
            else {
                die "unsupported interval\n";
            }
        }
    }
    if ($options{'maxvalue'}) {
            printf ", partition %s%03d VALUES LESS THAN (MAXVALUE)\n", 
                $options{prefix},
                ++$partitions_done;
    }
}

sub next_year {
    my ($y, $m) = @_;
    $y++;
    return ($y, $m);
}

sub next_week {
    my ($y, $m, $d) = @_;
    for my $i (1 .. 7) {
        ($y, $m, $d) = next_day($y, $m, $d);
    }
    return ($y, $m, $d);
}

sub next_day {
    my ($y, $m, $d) = @_;
    $d++;
    $m += 0;
    my $last_day = days_in_month($y, $m);
    if ($d > $last_day) {
        $d = 1;
        $m++;
    }
    if ($m > 12) {
        $m = 1;
        $y++;
    }
    return ($y, $m, $d);
}

sub is_leap_year {
    my ($y) = @_;
    if (($y % 400) == 0) {
        return 1;
    }
    elsif (($y % 100) == 0) {
        return 0;
    }
    elsif (($y % 4) == 0) {
        return 1
    }
    return 0
}

sub days_in_month {
    my ($y, $m) = @_;
    $m = $m +0;
    my %last_day = (
        1 => 31,
        2 => 28,
        3 => 31,
        4 => 30,
        5 => 31,
        6 => 30,
        7 => 31,
        8 => 31,
        9 => 30,
        10=> 31,
        11=> 30,
        12=> 31,
    );
    if (($m ==2) and (is_leap_year($y))) {
        $last_day{2} = 29;
    }
    return $last_day{$m};
}

sub days_to_year_end {
    my ($y, $m, $d) = @_;
    my $days = days_in_month($y,$m) - $d +1 ;
    for my $month ( $m +1 .. 12 ) {
        $days += days_in_month($y, $month);
    }
    return $days;
}

sub months_between {
    my ($syear, $smonth,
        $eyear, $emonth) = @_;
 
    my $months =   (12 - $smonth) 
         + ( ($eyear - $syear -1) * 12 )
         + $emonth + 1;
    return $months;
}
 
sub days_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
    # print "start $syear, $smonth, $sday\n end $eyear, $emonth, $eday\n";
    my $days =0;
    if (  sprintf ("%4d%2d%2d", $eyear, $emonth, $eday) 
          lt 
          sprintf("%4d%2d%2d", $syear, $smonth, $sday) ) 
    {
        die "start interval > end interval\n";
    }
    while (    ($syear < $eyear) 
            or ( ($syear == $eyear) and ($smonth < $emonth) ) 
            or ( ($syear == $eyear) and ($smonth == $emonth) and ($sday < $eday) ) 
        ) {
        if ($syear < $eyear) {
            $days += days_to_year_end($syear, $smonth, $sday);
            $syear++;
            $smonth=1;
            $sday=1;
        }
        elsif ($smonth < $emonth) {
            $days += days_in_month($syear, $smonth) - $sday;
            ($syear, $smonth) = next_month($syear, $smonth);
            $sday =1;
        }
        elsif ($sday < $eday) {
            $days += $eday - $sday +1;
            $sday = $eday;
        }
    }
    return $days;
}

sub weeks_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
   my $days = days_between ($syear, $smonth, $sday,
            $eyear, $emonth, $eday);
    # print $days, "\n"; exit;
    return int ($days / 7) +1;
}

sub next_month {
    my ($y, $m) = @_;
    if ($m  == 12) {
        $m = 1;
        $y++;
    }
    else {
        $m++;
    }
    return ($y, $m);
}

sub get_help {
    my ($msg) = @_;
    if ($msg) {
        warn "[***] $msg\n\n";
    }

    my $HELP_MSG = q{};
    for my $op ( 
                sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} } 
                grep { $parse_options{$_}{parse}}  keys %parse_options  ) {
        my $param =  $parse_options{$op}{parse};
        my $param_str = q{    };
        my ($short, $long ) = $param =~ / (?: (\w) \| )? (\S+) /x;
        if ($short) {
            $param_str .= q{-} . $short . q{ };
        } 
        $long =~ s/ = s \@? / = name/x;
        $long =~ s/ = i / = number/x;
        $param_str .= q{--} . $long;
        $param_str .= (q{ } x (40 - length($param_str)) );
        my $text_items = $parse_options{$op}{help};
        for my $titem (@{$text_items}) {
            $HELP_MSG .= $param_str . $titem . "\n";
            $param_str = q{ } x 40;
        }
        if (@{$text_items} > 1) {
            $HELP_MSG .= "\n";
        }
        # $HELP_MSG .= "\n";
   }

   print credits(),
          "syntax: $PROGRAM_NAME [options] \n", 
          $HELP_MSG;
    exit( $msg ? 1 : 0 );
}

sub credits {
    my $CREDITS = 
          qq(    The Partition Helper,  version $VERSION\n) 
        . qq(    This program creates a ALTER TABLE statement to add or reorganize\n )
        . qq(    partitions for MySQL 5.1 or later\n)
        . qq(    (C) 2008-2010 Giuseppe Maxia\n);
    return $CREDITS;
}

sub show_options {
    printf "# %-20s %-20s %s\n", 'options', 'default', 'value';
    printf "# %-20s %-20s %s\n", '-' x 20, '-' x 20, '-' x 20;
    for my $op ( sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} }
            keys %parse_options) {
        my $v = $options{$op};
        my $d = $parse_options{$op}{value};
        printf "# %-20s %-20s %s\n", 
                $op, 
                '(' . (defined $d ? $d : '') . ')',  
                defined $v ? $v : '' ;
    }
    print '# ', '-' x 62, "\n";
}

Friday, November 28, 2008

Yet another replication trap

When I filed Bug#39197 replication breaks with large load with InnoDB, flush logs, and slave stop/start, I genuinely thought that it was a serious problem. I was a bit puzzled, to tell the truth, because the scenario that I was using seemed common enough for this bug to be found already.

Anyway, it was verified independently, but there was a catch. The script in the master was using SET storage_engine=InnoDB to create the tables necessary for the test. That looked good enough to me. The script was indeed creating InnoDB tables on the master. The trouble was that the "SET" command is not replicated. Thus the slaves were using the default storage engine (i.e. MyISAM). That reduces the scenario to a case were I was splitting a transaction and trying to apply it to a non transactional table. Not a bug, as Andrei Elkin and Sveta Smirnova correctly found out, but not an easy error to spot either.
According to the manual, using the storage_engines variable allows for a certain degree of flexibility in replication. The non-replication of this instruction that affects DDL commands is described as a feature. In my experience, it's a bug, by design even, but this is my personal opinion.

Wednesday, November 26, 2008

Using MySQL 5.1 in production for over a year

Lenz has just published an interview to Adam Donnison, Senior Web Developer in the MySQL web team. Under his watch, the site at mysql.com has been powered by MySQL 5.1 for one and half years.
In production! Talk about eating your own dog food!
In addition to the database server, Adam has spearheaded the test of Enterprise tools and the Query Analyzer. Excellent example, Adam!

Monday, November 17, 2008

Meeting MySQL community in Paris - November 18, 2008

I will meet the MySQL community in Paris on November 18, 2008 at 19:30.
I will speak about MySQL Proxy Wizardry, and discuss with the community about any subject that will come up.
The place:
1 rue royale, 227 Bureaux de la Colline
92210 Saint-Cloud
Bâtiment D 9ème étage
Tel : 01 78 15 24 00

Sunday, November 16, 2008

OpenSQLCamp - Good Job, Baron!

Kudos to Baron Schwartz, who has shown everyone that you can organize a conference from scratch, without the logistics of a company. He has motivated other people in the community and received spontaneous help, which in turn became a web site, sponsor donations, free food for the participants, T-shirt, bags, internet connection, and a glorious feeling of camaraderie throughout the event.
Well done, Baron!

The picture shows a surprised Baron holding a T-shirt covered with "thank you Baron"s, secretly signed by all participants and finally awarded to him in public.

Saturday, November 15, 2008

OpenSQLCamp - Pictures from the first day

The OpenSQLCamp started yesterday in Charlottesville, VA.
As expected, the gathering is impressive. There are many well known names from the open source database world. MySQL is probably overrepresented, but that's fair, considering its wide adoption.

Yesterday was an informal "meet anyone and let's see what we do tomorrow". The schedule, as you can see, is very dynamic.

Friday, November 14, 2008

At the Open SQL Camp

Just arrived to Charlottesville, VA, with Dups.
We drove from Baltimore, MD, where we had an enjoyable MySQL meetup yesterday. The room was small, but packed to capacity with open source enthusiasts.
Now waiting for the "opening ceremony" at 6pm, when we will fine tune tomorrow's schedule. The list of attendees is impressive. More than 100 people are registered. It seems to be a very interesting gathering.

Wednesday, November 12, 2008

circular replication - Will you use it?

Three unrelated facts reminded me of a popular article about MySQL replication that I wrote long time ago.

A reader of that article told me that he used information I wrote to set up a circular replication in production, and it is still working!
A new page on circular replication in MySQL manual was just published.
During yesterday's meetup in New York, an attendee asked advice about broken replication in a circular scheme, and I suggested reading all the caveats that I listed in my article.

I recall reading in High Performance MySQL 2nd edition that circular replication is not recommended, and I agree on the reasons. I wrote that article as it was the result of my research, but I don't recommend circular replication either.
All of the above lead to the questions:

Who is using circular replication in production?
Who prefers circular replication to more robust methods, and why?

Baltimore MySQL meetup - November 13

The Baltimore MySQL Meetup is scheduled for November 13^th at 6pm
The place:
Lotame Solutions, Inc.
6085 Marshalee Drive
Suite 210 Elkridge, MD 21075

We'll talk about MySQL Proxy, and anything else that looks suitable, depending on the crowd's requests.
Thanks to Greg Haase for organizing.

Tuesday, November 11, 2008

New ideas about MySQL Sandbox at the Boston meetup

Presenting a topic to a new audience is a two ways learning experience. The attendees learn what the presenter came prepared to say, and in presence of an attentive audience, the presenter can learn something on the spot.
During today's Boston MySQL meetup, I got two interesting ideas aboou how to improve MySQL Sandbox.

Using the Sandbox as a test for the future production environment is useful. But it would be even more useful if there were a simple way of transferring the contents of a well tested Sandbox to the main server, or transferring the setup of a successfully tuned replication scheme to several production servers. Currently, this can't be done easily, but I have a few ideas worth checking.

The Sandbox can provide one or more servers for development and tests. Every server can be created by a knowledgeable administrator. It would be useful to have a sandbox provider that can create a sandbox on the spot, with an automatic check about the port being available. So the tester can simply ask for another server of a given version, and the Sandbox will be created with a non-conflicting port. There are several ways of implementing this idea. Perhaps the easiest one, suggested by an attendee, is to register the used ports.

Both these ideas are worth exploring. If you happen to care about them and have a good idea on their implementation, please contact me, or simply comment to this post. Thanks to all the ones who have brainstormed about these issues!

Monday, November 10, 2008

New York MySQL meetup - November 11

The New York MySQL Meetup is scheduled for November 11^th at 6pm
The place: Suspenders Restaurant
111 Broadway
at Thames St New York, NY 10006
212-732-5005

We'll talk about MySQL Proxy, and anything else that looks suitable, depending on the crowd's requests.
Thanks to Philip Antoniades for the organization.

Sunday, November 09, 2008

Boston MySQL meetup - November 10

The Boston MySQL meetup is held on November 10^th at 7pm. The place is
MIT Building E51
4 Amherst St
Room 372 Cambridge, MA 02142
857 205 9786

We'll talk about MySQL Sandbox, and about any relevant MySQL topic.

On the road with Dups

Now that Dups is the Community Relations Manager for North America, we'll start working together in the most active way. Next week, we'll be busy at meetups in New York and Baltimore, and then it will be full immersion in Charlottesville for the Open SQL Camp.

Saturday, November 08, 2008

Italians strike back

What follows is the text of my "Swedish" blog.
Look mom, I am blogging in Swedish!
I am starting a blog in Swedish. Now, why should I do such a thing? I usually blog in English, and I don't even bother blogging in my native Italian. Most importantly, I do not speak Swedish at all. So, why bother?
My friend and colleague Kaj Arnö had the gall of starting a blog in Italian, a fact that surprised me and made me envious at the same time.

He started talking about Italy, a country he visits often and he loves. I have little experience of Finland, having visited only once, but where I enjoyed warm hospitality.
I Visited Kaj in his homeland, in Nagu, a charming island West of Turku. The weather was not warm, but friendship and good food was a suitable compensation.

I am choosing to blog in in Swedish because Kaj's mother tongue is Swedish, although he lives in Finland. During my visit to Nagu and other island of the Finnish archipelago, I heard little Finnish and plenty of Swedish.

This experiment wants to show that anyone can blog in (mostly) any language, given the right tools. Hello Kaj!

Monday, November 03, 2008

A quick usability hack with partitioning

A few days ago I was describing a common grievance when using partitions.
When you care at a table, like the following

CREATE TABLE t1 ( d DATE ) 
PARTITION by range (to_days(d))
(
  partition p001 VALUES LESS THAN (to_days('2001-01-01')) 
, partition p002 VALUES LESS THAN (to_days('2001-02-01')) 
, partition p003 VALUES LESS THAN (to_days('2001-03-01'))
);

Then you have the problem of finding out the original values. SHOW CREATE TABLE doesn't help.

show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (to_days(d)) (PARTITION p001 VALUES LESS THAN (730851) ENGINE = MyISAM, PARTITION p002 VALUES LESS THAN (730882) ENGINE = MyISAM, PARTITION p003 VALUES LESS THAN (730910) ENGINE = MyISAM) */

I spent some time writing a MySQL Proxy module that, among other things, gives you the original values with SHOW CREATE TABLE. However, I was mostly reinventing the wheel in this case, because there is some useful data in the information_schema. Look:


select 
  PARTITION_NAME, 
  PARTITION_DESCRIPTION 
from 
  information_schema.PARTITIONS 
where 
  table_schema ='test' and table_name='t1';
+----------------+-----------------------+
| PARTITION_NAME | PARTITION_DESCRIPTION |
+----------------+-----------------------+
| p001           | 730851                | 
| p002           | 730882                | 
| p003           | 730910                | 
+----------------+-----------------------+

This is as informational as the output of SHOW CREATE TABLE, but having the data in a table, allows us to do the trick.


select 
  PARTITION_NAME , 
  from_days(PARTITION_DESCRIPTION) AS original_value 
from 
  information_schema.PARTITIONS 
where 
  table_schema ='test' and table_name='t1';
+----------------+----------------+
| PARTITION_NAME | original_value |
+----------------+----------------+
| p001           | 2001-01-01     | 
| p002           | 2001-02-01     | 
| p003           | 2001-03-01     | 
+----------------+----------------+

More on this subject when I finish working on my Proxy partition helper.

Friday, October 31, 2008

Scary reading advice

Inspired by the Arctic Dolphin's scary movies, I dug out some reading advice that I made during the MySQL Developers meeting in Riga.

Jack Kerouac, On The RoadMap, a fictionalized autobiography of a senior engineer on a cross country bohemian odyssey after failing to deliver 5.1 GA for the fifth time.

Ken Kesey, One flew over the cuckoo's nested query. A story of love and madness involving a quiet Engineering department. When Jeffrey McMurphy, a convicted felon who's faking insanity to escape a prison sentence, is sent to the Database Group asylum, life changes instantly for the other guests.
A subtle tragedy involving a marketing planner, a virgin junior engineer, and a swordfish develops inexorably, leaving the main character with a headache, while the chief of engineering runs away with the bath sink.

Umberto Eco, The name of the release. Young novice Zack of Melk tells the story of his master, Friar Marten of Baskerville, faced with the sudden suicide of his beloved Five-One-Twenty-Two release. The mystery thickens when another release, Five-One-Twenty-Three, is found dead in a puddle of blood, and again when the scribe Five-One-Twenty-Four is poisoned. Three more releases are slaughtered before some light is finally shed on the story, and the murder is revealed, and the surviving release Five-One-Thirty is finally sent to preach in the outside world.

Thursday, October 30, 2008

Partitions usability I - blues and a preview

Pop quiz. Look at the picture below. Quick: can you tell me to which year and month partition P001 refers?

I couldn't say it without asking the database itself to revert the output of to_days:


select from_days(723180);
+-------------------+
| from_days(723180) |
+-------------------+
| 1980-01-01        | 
+-------------------+

Just to make the pain more clear, you do something like this, clean and neat, and you think you have done a good job.


CREATE TABLE t1 ( d DATE ) 
PARTITION by range (to_days(d))
(
  partition p001 VALUES LESS THAN (to_days('2001-01-01')) 
, partition p002 VALUES LESS THAN (to_days('2001-02-01')) 
, partition p003 VALUES LESS THAN (to_days('2001-03-01'))
);

And then, what the database server is keeping, is really hard to use.


show create table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE (to_days(d)) (PARTITION p001 VALUES LESS THAN (730851) ENGINE = MyISAM, PARTITION p002 VALUES LESS THAN (730882) ENGINE = MyISAM, PARTITION p003 VALUES LESS THAN (730910) ENGINE = MyISAM) */

If you made a mistake and you look at the code, you will need to convert all the values using FROM_DAYS to make sure that the dates were as intended.
An additional pain is caused by the unbroken long line of partition definition. (Bug#14326). A patch was created, and it should eventually be applied soon.

New features around the corner

Well, not really around the corner, but there is hope that this enhancement will be available soon.
The change is conceptually simple, but momentous for users. You will be allowed to use native data types, without need of converting them to integers.
Here's how the above DDL will be rewritten:


CREATE TABLE t1 ( d DATE ) 
PARTITION by range column_list(d)
(
  partition p001 VALUES LESS THAN (column_list('2001-01-01')) 
, partition p002 VALUES LESS THAN (column_list('2001-02-01')) 
, partition p003 VALUES LESS THAN (column_list('2001-03-01'))
);

SHOW CREATE Table t1\G
*************************** 1. row ***************************
       Table: t1
Create Table: CREATE TABLE `t1` (
  `d` date DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 /*!50100 PARTITION BY RANGE  COLUMN_LIST(d) 
(PARTITION p001 VALUES LESS THAN ( COLUMN_LIST('2001-01-01')) ENGINE = MyISAM,
PARTITION p002 VALUES LESS THAN ( COLUMN_LIST('2001-02-01')) ENGINE = MyISAM,
PARTITION p003 VALUES LESS THAN ( COLUMN_LIST('2001-03-01')) ENGINE = MyISAM) */

Not only you will use native data types without conversion, but the partition definition will be human readable. If you made a mistake somewhere, you will find out without need for recalculations.
When is the above marvel going to be available? Now!, with a Launchpad tree. Although I can't say when it will be released. It won't be in 5.1, because it's almost GA. It could be in a quick point release (5.2) or a major release (6.0).
And what about the error prone procedure of creating partitions? If I need to make monthly partitions from 1985 to 2008, doing that manually is going to be a pain, and a source of errors.
Stay tuned. I have a solution handy.
Till next time.

MySQL University - Quick and easy testing with MySQL Sandbox

As announced by Stefan Hinz, our tireless documentation leader, we are using a new system for MySQL University.
With DimDim, we can have more fluid and interactive presentations. Unlike the previous homegrown system, you don't need to download the slides, which will be shown to you by the presenter. There is no sound delay, and attendees can interact with the presentation by (integrated) chat and by graphical tools.

Today, October 30, 2008, at 15:00 CET, I will present a session on Quick and easy testing with MySQL Sandbox.
There may be some compatibility problems, depending on your browser and OS of choice. So, please join the presentation a few minutes earlier.
The session will be open at 14:45 CET [Rome] (13:45 UTC [London], 09:45 EDT [New York]).

Wednesday, October 29, 2008

MySQL 5.1.29 - Last RC - small changes

MySQL 5.1.29 is available.
This is the last RC. Yes. You heard me right. The long wait is almost over. The next release will be GA.
Sharpen your tools, and get ready to use partitions, row-based replication, and the event scheduler in production without that uneasy sense of guilt that you feel when using non-GA software.

There are a few small changes in MySQL 5.1.29.

SHOW PROFILES

It was already in a preview, but now SHOW PROFILES is available in 5.1 binaries. As you may recall, it is not active by default. It becomes active when the session variable profiling is set.
This feature is going to be very useful when measuring the impact of partitioning or RBR on your queries. I will do some testing soon.

Logging

Log on demand is one of the main features of MySQL 5.1. It means that you can enable and disable general logs and slow query logs without restarting the server. You can also change the log file names, again without a restart.
What was missing was the ability of setting a log file name in the options file, without actually starting the logging. The old option log, could set the general log file name, but it will also start the logging immediately. If you want to set the log file name without activating the logging, now you can use general_log_file=filename or slow_query_log_file=file_name in the options file. These features were already available as system variables, but not as startup options.

Row-base logging safety

From this version, setting the binlog format for a session will require the SUPER privilege.
This change was decided to make it consistent with similar events. If you want to stop or enable the binary log, you can use SET SQL_LOG_BIN=0, but only if you have the SUPER privilege. Both suspending the binary logging and changing its format are affecting replication, and as such, the SUPER privilege is required.

For the complete list of changes affecting MySQL 5.1.29, please look at the manual.

Sunday, October 19, 2008

An odd spot in relay slaves

A relay slave is a slave that writes to its binary log the stream of commands received from the master, thus enabling other slaves to replicate.
This is often used to relieve the master from heavy traffic created by many slaves asking for binlog events.

A typical scenario is when a master has a few dozen slaves, and instead of dealing with all of them directly, uses four relay slaves, each one dealing with 6 slaves.
So, where's the trick? The trouble comes if you change replication format after you start the slave.
Example. One master (M), two relay slaves (R1, R2), with four slaves each (S1,S2,S3,S4, S5, S6, S7, S8).
You started working in MIXED mode, which was set in the option file of each server.
Now you want to switch to STATEMENT mode for a while.
Then, you connect to each of the 11 servers, one by one, with a script that sends to each one the command SET GLOBAL binlog_format=STATEMENT.
You close all open connections from your clients, and before issuing your command, you again connect to all servers, and run SELECT @@binlog_format to make sure that all servers have the same setup.
Your special query is a tracking command, which inserts @@server_id in a table, and in each server the table should have the server_id of its host;
However, you notice that the propagation happens until the relay slaves. All the leaf nodes have the same values of their master.
What happened? It's something that is somehow documented, but results in an unexpected result. A SET GLOBAL statement is valid for every new connection. Meaning that, if you have an open connection, the global behavior doesn't apply. In our example, we closed all client connections. But there was one hidden connection that was not closed.
The slave SQL thread was already running, and was thus unaffected by the change.
To make the change stick, and thus propagate to all the slaves, we need to start and stop the slave.
I open Bug#40106 about this behavior because, although it's is logically explainable, it goes against the user's expectation. (If I change the binlog format, I expect to affect the binary log. Stopping the slave does not occur naturally. Perhaps this is a question for the certification exam).

Friday, October 10, 2008

Community at work - SIGNAL

Long time ago, I saw a blog post by Jorge Bernal, with a simple implementation of SIGNAL for MySQL stored procedures. If you have ever tried to write MySQL stored procedures, you know how dearly missed is this feature.
I discussed this feature internally, and everyone told me "don't bother, we're going to implement SIGNAL in MySQL 6.1". And indeed, the full implementation for SIGNAL and RESIGNAL is in the roadmap.

What does that mean? Should we wait two or three more years to have a SIGNAL feature? Wouldn't it be nice to have a minimal SIGNAL working today, and get the full thing when it's ready?

A bird in the hand ...

I would like to have SIGNAL available today, and therefore I tested Jorge's patch in the latest 5.1.28 code. It needs some more twisting to get it working. If you want to try it, make sure you have the current patch.
Once you apply the patch and compile the code, you will have a server that accepts a simple "SIGNAL" command, without any parameter.
Here's a simple usage example:
First, you create two stored procedures:


delimiter //

drop procedure if exists less_than_10 //
create procedure less_than_10(i int )
deterministic
begin
    if ( i < 10)
    then
        set @comment = concat('a valid number was entered (', i,')');
    else
        set @comment = concat('number too high (', i,')') ;
        SIGNAL ;
    end if;
    select concat('everything fine [',@comment,']')  as comment;
end//

drop procedure if exists trapping //
create procedure trapping ()
deterministic
begin
    declare mystatus varchar(20) default 'ok';
    declare continue handler for SQLSTATE '38503'
        set mystatus = 'ERROR';
    call less_than_10(7);
    select mystatus, @comment;
    call less_than_10(17);
    select mystatus, @comment;
    select 'and this is after the error';
end //

delimiter ;

And then we'll try out the simple one, by calling directly the procedure with a SIGNAL.


call less_than_10(2);
+--------------------------------------------------+
| comment                                          |
+--------------------------------------------------+
| everything fine [a valid number was entered (2)] |
+--------------------------------------------------+
1 row in set (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

select @comment;
+--------------------------------+
| @comment                       |
+--------------------------------+
| a valid number was entered (2) |
+--------------------------------+
1 row in set (0.00 sec)

call less_than_10(12);
ERROR 1340 (38503): Exception generated from user-defined function/procedure

select @comment;
+----------------------+
| @comment             |
+----------------------+
| number too high (12) |
+----------------------+
1 row in set (0.00 sec)

next, we call the procedure that traps the exception. This one has full control of the execution flow, with SIGNAL and HANDLER.


call trapping();
+--------------------------------------------------+
| comment                                          |
+--------------------------------------------------+
| everything fine [a valid number was entered (7)] |
+--------------------------------------------------+
1 row in set (0.00 sec)

+----------+--------------------------------+
| mystatus | @comment                       |
+----------+--------------------------------+
| ok       | a valid number was entered (7) |
+----------+--------------------------------+
1 row in set (0.00 sec)

+----------+----------------------+
| mystatus | @comment             |
+----------+----------------------+
| ERROR    | number too high (17) |
+----------+----------------------+
1 row in set (0.00 sec)

+-----------------------------+
| and this is after the error |
+-----------------------------+
| and this is after the error |
+-----------------------------+
1 row in set (0.00 sec)

This less than 30 lines patch is much better than many hacks that we've seen published for years.
What's your take? Would you like to have this feature in MySQL 5.1, while the full thing is being developed for 6.1?

Tuesday, October 07, 2008

Preparing for Open SQL Camp

The Open SQL Camp will take place in Charlottesville, VA, USA, on November 14, 15, and 16.
Attendees are requested to register in the event's Wiki, and if you are interested in presenting something, there is a mailing list to discuss your intended topics.

I have proposed a topic about the MySQL community driven replication monitoring project, a hackish solution to the old problem of replication monitoring, implemented entirely on the server side, without using system applications.

Monday, October 06, 2008

Using the event scheduler to purge the process list

Two of the most common tasks for database administrators are cleaning the process list from unresponsive queries and remove idle connections that are filling the connection pool.
Both tasks are related to poor usage of the database. In a perfect world, users would only run queries designed, tested, and benchmarked by the DBA or the project manager, and the application servers would never allocate more connections than planned.

But users are human, and an unpredictable amount of unplanned events can happen everywhere. When I was consulting, the above cases were quite common.
Before MySQL 5.1, the only method to clean up the process list was by hand, or using a cron job to do it from time to time.
MySQL 5.1 introduces the event scheduler, and now you can run scheduled stored routines without need of external applications.
I created two stored procedures that read the processlist and kill a process if the execution time is longer than 200 seconds or if a process is idle for longer than 200 seconds. The script with the stored routines and the associated events is in MySQL Forge.

drop procedure if exists purge_slow_queries;
drop procedure if exists purge_idle_connections;
drop event if exists auto_purge_slow_queries;
drop event if exists auto_purge_idle_connections;

delimiter //

create procedure purge_idle_connections()
deterministic
begin
   declare done boolean default false;
   declare max_time int default coalesce(@max_kill_time, 200);
   declare pid bigint;
   declare c cursor for
   SELECT id
   FROM information_schema.processlist
   WHERE command in ('Sleep')
        -- add more conditions here
        AND time > max_time;
   declare continue handler for not found
       set done = true;
   open c;
   set @q_kill = 'KILL ?';
   prepare q_kill from @q_kill;
   PURGELOOP: loop
       fetch c into pid;
       if done then
           leave PURGELOOP;
       end if;
       set @pid = pid;
       execute q_kill using @pid;
   end loop;
   deallocate prepare q_kill;
end//

create procedure purge_slow_queries()
deterministic
begin
   declare done boolean default false;
   declare max_time int default coalesce(@max_kill_time, 200);
   declare pid bigint;
   declare c cursor for
   SELECT id
   FROM information_schema.processlist
   WHERE state in ('executing')
        -- add more conditions here
        AND time > max_time;
   declare continue handler for not found
       set done = true;
   open c;
   set @q_kill = 'KILL ?';
   prepare q_kill from @q_kill;
   PURGELOOP: loop
       fetch c into pid;
       if done then
           leave PURGELOOP;
       end if;
       set @pid = pid;
       execute q_kill using @pid;
   end loop;
   deallocate prepare q_kill;
end//

delimiter ;

create event auto_purge_idle_connections
   on schedule every 10 second
   do call purge_idle_connections();

create event auto_purge_slow_queries
   on schedule every 10 second
   do call purge_slow_queries();

Notice that you can disable an idle connection by setting the variable interactive_timeout, but this method allows you to be more precise. For example you can state that only idle connections to a given database should be killed, or only the ones from a given user.

UPDATE It looks like there is prior art in the same department. My colleague Matthew Montgomery beat me to it loooong time ago.

Is the Federated engine useful to you?

Followers of my blogs and talks know that I am partial to the Federated engine. Not much as a normal way of storing data, but more as an auxiliary device to create innovative schemes.
A few years ago I wrote an article, Federated: the missing manual where I listed most of the gotchas that you can meet when using the Federated engine. This article alone would be enough to discourage you from using Federated in production. If you attempt to use a Federated table like a normal one, your server will suffer, and possibly crash.

Add the sad fact that the Federated engine is riddled with bugs, and you will agree with the decision of disabling the engine by default in MySQL 5.1.
Indeed, using federated tables as normal ones is a recipe for disaster. Yet, a measured usage of Federated tables can greatly improve your database usage experience.
For example, one of my former customers has had a Federated table in production since 2005. The production database is 4.1. A data warehouse database used 5.0, with a Federated table linking to a production table that grows very quickly. Using a cron job, the federated table copies data regularly from the production table to an archive table, and deletes the records older than three months.
Another usage that I have been experimenting with for a few years is using Federated tables as monitoring devices, with small records being passed around between servers. You can use this scheme to get a quick statistics of what another server is doing, for example by linking the federated table to a well tuned view. Or you can use small federated tables to check the health of a master from a slave and vice versa. I used this system for my MySQL Replication Monitor.
In short, Federated is not to be discarded, but it must be used with extreme caution.
The best course of action would be refactoring the whole engine with a more robust architecture. Until that happen, caution and planning would suffice.

Sunday, October 05, 2008

30 terabyte per night

The Large Synoptic Survey Telescope is making news once more. Its projected data inflow of 30 Terabyte per night has caught to the imagination of slashdot readers. Why is this interesting? Because it was not news to me.

You may recall that I was enthusiast about a Astronomy, Petabytes, and MySQL at the MySQL Users Conference 2008, and with reason. The engineers at Stanford have a plan of storing petabytes of data into a cluster of MySQL databases.
The technical session was complemented by an exciting keynote on The Science and Fiction of Petascale Analytics, which closed the conference with a bang.
Hey! that's innovation! Anyone has a massive data storage to tell at the next Users Conference?

Thursday, October 02, 2008

Using partitions to create constraints

A devilish quiz by Scott Noyes has made me thinking about a side effect of partitioning.

The quiz

Given a table trickytable with a INT field, explain how these statements and results are possible:

INSERT IGNORE INTO trickyTable (id) VALUES (1);
Query OK, 0 rows affected (0.00 sec)

SELECT * FROM trickyTable;
Empty set (0.00 sec)

There are no blackhole, federated, triggers, proxy.
I initially tried with the event scheduler, with an event that empties the table every second, but the insert results in 1 row to be inserted, while the quiz asked for 0 rows.

After a few minutes, I got it. The right answer was to use partitioning.

Let me show off a bit. :)
Table trickytable was created with partitions by range or list , with a partition that does not include 1.

Either of the following will do the trick:

create table trickytable (id int)
partition by list (id) (partition p1 values in (2));

#or

create table trickytable (id int)
partition by range (id) (partition p1 values less than (1));

If you insert "1" in this table, without the IGNORE keyword, you get an error:

INSERT INTO trickytable (id) VALUES (1);
ERROR 1526 (HY000): Table has no partition for value 1

Using IGNORE means "don't react to errors". This is used, for example, when inserting duplicated values, to have them discarded without triggering errors.
Thus, adding the IGNORE keyword will discard the error, leaving no rows to be inserted. Thus the "0 rows affected" message and the resulting empty table.

Well spot, Scott!

Practical use

Using the same concept, we can use partitioning to enforce a constraint.
For example, let's assume that we have a table with part_number and category_id, and we want only products belonging to categories 10, 25, 40, and 53 to be in this table.
Using a partition by list does the trick nicely:

create table special_products (
 part_no int not null,
 category_id int not null,
 primary key (part_no, category_id)
) engine = InnoDB
partition by list (category_id)
(
 partition p1 values in (10, 25, 40, 53)
);
insert into special_products values (1, 53), (2, 25);
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

insert into special_products values (1, 23);
ERROR 1526 (HY000): Table has no partition for value 23

If we want to include category 23 into the list, we can use REORGANIZE PARTITION

alter table special_products
reorganize partition p1 into
(
 partition p1 values in (10, 23, 25, 40, 53)
);

Or we can actually add more partitions

alter table special_products
reorganize partition p1 into
(
 partition p1 values in (10, 23, 25),
 partition p2 values in (40, 53)
);

This is an efficient way of adding a constraint (within the limits of partitioning, which accept only integer values and a fixed set of functions).

Wednesday, October 01, 2008

My proposals for the Users Conference 2009

In addition to participating to the reviewing committee for the Users Conference 2009, I have also ambitions of being there as a speaker. Thus I am thinking about the topics I could talk about.
Being the theme of the conference "innovation everywhere", I will not propose any of my previous talks (as I hope everyone else will, since I will strike all the carbon copy proposals).

I have some ideas already.

A tutorial on partitions: Since I have done much testing and writing on partitioning, I feel the need of presenting something in public. Performance, administration, use cases, are all aspects of partitioning with which I am well acquainted.
Row-based replication: Another 5.1 feature that I have been testing at length. It is perhaps too narrow to deserve a tutorial, but I have material for a couple of good sessions.
The Community Replication Monitor: This is a project that has started several years ago as pure speculation, and it is now being shaped into something that can be used.
Testing with MySQL Sandbox: I have applied many changes to MySQL Sandbox since my last presentation, and it is now time to show some cookbook cases.
Community hacks: many interesting community contributions are entering my radar. While I don't know which ones would make their way to the official release, I would like to show how life could be if we put all these clever additions together.
using Bazaar and Launchpad to develop a small project: Unlike previous presentations about Launchpad and Bazar and MySQL or Drizzle, this is a small project oriented session, where I show how to use Launchpad to plan, develop, and maintain a moderately complex application, with blueprints, revision code system, bug tracking, and forums.

Your comments will be highly appreciated. Especially if you suggest something not in this list (and falls within my abilities, of course!)

Tuesday, September 30, 2008

A cool idea - Revision engine

Yesterday I saw an announcement in the MySQL internals@ list, about a new storage engine being released. DDengine has created a revision engine, a sort of embedded proxy inside MySQL that keeps track of the changes you do to your data.
The idea is clever. You write to your table, update and delete without any concern, and the revision engine stores your changes in the background.
I wanted to tried the engine on my laptop (Mac OSX), but there was no binary available for this architecture. I contacted the authors and I received prompt assistance until I had the product installed. Unfortunately, it crashes immediately. Well, it's to be expected for a version 0.1.
I then tried on Linux, and also here I received very quick assistance from Peter Benjamin Volk, Project Head at DDengine.
It now works on Linux, although it's very basic at the moment.
The documentation says that there are two ways of using the revision engine. One with embedded changes, which involves a silent alteration of your table structure to include the revision info, and one with two tables, one of which is created behind the scenes.
For example:

create table t1 (
 id int not null primary key,
 c char(10)
) engine=revision comment="InnoDB:DOUBLE";
show tables;
+----------------+
| Tables_in_test |
+----------------+
| t1             |
| t1_revision    |
+----------------+

desc t1;
+-------+----------+------+-----+---------+-------+
| Field | Type     | Null | Key | Default | Extra |
+-------+----------+------+-----+---------+-------+
| id    | int(11)  | NO   | PRI | NULL    |       |
| c     | char(10) | YES  |     | NULL    |       |
+-------+----------+------+-----+---------+-------+

desc t1_revision;
+--------------------+---------------------+------+-----+
| Field              | Type                | Null | Key |
+--------------------+---------------------+------+-----+
| id                 | int(11)             | NO   | PRI | 
| c                  | char(10)            | YES  |     |
| revision_id        | int(10) unsigned    | NO   | PRI |
| revision_timestamp | timestamp           | NO   |     | 
| revision_deleted   | tinyint(3) unsigned | NO   |     |
+--------------------+---------------------+------+-----+

The "InnoDB:DOUBLE" tells the engine to use a second table for revision info.
Operations on the table are transparent.

insert into t1 (id,c) values (1, 'aaa'), (2, 'bbb');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

insert into t1 (id, c) values (3, 'ccc'), (4, 'ddd');
Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

select * from t1;
+----+------+
| id | c    |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
|  4 | ddd  |
+----+------+
4 rows in set (0.00 sec)

select * from t1_revision;
Empty set (0.00 sec)

So far, nothing unusual. Let's try some changes.

update t1 set c ='changed' where id = 3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

select * from t1;
+----+---------+
| id | c       |
+----+---------+
|  1 | aaa     |
|  2 | bbb     |
|  3 | changed |
|  4 | ddd     |
+----+---------+
4 rows in set (0.00 sec)

show variables like '%revision%';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| revision_select_mode | current |
+----------------------+---------+
1 row in set (0.00 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  3 | ccc  |           1 | 2008-09-30 05:45:49 |                0 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

delete from t1 where id = 2;
Query OK, 1 row affected (0.01 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  3 | ccc  |           1 | 2008-09-30 05:45:49 |                0 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

set revision_select_mode = 'deleted';
Query OK, 0 rows affected (0.01 sec)

select * from t1_revision;
+----+------+-------------+---------------------+------------------+
| id | c    | revision_id | revision_timestamp  | revision_deleted |
+----+------+-------------+---------------------+------------------+
|  2 | bbb  |           1 | 2008-09-30 05:47:14 |                1 |
+----+------+-------------+---------------------+------------------+
1 row in set (0.00 sec)

select * from t1;
+----+---------+
| id | c       |
+----+---------+
|  1 | aaa     |
|  3 | changed |
|  4 | ddd     |
+----+---------+
3 rows in set (0.00 sec)

The session variable revision_select_mode changes the revision data that to be shown.
At the moment, there is no revert mechanism (or if there is, it's undocumented), but the idea is cool, and I think that this engine will become very useful.

Monday, September 29, 2008

Will you use row-based replication by default?

MySQL 5.1 introduces row based replication, a way of replicating data that fixes many inconsistencies of the statement based replication, the standard method used by MySQL so far.

The good: row based replication solves some problems when replicating the result of non deterministic functions, such as UUID() or NOW().
The bad: row-based replication may break existing applications, where you count on the quirks of statement based replication to execute conditionally (updates base on @@server_id, for example), and may perform badly on updates applied to very large tables.

Currently, MySQL 5.1.28 has the mixed mode enabled by default. When in this mode, binary logging switches to row-based when there is a risk of breaking replication. Except that there are some cases left out, like Bug#39701 (load_file()).
Mixed mode is supposed to be the best of two worlds. You can run fast updates on large tables and not risking inconsistent results on non-deterministic functions. The recent addition to mysqlbinlog allows an easy decoding of the ugly looking row-based statements.
Would you switch to mixed mode or stay in statement mode? There are requests of reverting the default to statement based binlog format, to maintain compatibility with old applications.
Personally, I would go for the mixed mode. But it depends on your applications and your personal experience. What's your take?

Friday, September 26, 2008

Drizzling MySQL

Have you ever used subqueries with MySQL? It's an addition introduced in version 4.1, and since then neglected. The performance of subqueries in MySQL 4.1 and 5.x is really a sad story.
For example, using the Employees test database, you may try this query:

select
title, from_date, to_date
from titles
where emp_no in
(select emp_no
from employees
where first_name = 'Mary'
      and last_name = 'Sluis'
)

The result, in MySQL 5.1.28 is

+-------+------------+------------+
| title | from_date  | to_date    |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (7.88 sec)

Ugly, uh?
But let's try the same query in MySQL 6.0.8.

select title, from_date, to_date
from titles
where emp_no in (
select emp_no
from employees
where first_name = 'Mary'
and last_name = 'Sluis');
+-------+------------+------------+
| title | from_date  | to_date    |
+-------+------------+------------+
| Staff | 1990-01-22 | 1996-11-09 |
+-------+------------+------------+
1 row in set (0.13 sec)

That's a completely different story! What's happening?
The Optimizer Team has been hard at work, and in version 6.0 we will see a huge improvement. Let's try to explain the result:

explain extended select title, from_date, to_date
from titles
where emp_no in (
select emp_no from employees
where first_name = 'Mary'
and last_name = 'Sluis')\G
*************************** 1. row ***************************
        id: 1
select_type: PRIMARY
     table: employees
      type: ALL
possible_keys: PRIMARY
       key: NULL
   key_len: NULL
       ref: NULL
      rows: 300024
  filtered: 100.00
     Extra: Using where
*************************** 2. row ***************************
        id: 1
select_type: PRIMARY
     table: titles
      type: ref
possible_keys: PRIMARY,emp_no
       key: emp_no
   key_len: 4
       ref: employees.employees.emp_no
      rows: 1
  filtered: 100.00
     Extra:
2 rows in set, 1 warning (0.00 sec)

show warnings\G
*************************** 1. row ***************************
Level: Note
Code: 1003
Message: select `employees`.`titles`.`title` AS
`title`,`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`employees`
join `employees`.`titles`
where ((`employees`.`titles`.`emp_no` = `employees`.`employees`.`emp_no`)
and (`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary'))

The optimizer has converted the subquery into a join, with a query plan that is 20 times better than the one used in MySQL 5.1. Try the explain extended command in MySQL 5.1.28, and the resulting query is

select `employees`.`titles`.`title` AS `title`,
`employees`.`titles`.`from_date` AS `from_date`,
`employees`.`titles`.`to_date` AS `to_date`
from `employees`.`titles`
where (`employees`.`titles`.`emp_no`,
(((`employees`.`titles`.`emp_no`)
in employees on PRIMARY
where ((`employees`.`employees`.`last_name` = 'Sluis')
and (`employees`.`employees`.`first_name` = 'Mary')
and ((`employees`.`titles`.`emp_no`) = `employees`.`employees`.`emp_no`)))))

So far, the comparison shows that some subqueries in 6.0 are evaluated much better than in previous versions. But what's the "drizzled" part from the title?
There are two issues, actually.
First, as you may have heard, Drizzle is a lightweight version of MySQL, based on version 6.0 code. If you import the same data and run the previous query in Drizzle, you get the same good performance that I got in 6.0.
Second, it would be nice to have subquery optimization in production. Unfortunately, neither Drizzle nor MySQL 6.0 are GA. They are both alpha. Drizzle has not reached the feature freeze stage yet, and MySQL 6.0 should wait for Falcon, Maria, and the online backup to be ready before being considered stable. So today I made a little experiment. I drizzled MySQL 6.0, by disabling Maria and Falcon, getting binaries that are less than half in size, and skipping a lot of unfixed bugs. If you are interested, I compiled using this script.

#! /bin/sh

path=`dirname $0`
. "$path/SETUP.sh"

extra_flags=" $fast_cflags $max_cflags -g"
extra_configs=" --with-plugins=innobase,archive,blackhole,csv,partition "
extra_configs=" $extra_configs --without-falcon "
extra_configs=" $extra_configs --without-maria-storage-engine "

. "$path/FINISH.sh"

You may ask, what's the usefulness of this stripping down operation?
It's just a small test. I have a private agenda, and I will do something even more dramatic in the coming weeks. In the meantime, please think about this. There is a great feature in 6.0, which could be very nice to use in production. What would you do?
Backport it to MySQL 5.1? Strip down 6.0 leaving only subquery optimization, à la drizzle? I have some ideas, but I would like to hear from you. What would you do?