The Data Charmer: perl

Showing posts with label perl. Show all posts

Monday, December 06, 2010

Excluding databases from mysqldump

A question that came up during the MySQL track at the UKOUG conference in Birmingham was "Can I exclude only a few databases from mysqldump? Let's say that I have 50 databases, and I want to dump all of them, except a few."
As many know, mysqldump has an option to ignore specific tables. SO if you have 1,000 tables in a databases, you can tell mysqldump to dump all the tables except a few ones.
There is no corresponding option to exclude one or more databases.
However, if you know your command line tools, the solution is easy:
First, we get the list of all databases:

mysql -B -N -e 'show databases'
information_schema
employees
five
four
mysql
one
performance_schema
six
test
three
two

-B forces batch mode (no dashes box around the data), while -N gets the result without the headers.
Now, let's say that we want to exclude databases four, five, and six. And since we want to avoid unpleasant side effects, also information_schema and performance_schema.
Thus, we pipe the previous data through a filter. I use Perl, but sed or grep could get the job done.

mysql -B -N -e 'show databases' | \
  perl -ne 'print unless /\b(?:four|five|six|_schema)\b/' 
employees
mysql
one
test
three
two

Now that we have the list of databases that we need, we can tell mysqldump to backup the databases from such list. All we need is converting the vertical list into a horizontal one using xargs

mysql -B -N -e 'show databases' | \
  perl -ne 'print unless /\b(?:four|five|six|_schema)\b/' \
  xargs echo mysqldump -B 
mysqldump -B employees mysql one test three two

That's it. The last line is the resulting command. Once you are sure that it is what you want, remove the "echo" after xargs, and the command will be executed.

Update: Thanks to Shantanu, who pointed that the regexp does not filter properly. So I added the boundary checks (\b) to make my words match the result.

Monday, May 24, 2010

MySQL Sandbox meets plugins

I saw it coming.
In the past year I have been working more and more with plugins. The InnoDB plugins, the semi-synchronous replication plugins, the Gearman UDFs, the Spider plugins, and every time I found myself doing a long installation process, with inevitable mistakes.

So I told myself "I wish I had a tool that installed plugins instantly and painlessly, the way MySQL Sandbox installs a server.
There you go. So yesterday I had enough of suffering and have put together an installation script that does a good job of installing several plugins with little or no effort.

Overview

How does it work? For now, it's a separate script, but it will soon end into SBtool, the Sandbox tool.
Plugins need different operations, and the difficult part is finding a clear way of describing what you want to do, and how. But once you have come up with that set of instructions, there is seldom need to revisit it.
So the principle is th create a set of templates, one for every plugin, where you explain to the installation script what you want to do.
Having installed several plugins repeatedly in several versions of MySQL, I now have a good understanding of the process, and having gone through the motions of explaining the procedure to a Perl script, I feel that I know the process even more. That is, if you want to understand a process, script it. For if you want to script a process, you really need to understand what's going on.

The template

The template was not easy to write. After some bargaining with myself, I decided that the best format was Perl itself.
Let's see, for example, the InnoDB plugin


innodb =>    {
    all_servers => 
        {
        operation_sequence => [qw(stop options_file start sql_commands )],
        options_file => 
            [
            'ignore_builtin_innodb',
            'plugin-load='
             .'innodb=ha_innodb_plugin.so;'
             .'innodb_trx=ha_innodb_plugin.so;'
             .'innodb_locks=ha_innodb_plugin.so;'
             .'innodb_lock_waits=ha_innodb_plugin.so;'
             .'innodb_cmp=ha_innodb_plugin.so;'
             .'innodb_cmp_reset=ha_innodb_plugin.so;'
             .'innodb_cmpmem=ha_innodb_plugin.so;'
             .'innodb_cmpmem_reset=ha_innodb_plugin.so',
            'default-storage-engine=InnoDB',
            'innodb_file_per_table=1',
            'innodb_file_format=barracuda',
            'innodb_strict_mode=1',
            ],
        sql_commands => 
            [
                'select @@innodb_version;',
            ],
        startup_file => [ ],
        },
    },

The first thing that you notice is that there is an all_servers section. This means that any server can get the same treatment, as opposed to the semi-synchronous plugin, where master and slave need different plugins and commands.
Then comes the operation_sequence, where we decide the order of the operations.
Inside options_file we put the commands that we want inside a my.cnf.
The sql_commands section has a list of queries that the script runs when instructed.


semisynch => {
    master => 
        {
            operation_sequence => [qw(stop options_file start sql_commands )],
            options_file => 
                [
                'plugin-load=rpl_semi_sync_master=semisync_master.so',
                'rpl_semi_sync_master_enabled=1'
                ],
            sql_commands => 
                [
                    'select @@rpl_semi_sync_master_enabled;'
                ],
            startup_file => []
        },
    slave => 
        {
            operation_sequence => [qw(stop options_file start sql_commands )],
            options_file => 
                [
                'plugin-load=rpl_semi_sync_slave=semisync_slave.so',
                'rpl_semi_sync_slave_enabled=1'
                ],
            sql_commands => 
                [
                    'select @@rpl_semi_sync_slave_enabled;'
                ],
            startup_file => []
        },
    },

By contrast, the semisynch plugin looks comparatively more complex, with its two sections for master and slave. But as you look closely, you recognize the two operations described in the manual, and you feel that you could deal with them easily.

The script

The script was not much difficult to write. Since it only works with MySQL Sandbox instances, it leverages on the predictability of each server.
There is quite a lot of complexity inside, though, because the script checks every possible source of trouble before actually running the instructions from the template.
The script needs two parameters: a directory containin a sandbox, and the name of the plugin. It expects the plugin definition template (named plugin.conf to be in the destination directory, or in the $SANDBOX_HOME directory.
It recognizes if the target path is a single or multiple sandbox. If it is multiple, it installs the given plugin in every server. It also recognizes if the server is a master or a slave, and pulls the appropriate section from the template when required.


$  perl set_plugin.pl $HOME/sandboxes/rsandbox_5_1_47 innodb
executing "stop" on slave 1
executing "stop" on slave 2
executing "stop" on master
Installing <innodb> in </$HOME/sandboxes/rsandbox_5_1_47/master/>
. sandbox server started
--------------
select @@innodb_version
--------------

+------------------+
| @@innodb_version |
+------------------+
| 1.0.8            |
+------------------+
1 row in set (0.00 sec)

Bye
Installing <innodb> in </$HOME/sandboxes/rsandbox_5_1_47/node1/>
. sandbox server started
--------------
select @@innodb_version
--------------

+------------------+
| @@innodb_version |
+------------------+
| 1.0.8            |
+------------------+
1 row in set (0.00 sec)

Bye
Installing <innodb> in </$HOME/sandboxes/rsandbox_5_1_47/node2/>
.. sandbox server started
--------------
select @@innodb_version
--------------

+------------------+
| @@innodb_version |
+------------------+
| 1.0.8            |
+------------------+
1 row in set (0.00 sec)

Now there is no excuse for testing servers with plugins.
There is still some TODO, most notably testing, fixing conflicts that may happen when two plugins fight for the same plugin-load statement, and integrating with sbtool, as said before. But for now, it is enough.
You can try i, by using the script and the template

Wednesday, May 12, 2010

World map, shaped by MySQL downloads

A few years ago, during the MySQL Conference opening keynote, two world maps of MySQL downloads were displayed. With the lights down, they made an impression.
Oddly enough, to the best of my knowledge, the downloads map has not been drawn again since then. I asked my friend and colleague Markus Popp, and he provided the data from the downloads logs, leaving the implementation to me.
A first attempt with Google Maps API produced a chart that is nice to see for a single country or town, but hardly pleasant for the entire world.

Then, I abandoned the easy path, and looked at CPAN for inspiration, and immediately found something that could solve my problem. Using GD::Map, I quickly created a world map, and after a few minutes of fiddling with the innards, I manage to plot a map that looked like what I wanted.

.
To get the red dots on a black background I simply used Gimp, and soon I had the results I was looking for.

Wednesday, January 20, 2010

Multi dimensional cubes in MySQL through Gearman

I gave two presentations about Gearman at the Linux.conf.au. As part of the preparation for these talks, I created several sample applications. One of them, about remote replication administration, I will cover in a separate post. The most amazing one, which I cover here, is a quick and painless solution for multiple level crosstabs in MySQL.

Some background is needed. Crosstabs (also called data cubes or pivot tables, have been one of my favorite hacks for long time. In 2001 I wrote an article about a simple way of doing single level crosstabs. A few years later, I developed a Perl module that generates multiple levels of data cubes in most any database systems. Since then, I have received countless requests to convert this module to PHP, Python, Java, and I have always declined, for lack of time or abilities.
In the coming years, I tackled the same problem using MySQL Proxy and some SQL hacks. Both attempts were not completely satisfactory. The options offered by the Perl module are simply too hard to replicate to any other system.
When I started using Gearman, I realized that I could use the original Perl module through a Gearman worker, without converting to any other language. The idea is to write a simple worker that accepts some parameters and runs the Perl module to return a crosstab query to the client. The query being the most complicated thing to generate, the architecture could look like the image below.

To take the idea one step further, I used the Gearman UDF for MySQL, which makes the crosstab function available at the SQL level, thus being transparent no matter which programming language the client uses, and without need of using the Gearman API.

In this scenario, what you need to do is just querying the worker (through the UDF), with a simple string of parameters.


mysql> set @q = (select gman_do('crosstab', 
'from=all_personnel;op=sum salary;rows=country;cols=gender'));

mysql> prepare q from @q; execute q;
+---------+-------+-------+-------+
| country | m     | f     | total |
+---------+-------+-------+-------+
| Germany | 16000 | 11000 | 27000 |
| Italy   |  6000 |  6000 | 12000 |
| UK      | 10500 |  NULL | 10500 |
| zzzz    | 32500 | 17000 | 49500 |
+---------+-------+-------+-------+

Here's a taste of a 2 levels cube:

set @q = (select gman_do('crosstab','from=all_personnel;op=count salary;rows=country,location;cols=department,gender'));
Query OK, 0 rows affected (0.03 sec)

prepare q from @q; execute q;Query OK, 0 rows affected (0.00 sec)
Statement prepared

+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| country | location | pers#m | pers#f | pers | sales#m | sales#f | sales | dev#m | dev#f | dev | total |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+
| Germany | Berlin   |      0 |      1 |    1 |       0 |       0 |     0 |     1 |     0 |   1 |     2 |
| Germany | Bonn     |      0 |      0 |    0 |       1 |       0 |     1 |     0 |     0 |   0 |     1 |
| Germany | Munich   |      1 |      0 |    1 |       0 |       1 |     1 |     0 |     0 |   0 |     2 |
| Germany | zzzz     |      1 |      1 |    2 |       1 |       1 |     2 |     1 |     0 |   1 |     5 |
| Italy   | Rome     |      1 |      0 |    1 |       0 |       0 |     0 |     0 |     1 |   1 |     2 |
| Italy   | zzzz     |      1 |      0 |    1 |       0 |       0 |     0 |     0 |     1 |   1 |     2 |
| UK      | London   |      1 |      0 |    1 |       1 |       0 |     1 |     0 |     0 |   0 |     2 |
| UK      | zzzz     |      1 |      0 |    1 |       1 |       0 |     1 |     0 |     0 |   0 |     2 |
| zzzz    | zzzz     |      3 |      1 |    4 |       2 |       1 |     3 |     1 |     1 |   2 |     9 |
+---------+----------+--------+--------+------+---------+---------+-------+-------+-------+-----+-------+

It would be nice to actually format the result in a more human readable way, like this one, but it will require some more work.


+---------+----------+--------------+-----------------+-------------+-------+
| country | location | pers         |sales            | dev         | total |
|         |          +---+---+------+----+----+-------+---+---+-----+-------+
|         |          | m | f | pers |  m |  f | sales | m | f | dev | total |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Germany | Berlin   | 0 | 1 |    1 |  0 |  0 |     0 | 1 | 0 |   1 |     2 | 
| Germany | Bonn     | 0 | 0 |    0 |  1 |  0 |     1 | 0 | 0 |   0 |     1 | 
| Germany | Munich   | 1 | 0 |    1 |  0 |  1 |     1 | 0 | 0 |   0 |     2 | 
| Germany | total    | 1 | 1 |    2 |  1 |  1 |     2 | 1 | 0 |   1 |     5 | 
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| Italy   | Rome     | 1 | 0 |    1 |  0 |  0 |     0 | 0 | 1 |   1 |     2 | 
| Italy   | total    | 1 | 0 |    1 |  0 |  0 |     0 | 0 | 1 |   1 |     2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+
| UK      | London   | 1 | 0 |    1 |  1 |  0 |     1 | 0 | 0 |   0 |     2 | 
| UK      | total    | 1 | 0 |    1 |  1 |  0 |     1 | 0 | 0 |   0 |     2 |
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+ 
| total   | total    | 3 | 1 |    4 |  2 |  1 |     3 | 1 | 1 |   2 |     9 | 
+---------+----------+---+---+------+----+----+-------+---+---+-----+-------+

To make the above examples work, what's missing is the worker. You can try the sample crosstab worker from MySQL Forge.

Tuesday, December 29, 2009

Filtering mysqldump output

Several people have suggested a more flexible approach at mysqldump output in matter of user privileges.
When you dump the data structure for views, triggers, and stored routines, you also dump the permissions related to such objects, with the DEFINER clause.
It would be nice to have such DEFINER clauses removed, or even replaced with the appropriate users in the new database.

The mysqldump filter was created with this need in mind. It allows you to remove all DEFINER clauses and eventually replacing them with a better one.
For example:


mysqldump --no-data sakila | dump_filter --delete > sakila_simple.sql
mysqldump --no-data sakila | dump_filter --replace='newuser@`10.%`' > sakila_secure.sql

The first example removes all references to DEFINER, while the second one replaces every definer with a new user name.

Update Since MySQL Forge is not available anymore, here's the code:


#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(:config no_ignore_case );

my $replace = undef;
my $delete  = undef;
my $help    = 0;

GetOptions (
   'replace|r=s' => \$replace,
   'delete|d'    => \$delete,
   'help|h'      => \$help,
) or help('unrecognized options');

help() if $help;

if ($delete and $replace) {
    help( 'you must choose EITHER "delete" OR "replace". Not both');
}

$delete = 1 unless $replace;

while (my $line = ) {
    if ($delete) {
        $line =~ s{(/\*!\d+ )\s*definer\s*=\s*\S+}{$1}i;
    }
    elsif ($replace) {
        $line =~ s{(/\*!\d+\s+definer\s*=\s*)(\S+)}{$1 $replace}i;
    } 
    print $line;
}

sub help {
    my ($msg) = @_;
    if ($msg) {
        print "*** $msg\n";
    }
    print "dump_filter - mysqldump filter \n",
          "(C) Giuseppe Maxia, 2009\n",
          "removes/changes DEFINER clauses from MySQL dumps\n",
          "USAGE: dump_filter [options]\n",
          "    -d|--delete    removes the DEFINER clauses\n",
          "    -r|--replace=s replaces every DEFINER clause with the \n",
          "                   new value provided\n",
          "    -h|--help      This text\n";
    exit(1);
}


__END__
notes: MySQLdump filter.
Removes or replaces the DEFINER clauses from a dump.
It is necessary when you want to migrate a database to a new server with different users.

Monday, December 01, 2008

The partition helper - Improving usability with MySQL 5.1 partitioning

I talked several times about partitioning usability. In my many tests of partitioning I found myself in need of generating list of partitions for a given range.
I made the Partition Helper (see code at the end of this post) to scratch this particular itch, i.e. making partitions reasonably quickly and without thinking too much.

The Partition Helper is a Perl script that converts some simple options into lots of partitions.
The detailed reference manual lists all the options and many examples.
Here, I would like to show just an example.
Say, you want to create partitions in table blog_posts by monthly intervals on column published. Given that you have data for the last two years, you use the partition helper like this:

./partitions_helper --table=blog_posts \
--column=published \
--interval=month \
--start=2007-01-01 \
--end=2009-01-01
ALTER TABLE blog_posts
PARTITION by range (to_date(published))
(
partition p001 VALUES LESS THAN (to_days('2007-01-01'))
, partition p002 VALUES LESS THAN (to_days('2007-02-01'))
, partition p003 VALUES LESS THAN (to_days('2007-03-01'))
, partition p004 VALUES LESS THAN (to_days('2007-04-01'))
, partition p005 VALUES LESS THAN (to_days('2007-05-01'))
, partition p006 VALUES LESS THAN (to_days('2007-06-01'))
, partition p007 VALUES LESS THAN (to_days('2007-07-01'))
, partition p008 VALUES LESS THAN (to_days('2007-08-01'))
, partition p009 VALUES LESS THAN (to_days('2007-09-01'))
, partition p010 VALUES LESS THAN (to_days('2007-10-01'))
, partition p011 VALUES LESS THAN (to_days('2007-11-01'))
, partition p012 VALUES LESS THAN (to_days('2007-12-01'))
, partition p013 VALUES LESS THAN (to_days('2008-01-01'))
, partition p014 VALUES LESS THAN (to_days('2008-02-01'))
, partition p015 VALUES LESS THAN (to_days('2008-03-01'))
, partition p016 VALUES LESS THAN (to_days('2008-04-01'))
, partition p017 VALUES LESS THAN (to_days('2008-05-01'))
, partition p018 VALUES LESS THAN (to_days('2008-06-01'))
, partition p019 VALUES LESS THAN (to_days('2008-07-01'))
, partition p020 VALUES LESS THAN (to_days('2008-08-01'))
, partition p021 VALUES LESS THAN (to_days('2008-09-01'))
, partition p022 VALUES LESS THAN (to_days('2008-10-01'))
, partition p023 VALUES LESS THAN (to_days('2008-11-01'))
, partition p024 VALUES LESS THAN (to_days('2008-12-01'))
, partition p025 VALUES LESS THAN (to_days('2009-01-01'))
);

The above command can be also expressed as

./partitions_helper --table=blog_posts --column=published --interval=month \
--start=2007-01-01 --partitions=25

Or you can shorten it, if you remember the small options:

./partitions_helper -t blog_posts -c published -i month -s 2007-01-01 -p 25

More options are available. You can see them using

./partitions_helper --help

Enjoy!

Update Here's the code to the Partition helper, published here after MySQL Forge was taken down.

#!/usr/bin/perl
#    The MySQL Partitions helper
#    Copyright (C) 2008, 2009 Giuseppe Maxia
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; version 2 of the License
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA

#
# This program creates a ALTER TABLE statement to add or reorganize 
# date based partitions for MySQL 5.1 or later
#

use strict;
use warnings;
# use diagnostics;
use English qw( ‐no_match_vars ) ;
use Getopt::Long qw(:config no_ignore_case );
use Data::Dumper;

my $VERSION = '1.0.4';

#
# Parse options are fully qualified options with descriptive help,
# parse string for the command line, and sort order for the help
#
my %parse_options = (
    table           =>  {
                            value   => '',
                            parse   => 't|table=s',
                            help    => [
                                        'The table being partitioned',
                                        '(no default)'
                                       ],
                            so      =>  20,
                        },
    column          =>  {
                            value   => '',
                            parse   => 'c|column=s',
                            help    => [
                                        'The partitioning column',
                                        '(no default)',
                                       ],
                            so      =>  30,
                        },
    interval        =>  {
                            value   => 'month',
                            parse   => 'i|interval=s',
                            help    => [
                                        'The interval between partitions',
                                        'Accepted: "year", "month", "week", "day", or a number',
                                        '(default: month) ',
                                       ],
                            so      =>  40,
                        },
    partitions      =>  {
                            value   => 0,
                            parse   => 'p|partitions=i',
                            help    => [
                                        'How many partitions to create',
                                        '(default: 0) ',
                                       ],
                            so      =>  50,
                        },
    first_partition =>  {
                            value   => 1,
                            parse   => 'first_partition=i',
                            help    => [
                                        'the first partition to create',
                                        '(default: 1) ',
                                       ],
                            so      =>  60,
                        },
    reorganize      =>  {
                            value   => '',
                            parse   => 'reorganize=s',
                            help    => [
                                        'the partition(s) to reorganize',
                                        '(default: none) '
                                       ],
                            so      =>  70,
                        },
    
    start           =>  {
                            value   => '2001-01-01',
                            parse   => 's|start=s',
                            help    => [
                                        'the minimum partitioning value',
                                        '(default: 1 for numbers, 2001-01-01 for dates) '
                                       ],
                            so      =>  80,
                        },
    end             =>  {
                            value   => '',
                            parse   => 'e|end=s',
                            help    => [
                                        'The maximum partitioning value',
                                        'Used unless --partitions is used',
                                        '(no default) ',
                                       ],
                            so      =>  90,
                        },
    function             =>  {
                            value   => '',
                            parse   => 'f|function=s',
                            help    => [
                                        'The partitioning function to use in the "range" declaration',
                                        '(default: to_days, unless --list is used) ',
                                       ],
                            so      =>  100,
                        },
    list             =>  {
                            value   => 0,
                            parse   => 'l|list',
                            help    => [
                                        'Use the COLUMNS feature (versions >= 5.5)',
                                        '(default: no) ',
                                       ],
                            so      =>  110,
                        },
     maxvalue       =>  {
                            value   => 0,
                            parse   => 'x|maxvalue',
                            help    => [
                                        'Adds MAXVALUE as last partition',
                                        '(default: disabled) ',
                                       ],
                            so      =>  115,
                        },
    prefix           =>  {
                            value   => 'p',
                            parse   => 'prefix=s',
                            help    => [
                                        'prefix for partition name',
                                        '(default: p) ',
                                       ],
                            so      =>  120,
                        },
    explain          =>  {
                            value   => 0,
                            parse   => 'explain',
                            help    => [
                                        'show the current option values',
                                        '(default: no) ',
                                       ],
                            so      =>  130,
                        },

    version             =>  {
                            value   => 0,
                            parse   => 'version',
                            help    => [
                                        'display the version',
                                        '(default: none) ',
                                       ],
                            so      =>  400,
                        },
    help             =>  {
                            value   => 0,
                            parse   => 'h|help',
                            help    => [
                                        'display the help page',
                                        '(default: no) ',
                                       ],
                            so      =>  500,
                        },
);

# 
# convert parse options to simple options
#
my %options = map { $_ ,  $parse_options{$_}{'value'}}  keys %parse_options;

# 
# get the options from the command line
#
GetOptions (
    map { $parse_options{$_}{parse}, \$options{$_} }        
        grep { $parse_options{$_}{parse}}  keys %parse_options 
) or get_help();

get_help() if $options{help};

if ($options{version}) {
    print credits();
    exit 0;
}


# print Dumper(\%options) ; exit;

my %valid_intervals = (
    day   => 1,
    week  => 1,
    month => 1,
    year  => 1,
);

#
# start and end dates
#
my ($syear, $smonth, $sday) = ();
my ($eyear, $emonth, $eday) = ();

#
# deals with placeholder features
#

for my $op ( qw(operation) ) {
    if ($options{$op}) {
        die "option <$op> is not implemented yet\n";
    }
}

# 
# check that a table and column are given
#
unless ($options{table}) {
    die "table name required\n";
}

unless ($options{column} or $options{reorganize} ) {
    die "column name required\n";
}

#
# accept only one of --end or --partitions
#
if ($options{end} && $options{partitions}) {
    die "too many quantifiers. Use EITHER '--partitions' OR '--end' \n";
}

#
# check that we parsed a valid interval
#
if ( $options{interval} =~ /^\d+$/) {
    unless ($options{start} =~ /^\d+$/) {
        $options{start} = 1;
    }
    if ($options{end}) {
        unless ($options{end} =~ /^\d+$/) {
            die "the end value must be a number\n";
        } 
        if ($options{end} < $options{interval}) {
            die "the end value must be bigger than the interval\n";
        }
        if ($options{end} <= $options{start}) {
            die "the end value must be bigger than the start\n";
        }
        $options{partitions} = int( ($options{end} +1 - $options{start}) / $options{interval});
    }
}
else {
    unless (exists $valid_intervals{ $options{interval} } ) {
        die "invalid interval specified: $options{interval}\n";
    }
    #
    # for year,  month, or week the function must be to_days
    #
    unless ($options{list}) {
        $options{function} = 'to_days' unless $options{function};
    }

    #
    # check the start date
    #
    if ( $options{start} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
        ($syear, $smonth, $sday) = ($1, $2, $3);
        $smonth +=0;
        $sday +=0;
        #print "start $syear $smonth $sday\n";
    }
    else {
        die "invalid date $options{start}\n";
    }
    #
    # check the end date
    #
    if ($options{end}) {
        if ( $options{end} =~ /(\d{4})[\-\.\/](\d+)[\-\.\/](\d+)/) {
            ($eyear, $emonth, $eday) = ($1, $2, $3);
            $emonth +=0;
            $eday +=0;
            # print "end $eyear $emonth $eday\n";
        }
        else {
            die "invalid date $options{end}\n";
        }
        if ($options{interval} eq 'year') {
            $options{partitions} = $eyear - $syear +1;
        }
        elsif ($options{interval} eq 'month') {
            my $months =   (12 - $smonth) 
                 + ( ($eyear - $syear -1) * 12 )
                 + $emonth + 1;
            # print $months,$/;
            $options{partitions} = $months;
        }
        elsif ($options{interval} eq 'week') {
            $options{partitions} = weeks_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
        elsif ($options{interval} eq 'day') {
            $options{partitions} = days_between($syear, $smonth, $sday,
                                                 $eyear, $emonth, $eday);
        }
    }       
}

#
# there must be a valid number of partitions
#

unless ($options{partitions} && ($options{partitions} =~ /^\d+$/) ) {
    die "number of partitions required. Use EITHER '--partitions' OR '--end'\n";
}

if ($options{partitions} > 1024) {
    die "too many partitions ($options{partitions}). Maximum allowed is 1024\n";
} 
else {
    print "# partitions: $options{partitions}\n";
}

#
# don't accept a function if COLUMS is being used
#
if ( $options{function} && $options{list} ) {
    die "you must specify either list or function\n";
}

if ($options{explain}) {
    show_options();
}

# print Dumper(\%options) ; exit;

# -----------------------------------------
# start building the ALTER TABLE statement
# -----------------------------------------

print "ALTER TABLE $options{table} \n";
if ($options{reorganize} ) {
    print "REORGANIZE PARTITIONS $options{reorganize} INTO \n";
    $options{prefix} = 'pr';
}
else {
    print "PARTITION by range " ;

    if ($options{function}) {
        print "($options{function}(";
    }
    elsif ($options{list}) {
        print "columns(" 
    }
    else {
        print "("
    }

    print "$options{column}";

    if ($options{function}) {
        print ")";
    }

    print ")\n";
}

print "(\n";

make_partitions( $options{interval});

print ");\n";


# -----------------------------------------
# functions
# -----------------------------------------

sub make_partitions {
    my ($interval) = @_;
    my $partitions_done=0;
    my $p_year  = $syear;
    my $p_month = $smonth;
    my $p_day = $sday;
    my $func_start = 'to_days('; #$options{function};
    my $func_end = ")";
    if ($options{list}) {
        $func_start = "";
        $func_end = "";
    }
    for my $part ($options{first_partition} .. 
            $options{first_partition} + $options{partitions} -1 ) {
        if ($partitions_done) {
            print ", ";
        }
        else {
            print "  "
        }
        $partitions_done++;

        if ($interval =~ /^\d+$/) {
            printf "partition %s%03d VALUES LESS THAN (%d)\n", 
                $options{prefix},
                $partitions_done,
                ($options{start} + (($partitions_done - 1) * $interval)) + $interval;
        }
        else {
            printf "partition %s%03d VALUES LESS THAN (%s'%4d-%02d-%02d'%s)\n", 
                $options{prefix},
                $partitions_done,
                $func_start,
                $p_year,
                $p_month,
                $p_day,
                $func_end;
            if ($interval eq 'day') {
                ($p_year,$p_month,$p_day) = next_day($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'week') {
                ($p_year,$p_month,$p_day) = next_week($p_year, $p_month, $p_day);
            }
            elsif ($interval eq 'month') {
                ($p_year,$p_month) = next_month($p_year, $p_month);
            }
            elsif ($interval eq 'year') {
                ($p_year,$p_month) = next_year($p_year, $p_month);
            }
            else {
                die "unsupported interval\n";
            }
        }
    }
    if ($options{'maxvalue'}) {
            printf ", partition %s%03d VALUES LESS THAN (MAXVALUE)\n", 
                $options{prefix},
                ++$partitions_done;
    }
}

sub next_year {
    my ($y, $m) = @_;
    $y++;
    return ($y, $m);
}

sub next_week {
    my ($y, $m, $d) = @_;
    for my $i (1 .. 7) {
        ($y, $m, $d) = next_day($y, $m, $d);
    }
    return ($y, $m, $d);
}

sub next_day {
    my ($y, $m, $d) = @_;
    $d++;
    $m += 0;
    my $last_day = days_in_month($y, $m);
    if ($d > $last_day) {
        $d = 1;
        $m++;
    }
    if ($m > 12) {
        $m = 1;
        $y++;
    }
    return ($y, $m, $d);
}

sub is_leap_year {
    my ($y) = @_;
    if (($y % 400) == 0) {
        return 1;
    }
    elsif (($y % 100) == 0) {
        return 0;
    }
    elsif (($y % 4) == 0) {
        return 1
    }
    return 0
}

sub days_in_month {
    my ($y, $m) = @_;
    $m = $m +0;
    my %last_day = (
        1 => 31,
        2 => 28,
        3 => 31,
        4 => 30,
        5 => 31,
        6 => 30,
        7 => 31,
        8 => 31,
        9 => 30,
        10=> 31,
        11=> 30,
        12=> 31,
    );
    if (($m ==2) and (is_leap_year($y))) {
        $last_day{2} = 29;
    }
    return $last_day{$m};
}

sub days_to_year_end {
    my ($y, $m, $d) = @_;
    my $days = days_in_month($y,$m) - $d +1 ;
    for my $month ( $m +1 .. 12 ) {
        $days += days_in_month($y, $month);
    }
    return $days;
}

sub months_between {
    my ($syear, $smonth,
        $eyear, $emonth) = @_;
 
    my $months =   (12 - $smonth) 
         + ( ($eyear - $syear -1) * 12 )
         + $emonth + 1;
    return $months;
}
 
sub days_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
    # print "start $syear, $smonth, $sday\n end $eyear, $emonth, $eday\n";
    my $days =0;
    if (  sprintf ("%4d%2d%2d", $eyear, $emonth, $eday) 
          lt 
          sprintf("%4d%2d%2d", $syear, $smonth, $sday) ) 
    {
        die "start interval > end interval\n";
    }
    while (    ($syear < $eyear) 
            or ( ($syear == $eyear) and ($smonth < $emonth) ) 
            or ( ($syear == $eyear) and ($smonth == $emonth) and ($sday < $eday) ) 
        ) {
        if ($syear < $eyear) {
            $days += days_to_year_end($syear, $smonth, $sday);
            $syear++;
            $smonth=1;
            $sday=1;
        }
        elsif ($smonth < $emonth) {
            $days += days_in_month($syear, $smonth) - $sday;
            ($syear, $smonth) = next_month($syear, $smonth);
            $sday =1;
        }
        elsif ($sday < $eday) {
            $days += $eday - $sday +1;
            $sday = $eday;
        }
    }
    return $days;
}

sub weeks_between {
    my ($syear, $smonth, $sday,
        $eyear, $emonth, $eday) = @_;
   my $days = days_between ($syear, $smonth, $sday,
            $eyear, $emonth, $eday);
    # print $days, "\n"; exit;
    return int ($days / 7) +1;
}

sub next_month {
    my ($y, $m) = @_;
    if ($m  == 12) {
        $m = 1;
        $y++;
    }
    else {
        $m++;
    }
    return ($y, $m);
}

sub get_help {
    my ($msg) = @_;
    if ($msg) {
        warn "[***] $msg\n\n";
    }

    my $HELP_MSG = q{};
    for my $op ( 
                sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} } 
                grep { $parse_options{$_}{parse}}  keys %parse_options  ) {
        my $param =  $parse_options{$op}{parse};
        my $param_str = q{    };
        my ($short, $long ) = $param =~ / (?: (\w) \| )? (\S+) /x;
        if ($short) {
            $param_str .= q{-} . $short . q{ };
        } 
        $long =~ s/ = s \@? / = name/x;
        $long =~ s/ = i / = number/x;
        $param_str .= q{--} . $long;
        $param_str .= (q{ } x (40 - length($param_str)) );
        my $text_items = $parse_options{$op}{help};
        for my $titem (@{$text_items}) {
            $HELP_MSG .= $param_str . $titem . "\n";
            $param_str = q{ } x 40;
        }
        if (@{$text_items} > 1) {
            $HELP_MSG .= "\n";
        }
        # $HELP_MSG .= "\n";
   }

   print credits(),
          "syntax: $PROGRAM_NAME [options] \n", 
          $HELP_MSG;
    exit( $msg ? 1 : 0 );
}

sub credits {
    my $CREDITS = 
          qq(    The Partition Helper,  version $VERSION\n) 
        . qq(    This program creates a ALTER TABLE statement to add or reorganize\n )
        . qq(    partitions for MySQL 5.1 or later\n)
        . qq(    (C) 2008-2010 Giuseppe Maxia\n);
    return $CREDITS;
}

sub show_options {
    printf "# %-20s %-20s %s\n", 'options', 'default', 'value';
    printf "# %-20s %-20s %s\n", '-' x 20, '-' x 20, '-' x 20;
    for my $op ( sort { $parse_options{$a}{so} <=> $parse_options{$b}{so} }
            keys %parse_options) {
        my $v = $options{$op};
        my $d = $parse_options{$op}{value};
        printf "# %-20s %-20s %s\n", 
                $op, 
                '(' . (defined $d ? $d : '') . ')',  
                defined $v ? $v : '' ;
    }
    print '# ', '-' x 62, "\n";
}

Monday, August 25, 2008

The ephemeral glory of books

A picture says it all. Special offers at a FrOSCon books booth.
For how long is an IT book in its prime?

Tuesday, August 05, 2008

Editing a dump file from the airport

Keith Murphy asks if you have ever edited your dump files before feeding them to the database. And he recommends using sed. This reminds me of a war story, when I was consulting in Milan. The customer needed, among other things, to import a huge text file (created by a third party application) into a MySQL database. I did everything, including a cron job to load the data every day. By the time I left for the airport, the third party application hadn't yet provided its first sample, but since we knew the fields list, it was a simple case of creating a LOAD DATA INFILE command.
I was boarding my plane, when the customer called, saying that the application was giving dates in the wrong format. Instead of 'YYYY-MM-DD', it was using the Italian human readable format 'DD/MM/YYYY'. No Internet was available, and with just a few minutes before being confined to the airplane, I told the customer "I will send you a text message. Just pipe this command before feeding the file to the database."
And I sent this message:

perl -pe 's{(\d\d)/{\d\d}/{\d\d\d\d}}{$3-$2-$1}g'

I took a picture of my phone before sending the message.

When I landed, I got a text message from the customer. Everything worked fine.
So, that was when I edited 2 GB data file from the airport, using a mobile phone.

Tuesday, March 18, 2008

Perl myths dispelled

Tim Bunce, the author of the Perl DBI, the database library that has influenced similar works in most languages, has published a presentation about Perl myths. The essential list of myths goes:

Perl is dead. No, it's alive and kicking (look at the numbers)
Perl is difficult to read. So are Java and C++, when written by bad programmers.
Perl 6 is killing Perl 5. Actually, the opposite is true. While Perl 6 is taking way too long to be implemented, its analysis has advantaged Perl 5, which has grown new features and performance in 5.10.

Highly educational stuff, as it is this other article: Good Perl code is the best form of evangelism.
As for me, Perl is my main tool of the trade, not only with MySQL and other database related work, but for all system tasks. I use it for mostly any scripting task, and on the command line, to replace grep, awk, sed. I also use it instead of shell scripting, whenever the script grows longer than a few lines.

Friday, June 29, 2007

MySQL as universal server

With all the discussion going on about MySQL Proxy, we may forget that there are alternatives.
Today in the MySQL Dev Zone there is an article about a different proxy. It's more than that, actually. It is a Perl Module implementing the MySQL client/server protocol.
With this tool, you can emulate some of MySQL Proxy features, and you can do something more. For example, you can use Federated tables with non-MySQL database servers. The article features an example where a table from SQLite and a table from PostgreSQL are queried as if they were normal MySQL tables.

Wednesday, April 11, 2007

Handling multiple data sets in stored procedures from Perl

MySQL stored procedures can use SELECT statements in their body, returning a data set to the caller. The manual also explains that multiple data sets can be returned from such a stored procedure.
For example, assume you have this stored procedure:

CREATE PROCEDURE p1(x INT, y INT)
    DETERMINISTIC
    BEGIN
        SELECT 
            x ;
        SELECT 
            x AS first_param, 
            y AS second_param;
        SELECT 
            x, 
            y, 
            x + y AS sum_xy, 
            x * y AS prod_xy; 
        SELECT *  FROM t1;
    END

There are four SELECT statements, with different headers, and a variable number of rows returned.
Early versions of DBD::mysql could not handle multiple data sets. Actually, prior to version 3.0004 they could not even handle one dataset from a stored procedure. Starting with version 4.001, single and multiple datasets from SPs are handled gracefully.
The trick is to call $sth->more_results after retrieving each dataset.
See the complete example at Multiple data sets in MySQL stored procedures

For those interested, the MySQL Users Conference and Expo includes a BoF about DBD::mysql on April 24^th at 7:30pm.

The Data Charmer