Tuesday, February 28, 2006

puzzled by date functions

Today I was analyzing the behavior of the query optimizer, and I stumbled into a most curious case.
I created two functions returning the extremes of a date range, and I wanted to see how many times those functions were called when used in a WHERE clause So I added log tracing instructions to both of them. The result was quite surprising.
Let's set the environment first.
create database if not exists test ;
use test ;

delimiter //

drop function if exists today_start //
CREATE FUNCTION today_start() RETURNS datetime
begin
insert into mylog (routine_name) values ('today_start');
-- return current_date();
return '2006-02-28 00:00:00';
end //

drop function if exists today_end //
CREATE FUNCTION today_end() RETURNS datetime
begin
insert into mylog (routine_name) values ('today_end');
-- return current_date() + interval 1 day - interval 1 second;
return '2006-02-28 23:59:59';
end //

delimiter ;

drop table if exists t1;
create table t1 (
id int(11) NOT NULL auto_increment,
dt datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

drop table if exists mylog;
create table mylog (
id int not null auto_increment primary key,
routine_name varchar(20) not null,
TS timestamp
);

INSERT INTO `t1` VALUES
(1,'2006-02-28 11:19:35'), (2,'2006-02-28 11:19:38'),
(3,'2006-02-28 11:19:40'), (4,'2006-03-01 11:20:09'),
(5,'2006-03-01 11:20:11'), (6,'2006-03-01 11:20:12'),
(7,'2006-03-01 11:20:13');

select * from t1;
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
| 4 | 2006-03-01 11:20:09 |
| 5 | 2006-03-01 11:20:11 |
| 6 | 2006-03-01 11:20:12 |
| 7 | 2006-03-01 11:20:13 |
+----+---------------------+
Now I select all today's rows from t1.
select * from t1 where dt between today_start() and today_end();
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
+----+---------------------+
That's correct. Now, let's see how many times the function was called:
select * from mylog;
+----+--------------+---------------------+
| id | routine_name | TS |
+----+--------------+---------------------+
| 1 | today_start | 2006-02-28 12:26:24 |
| 2 | today_end | 2006-02-28 12:26:24 |
+----+--------------+---------------------+
And that too was what I expected. But the story changes if I use a slightly different table. This one has the same columns as t1, but the primary key is the datetime column.
drop table if exists t2;
create table t2 (
id int not null,
dt datetime NOT NULL,
PRIMARY KEY (dt)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into t2 (id, dt) select id, dt from t1;
Now I did the same experiment with this table:
truncate mylog;
select * from t2 where dt between today_start() and today_end();
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
+----+---------------------+
The query finds the same records. Let's see what happens to mylog:
select * from mylog;
+----+--------------+---------------------+
| id | routine_name | TS |
+----+--------------+---------------------+
| 1 | today_start | 2006-02-28 12:30:00 |
| 2 | today_end | 2006-02-28 12:30:00 |
| 3 | today_start | 2006-02-28 12:30:00 |
| 4 | today_end | 2006-02-28 12:30:00 |
+----+--------------+---------------------+
I can't imagine why this is happening. The only difference is that dt is now primary key. Instead of being called once, the routine is called twice. If I simply drop the primary key in t2, then the routine is called once per query, as expected.
The result does not change if I use InnoDB tables instead of MyISAM.

Can anyone explain what is happening here?

Friday, February 17, 2006

Speaking at the 2006 MySQL Users Conference

Well, it came a bit late, but it's official (good thing I haven't bought a conference pass yet!)
I'll be speaking at the fourth MySQL Users conference on Higher Order MySQL with this abstract:

Higher order functions are those that can accept a function as an argument or can produce a function as their output. MySQL recent addition of stored procedures and dynamic querying makes it possible to extend the DBMS features, especially for business intelligence and data warehousing. Functions that can create functions are useful to expand the language, producing complex SQL code quickly and accurately.

Some examples of what can be achieved with these kind of functions: global table CRC (as oposed to record CRC) for remote table comparison, cross tabulation (pivot tables in SQL), data on demand from compressed storage (indexed, as opposed to non-indexed data stored by the archive engine), enhanced metadata queries, where records from the information schema and records from data table create customized administrative tools. The applications are countless. The power of stored procedures, paired with dynamic queries can give developers the ability of doubling the features of MySQL DBMS. Higher order functions and procedures are one step beyond normal stored procedures. They give developers one templates to create standardized functions for similar tasks. While the basics of such discipline are better exploited in application languages (C, C++, Perl come to mind), dynamic SQL is perfectly capable of achieving important goals.

Apart from speaking, I will have a chance to meet in person several people that I know only by mail, and to meet again someone that I keep meeting (gladly) over and over at the conferences.
Yes, I look forward to it!

Monday, February 13, 2006

Workshop on MySQL 5, Milan, March 2006


My company is organizing a workshop on MySQL 5.
It will be held in Milan, on March 24th, and it will cover the following:
  • MySQL 5 main new features (stored procedures, information schema,triggers, storage engines, dynamic SQL);
  • The MySQL 5 general purpose routine library;
  • New features in upcoming 5.1 (plug-in engines, XML support, partitioning, event handling);
  • Replication in MySQL 5 (some multiple master tricks and how to use storage engines as complement to replication);
  • Technical questions parade. Each subscriber can submit one technical question, which will get an answer during the workshop.
The workshop is in Italian. English speakers who wish to attend (or to organize a similar event in English) can contact me at one of my company addresses.