Tuesday, February 28, 2006

puzzled by date functions

Today I was analyzing the behavior of the query optimizer, and I stumbled into a most curious case.
I created two functions returning the extremes of a date range, and I wanted to see how many times those functions were called when used in a WHERE clause So I added log tracing instructions to both of them. The result was quite surprising.
Let's set the environment first.
create database if not exists test ;
use test ;

delimiter //

drop function if exists today_start //
CREATE FUNCTION today_start() RETURNS datetime
begin
insert into mylog (routine_name) values ('today_start');
-- return current_date();
return '2006-02-28 00:00:00';
end //

drop function if exists today_end //
CREATE FUNCTION today_end() RETURNS datetime
begin
insert into mylog (routine_name) values ('today_end');
-- return current_date() + interval 1 day - interval 1 second;
return '2006-02-28 23:59:59';
end //

delimiter ;

drop table if exists t1;
create table t1 (
id int(11) NOT NULL auto_increment,
dt datetime NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

drop table if exists mylog;
create table mylog (
id int not null auto_increment primary key,
routine_name varchar(20) not null,
TS timestamp
);

INSERT INTO `t1` VALUES
(1,'2006-02-28 11:19:35'), (2,'2006-02-28 11:19:38'),
(3,'2006-02-28 11:19:40'), (4,'2006-03-01 11:20:09'),
(5,'2006-03-01 11:20:11'), (6,'2006-03-01 11:20:12'),
(7,'2006-03-01 11:20:13');

select * from t1;
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
| 4 | 2006-03-01 11:20:09 |
| 5 | 2006-03-01 11:20:11 |
| 6 | 2006-03-01 11:20:12 |
| 7 | 2006-03-01 11:20:13 |
+----+---------------------+
Now I select all today's rows from t1.
select * from t1 where dt between today_start() and today_end();
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
+----+---------------------+
That's correct. Now, let's see how many times the function was called:
select * from mylog;
+----+--------------+---------------------+
| id | routine_name | TS |
+----+--------------+---------------------+
| 1 | today_start | 2006-02-28 12:26:24 |
| 2 | today_end | 2006-02-28 12:26:24 |
+----+--------------+---------------------+
And that too was what I expected. But the story changes if I use a slightly different table. This one has the same columns as t1, but the primary key is the datetime column.
drop table if exists t2;
create table t2 (
id int not null,
dt datetime NOT NULL,
PRIMARY KEY (dt)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

insert into t2 (id, dt) select id, dt from t1;
Now I did the same experiment with this table:
truncate mylog;
select * from t2 where dt between today_start() and today_end();
+----+---------------------+
| id | dt |
+----+---------------------+
| 1 | 2006-02-28 11:19:35 |
| 2 | 2006-02-28 11:19:38 |
| 3 | 2006-02-28 11:19:40 |
+----+---------------------+
The query finds the same records. Let's see what happens to mylog:
select * from mylog;
+----+--------------+---------------------+
| id | routine_name | TS |
+----+--------------+---------------------+
| 1 | today_start | 2006-02-28 12:30:00 |
| 2 | today_end | 2006-02-28 12:30:00 |
| 3 | today_start | 2006-02-28 12:30:00 |
| 4 | today_end | 2006-02-28 12:30:00 |
+----+--------------+---------------------+
I can't imagine why this is happening. The only difference is that dt is now primary key. Instead of being called once, the routine is called twice. If I simply drop the primary key in t2, then the routine is called once per query, as expected.
The result does not change if I use InnoDB tables instead of MyISAM.

Can anyone explain what is happening here?

No comments: