Wednesday, September 10, 2008

MySQL virtual columns

Virtual columns
Wouldn't it be cool to enhance MySQL creation table syntax to do this?
my_date DATE,
my_month VIRTUAL INT AS (DATE_FORMAT(d,'%Y%m'))
And later, when you insert '2008-08-23' in my_date, then automatically my_month will be set to 200808.
Yes, it would be cool, and it has been done. Andrey Zhakov, a so far unknown to me community contributor, has extended the server code to implement virtual columns (WL#411) and functional indexes (WL#1075)
Andrey has published the code on Launchpad and has published detailed documentation in MySQL Forge. (A Virtual Column Preview and the refernce manual).
I still know very little of Andrey, but I am going to find out more soon. For now, I know that his code works as advertised. You can create virtual columns that are calculated at run time, and optionally stored in the table. Compared to views, this approach is much faster, because it calculates only during insert and updates, not during selects. Compared to triggers, virtual columns do the same work of INSERT and UPDATE triggers combined. And not only that.
As you know, I am an inquisitive person, and I don't accept a technical claim blindly. I need to test it with my own data before believing it. And so I did.
I built the code from the bazaar repository, and I modified a table from my employees test database.
Take for example the salaries table, and suppose you want to enhance it to include the difference between from_date and to_date. We're going to do it in three different ways. In the simple way, we add an INTEGER column, and we run an UPDATE statement after inserting 2.4 million records.
mysql> CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
ddiff int,
PRIMARY KEY (emp_no, from_date)

$ time ../use simple < load_salaries.dump

real 0m30.884s
user 0m3.129s
sys 0m0.367s

$ time ../use simple -e "update salaries set ddiff=datediff(to_date,from_date)"

real 0m17.774s
user 0m0.005s
sys 0m0.008s

Cost of the whole operation (insert + update) was about 48 seconds.
The second way is using a trigger. The same table structure, with an insert trigger.
mysql> CREATE TRIGGER bi_salaries
before insert on salaries
for each row
set new.ddiff=datediff(new.to_date,new.from_date);

$ time ../use with_trigger < load_salaries.dump

real 1m18.370s
user 0m3.134s
sys 0m0.372s

Uh oh! Quite a overhead, I would say. The trigger takes 30 seconds more than combining INSERT and UPDATE.
The third way is with the virtual column.
mysql> CREATE TABLE salaries (
emp_no INT NOT NULL,
salary INT NOT NULL,
from_date DATE NOT NULL,
to_date DATE NOT NULL,
ddiff virtual int as (datediff(to_date, from_date)) stored,
PRIMARY KEY (emp_no, from_date)

$ time ../use with_virtual < load_salaries.dump

real 0m31.807s
user 0m3.137s
sys 0m0.369s

Wow! I was expecting some improvement in performance, but this is really cool!
Almost three times faster than triggers!
We need to prove that the data is correct before rejoicing. So I ran this query for each table
mysql> select count(*), sum(ddiff), avg(ddiff) from salaries;
| count(*) | sum(ddiff) | avg(ddiff) |
| 2844047 | 702296791062 | 246935.7191 |

And the result was the same for all of them. Virtual columns rock!


Jonathan Levin said...

Thats very cool.
Myself and many other people have been asking for those virtual columns for a while now.

I hope that it would make mysql easier to use over learning to use triggers.

izenmania said...

That... is extremely cool.

Mark Daems said...

Any chance this gets added to mysql 6.0? Building from source is nice, but on windows I don't even think about doing that.

Giuseppe Maxia said...

6.0 is feature frozen.
Unless something spectacular changes in our present acceptance rules, it won't go into 6.0, unfortunately.


Cyril Scetbon said...

It would be interesting to compare it with the original version compiled with the same arguments but not patched to see if there are differences too

ryan said...

It's sad and discouraging that an interesting community contribution like this would be withheld from 6.0, given how 5.1 still hasn't been released as GA yet.

Oh well.