Monday, March 31, 2008

Summer of Code adjustment

The Google Summer of Code deadline has been shifted one week forward. This means that students have until April 7 to submit their applications.

Submission status


We have so far received 38 proposals.
Some of them are excellent. Many are good. A few of them could be improved. If you have submitted a proposal without reading my recommendations, you now have plenty of time to review your submission and adjust it for maximum quality.

Advice to students: how to get your application approved


Submitting an application is only the first step. If your proposal is weak, your chances of being chosen are limited. Now it's time to check the application and improve its contents, according to the guidelines mentioned above.
Now, pay attention. If you get a message from a mentor asking for additional material, it's a sign that your application is weak, and it may be discarded. Take those messages seriously, and try your best to comply.

Competition is fierce!


Even if you submitted a very good application, there is always a chance that someone else has submitted a better one for the same project, thus competing for the attention of the only mentor available for that task. Don't settle for "very good": sprint for excellence!

Size matters


A good application is neither too short nor too long. If you write only an abstract, your proposal will get little attention. And if you write too much, you may wear out the mentors' patience.
Your proposal must demonstrate that you have an independent understanding of the problem, and at least a general idea of how to solve it. Being too specific before discussing the project with a mentor is not going to gain you many points. If the mentor asks for such details, that's OK. If you are unsure, ask a friend to read your application and tell you if it makes sense. Ask your friend to describe the project to you. If you don't recognize the project from your friend's description, you may have a problem.
Good luck!

Saturday, March 29, 2008

Last chances to apply for Google Summer of Code 2008


Google Summer of Code is still open until March 31st.
There are many interesting projects available in the ideas page.
Please read the guidelines and apply before it's too late!
Plenty of opportunities to contribute to your favorite database and work with great developers. Check the ideas page for a list of famous names willing to spend time with enthusiastic students.

Friday, March 28, 2008

MySQL Proxy recipes: tokenizing a query

Using regular expressions for query handling can become prohibitively complex after a while. If you need to manipulate parts of a query, either you are a regexp guru (I mean it: someone who speaks regular expressions more fluently than English) or you need to find an alternative.

MySQL Proxy ships equipped with a tokenizer, a method that, given a query, returns its components as an array of tokens. Each token contains three elements:
  • name, which is a human readable name of the token (e.g. TK_SQL_SELECT)
  • id, which is the identifier of the token (e.g. 204)
  • text, which is the content of the token (e.g. "select").
For example, the query SELECT 1 FROM dual will be returned as the following tokens:
1:
  text        select
  token_name  TK_SQL_SELECT
  token_id    204
2:
  text        1
  token_name  TK_INTEGER
  token_id    11
3:
  text        from
  token_name  TK_SQL_FROM
  token_id    105
4:
  text        dual
  token_name  TK_SQL_DUAL
  token_id    87
It's an array of 4 elements, each one containing three items.

Armed with this new knowledge, we can try now to catch the UPDATE queries using a tokenizer.
local tokens = proxy.tokenize(query)
if tokens[1]['token_name'] == 'TK_SQL_UPDATE' then
    print('this is an update of table ' .. tokens[2]['text'])
end
The tokenizer can do some more things, and there are some performance issues to consider when using tokens. If you tokenize every query, the processing may take three times as long as using regular expressions, and with long queries the difference can skyrocket: tokenizing a query can cost you ten times more than a regular expression. The tutorial mentioned below will deal with this issue as well; one way of limiting the cost is sketched right after this paragraph.
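A minimal sketch of such a compromise (my own addition, not part of the original recipe): use a cheap pattern match to discard most queries up front, and pay the tokenizing price only for the few that look interesting.

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2)
    -- cheap pre-check: skip the tokenizer unless the query starts with UPDATE
    if not query:match('^%s*[uU][pP][dD][aA][tT][eE]%s') then return end
    local tokens = proxy.tokenize(query)
    if tokens[1]['token_name'] == 'TK_SQL_UPDATE' then
        print('this is an update of table ' .. tokens[2]['text'])
    end
end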

This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

Thursday, March 27, 2008

MySQL Proxy recipes: parsing a query with regular expressions

If you want to filter a query for further usage, you need first to identify the query as the one to process, and then you need to isolate the elements to use for the task.

Both needs can be satisfied using the Lua string.match(string, pattern) function. Since Lua strings support method-call syntax, this function can also be used as stringvar:match(pattern).

Let's see an example. We need to do something with the UPDATE statements to the employees table.
function read_query( packet )
    if packet:byte() ~= proxy.COM_QUERY then
        return
    end
    local query = packet:sub(2)
    local cmd, table_name = query:match('^(%w+)%s+(%w+)')
    if cmd and cmd:lower() == 'update' then
        print('updating table ' .. table_name)
        if table_name == 'employees' then
            print('doing something to employees')
        end
    end
end
The pattern used in this example captures the first two words in the query ("%w" matches a single word character; "%w+" matches one or more of them). The script then uses the string.lower() function to check whether the captured word is the one we are looking for.
Notice the idiom if cmd and cmd:lower().
If the pattern match fails, i.e. if there is no word to capture, the cmd variable will be nil, and the comparison will simply fail. This check is necessary because calling cmd:lower() when cmd is nil raises an error.
In this case, we get a word, and we compare it with a fixed string. If the fixed string is 'update', then we proceed, and compare the table name. If all matches, we can do what we wanted to.
For further pattern matching, see this regular expression patterns tutorial.
The above double check (first isolating a word and then comparing its contents) is necessary because Lua does not support case-insensitive pattern matching (part of the price to pay for its small footprint). There are some tricks to work around this, and they will be explained during the tutorial mentioned below; one of them is sketched here.
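A possible workaround (my own sketch, not necessarily the trick covered in the tutorial): expand every letter of a keyword into a two-character class, so that 'update' becomes '[uU][pP][dD][aA][tT][eE]', and match against that.

function make_ci_pattern(word)
    -- turn each letter into a [xX] character class
    return (word:gsub('%a', function (c)
        return '[' .. c:lower() .. c:upper() .. ']'
    end))
end

-- usage, replacing the two-step check above:
-- local table_name = query:match('^' .. make_ci_pattern('update') .. '%s+(%w+)')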

This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

Wednesday, March 26, 2008

MySQL Proxy recipes: returning an error

Returning an error message is one of the tasks that may become common when working with MySQL Proxy.
Let's say that you want to prevent users from querying your database at a given time. The most sensible answer to send to a client issuing a query during the forbidden period is an error message. Not only that: the client must receive an error code and SQL state, as if the error were issued by the database server.
With MySQL Proxy, an error is a legitimate packet to be sent back to the client, and thus you can return a customized error in answer to any query.
function error_result (msg, code, state)
    proxy.response = {
        type     = proxy.MYSQLD_PACKET_ERR,
        errmsg   = msg,
        errcode  = code,
        sqlstate = state,
    }
    return proxy.PROXY_SEND_RESULT
end
proxy.response is a structure whose meaning depends on its type field. When the type is set to proxy.MYSQLD_PACKET_ERR, you can return the three components of an error packet quite easily. For example, to prevent queries between 13:00 and 14:00, you can use this code:
local t = os.date('*t')
if t.hour == 13 then
    return error_result(
        'no queries allowed between 13:00 and 14:00', -- error message
        1314,    -- error code
        '!1314'  -- SQL state
    )
end
and the user will receive an error that looks like any other issued by the server:
show tables;
ERROR 1314 (!1314): no queries allowed between 13:00 and 14:00
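For completeness, here is a minimal sketch (my own, assuming error_result() is defined as above) of how the check sits inside the usual read_query() hook:

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local t = os.date('*t')
    if t.hour == 13 then
        return error_result(
            'no queries allowed between 13:00 and 14:00',
            1314,
            '!1314'
        )
    end
end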
This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

Tuesday, March 25, 2008

How to write a good application for Summer of Code

Do you want to apply for a MySQL project under Google Summer of Code?
Please read these instructions carefully, then.
State the project you want to tackle in the application title.
A title like John Doe's info or application for MySQL is unlikely to attract the mentors' attention or to make a good first impression.
State your view of the project
Simply repeating what is in the ideas page won't get you any marks. Say what you would like to do in your own words.
Explain why we should choose you
And be persuasive! The best way to persuade us is by showing a clear analysis of the project. A list of famous names that have praised your work will not impress us.
Don't over-represent yourself
If you claim technical expertise that you don't possess just to get the application accepted, you will wrong all the parties involved. We will eventually find out. If we find out soon, you simply won't get accepted. If we find out later, we will have to fail you.
Be bold!
Dare suggest something unusual and beautiful. We are listening. Provided that you are able to deliver what you dream, we'll be happy!
Be brief!
Our mentors are volunteering their time for these projects. If you submit a 10-page application, chances are that it won't be appreciated. Make it simple, but not too simple (Einstein).

MySQL Proxy recipes: returning a dataset

How do you return a dataset using MySQL Proxy?
Since this is a common task, like many that we will see in these pages, it is worth implementing as a function.
When you return a dataset, there are two main cases:
  • Returning a single row with just one column. For example, when you need to give the user a simple message;
  • Returning one or more rows with several columns. This is much more useful, and covers cases like displaying help information, or creating a tabular set of information.
The first case is quite simple:
function simple_dataset (header, message)
    proxy.response.type = proxy.MYSQLD_PACKET_OK
    proxy.response.resultset = {
        fields = {
            {type = proxy.MYSQL_TYPE_STRING, name = header}
        },
        rows = {
            { message }
        }
    }
    return proxy.PROXY_SEND_RESULT
end
This function is, as required, quite simple. The dataset has just one column, and the column name is user-defined.
Calling this function with return simple_dataset('from user', 'this message') will cause the Proxy to return a dataset to the client, as if it came from the database server.
The client will receive:
+--------------+
| from user    |
+--------------+
| this message |
+--------------+
A multiple column dataset requires some more work. You need to pass an array for the headers, and a two-dimensional array for the values.
function proxy.global.make_dataset (header, dataset)
    proxy.response.type = proxy.MYSQLD_PACKET_OK

    proxy.response.resultset = {
        fields = {},
        rows = {}
    }
    -- one string column for each element of the header array
    for i, v in ipairs(header) do
        table.insert(
            proxy.response.resultset.fields,
            {type = proxy.MYSQL_TYPE_STRING, name = v})
    end
    -- each element of dataset is one row (an array of column values)
    for i, v in ipairs(dataset) do
        table.insert(proxy.response.resultset.rows, v)
    end
    return proxy.PROXY_SEND_RESULT
end
Using this function is straightforward. For example, to return a simple help for some custom made commands, we can do:
return proxy.global.make_dataset(
    {'command', 'description'},  -- the header
    {                            -- the rows
        {'FOO',    'removes the database'},
        {'BAR',    'drops all tables'},
        {'FOOBAR', 'makes the server explode'},
    }
)
The result on the client is
+---------+--------------------------+
| command | description              |
+---------+--------------------------+
| FOO     | removes the database     |
| BAR     | drops all tables         |
| FOOBAR  | makes the server explode |
+---------+--------------------------+
If you keep these two functions at hand, your Proxy scripts will become more manageable.
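As a usage illustration (my own sketch; the PROXYHELP command is invented for this example), the functions above can be wired into read_query() to answer a custom command without ever sending it to the server:

function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2)
    -- PROXYHELP is a made-up command, answered directly by the Proxy
    if query:match('^%s*[pP][rR][oO][xX][yY][hH][eE][lL][pP]') then
        return proxy.global.make_dataset(
            {'command', 'description'},
            {
                {'FOO',    'removes the database'},
                {'BAR',    'drops all tables'},
                {'FOOBAR', 'makes the server explode'},
            }
        )
    end
end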

This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

Monday, March 24, 2008

MySQL Cartoons for bloggers


Do you want to use MySQL cartoons in your blog posts?
Feel free to borrow the images from this MySQL Forge wiki page.
Rules of the game:
  • You should only use these cartoons to illustrate an appropriate MySQL subject (Summer of code, conference, Forge, Using MySQL with the Lamp stack, and so on);
  • They are only for blogging. If you need these cartoons for other uses, or if you need different sizes, we are flexible. Please contact us at community - AT - mysql - DOT - com.

Summer of Code projects for MySQL Proxy

Google Summer of Code opens for proposals from March 24th to 31st.
There are several projects available in the ideas page. I am available to mentor some projects about MySQL Proxy enhancements. If you like any of them or have any other Proxy related project to suggest, feel free to discuss a proposal in the SoC mailing list.
Some caveats for students willing to try their hand at any Summer Of Code project:
  • Please state your programming skills clearly. Don't inflate your previous experience.
  • Remember that this is a full-time job. You won't have time for long vacations, and you won't be able to work on these projects alongside another job;
  • A Summer of Code project is a full-time commitment: you will be expected to put in 40 hours a week;
  • You will be requested to submit weekly reports;
  • There are dates when you are requested to submit your results. Please consider them carefully before accepting. If you don't make the deadline, you won't get paid.
The above remarks may sound too harsh, but consider that Google is going to pay a non-trivial amount of money for your participation, and in exchange for that money you will have to produce what you agreed upon with the mentoring organization.
And keep in mind that your mentors have a full time job of their own. They will be allowed by the mentoring organizations to dedicate a limited amount of time to Summer of Code (even though I know of mentors who have dedicated much of their free time to SoC projects). In this situation, if you work hard, you will be making your time and theirs worth the effort, and everyone wins.
If, instead, you overcommit at the beginning, or don't work hard throughout the summer, it will be a wasted experience for everyone.

That said, cooperation between companies with experienced developers and students willing to prove themselves has often resulted in excellent work. If you know what awaits you and you are willing to try, you are most welcome!

Which hardware for MySQL?

MySQL has a reputation of being a lightweight database. This label can be interpreted either as having limited features or as requiring little hardware.
CSIRAC, 1949
In my experience, I have seen MySQL used in quite heavy load situations, and thus I will discard the first meaning of lightweight. As for hardware requirements, my own experience is mixed. Yahoo! and Google have shown the world that using an array of inexpensive boxes in replication is often the most sensible way of scaling. However, there are many users who don't feel comfortable with replication, and therefore buy more iron when their current box has reached its limits. Sometimes an upgrade could be avoided by proper configuration, but that is beside the point. Given the choice, what is your favorite hardware for MySQL?
There is a quick poll on the Dev Zone asking this question.
My own prediction is that we'll get an overwhelming majority of x86 1CPU-Dual core, but this is just a wild guess. Do you use anything different? Please, prove me wrong! Are you on my assumed majority slice? Then prove me right!
If you don't care about what I think (and you shouldn't, really!) please cast a vote for your favorite architecture!

Thursday, March 20, 2008

Forge 2.0


It took a bit more than expected, but MySQL Forge 2.0 is out.
Jay announced the stage server a few weeks ago. Since then, there were 29 bug reports, which Jay duly fixed, but the unsung hero of this cat-and-mouse race is Diego Medina, who alone reported 22 bugs!
Thanks, Jay! Thanks, Diego! Thanks to everyone who tested the Forge and gave feedback.
So, folks: delete your forge.mysql.com cookies, and enjoy the new Forge.

Wednesday, March 19, 2008

Summer of Code - Get ready with your proposals

The Summer of Code is back.

Students, if you want to apply for a project with MySQL, be aware of the timeline.

March 17-24: Would-be student participants discuss application ideas with mentoring organizations.
March 24, ~12 noon PDT / 19:00 UTC: Student application period opens.
March 31, 5:00 PM PDT / 00:00 UTC April 1, 2008: Student application deadline.

What you need to do now, before the application period opens, is to discuss the ideas already published or suggest new ones using the mailing list.
This year there is an interesting novelty for MySQL. You can apply for one of the worklogs that are open for community participation under the Code Contribution Program.
If you have ideas for some projects that are not in the above lists, you can send a proposal to the mailing list, and discuss it with the mentors.
The application opens in 5 days. Get ready!

Speaking at Sun Tech Day in Naples


The MySQL acquisition by Sun is having immediate effects on my work. While so far I have been concentrating on the international community, neglecting my home country, mostly because of the lack of an office in Italy, I will now have more chances to attend to the community near me.
My first occasion to work under Sun's flag will be at the Sun Tech day in Naples.
The Tech Days are events where experienced Sun technologists show how things work to an interested and skilled audience (a university is a very good place for this). The attendees have a chance to see the action and to have the experts at their disposal to ask questions and discuss the trends.
This event was already scheduled when the acquisition was announced. So this time I will sneak in with a short participation. There will be more such events, where MySQL will have a more prominent role. Stay tuned.

SQL injection and bad programming practice

I live in a town that is the nemesis of e-commerce applications. The name of my town is Quartu Sant'Elena.
Notice that the name contains an apostrophe, which for all practical purposes is represented by a single quote.
Single quotes have a bad reputation, because they may be a symptom of SQL injection. Whenever I enter the name of my town in a web form to buy something, I hold my breath, because I dread what comes next.
The smartest applications have an Ajax interface with online completion, and take the name of the town without problems. The less advanced ones show a multiple choice list containing my town name.
The bad ones refuse the name of the town as invalid, and force me to enter an alternate spelling (Quartu S.Elena), which is recognized by most Italians as being equivalent.
The very bad ones, after forcing me to change the name of the town, refuse my credit card as invalid. The reason? The billing address of my credit card does not match the one I entered in the web form.
The very terrible ones check the credit card billing address some days after the transaction was completed. I bought a domain name from a big registrar company. They accepted the credit card and assigned me the domain, which I started using immediately. Three days later, I got a message saying that my credit card charge was not being accepted. The domain was blocked, so I could not register it with another company, and the company did not solve my problem, despite hours of phone calls. All for a lazy programming practice!
Real SQL injection prevention is achieved by input checking and string escaping, not by blindly rejecting everything that looks like a quote.

Tuesday, March 18, 2008

Creative MySQL hacking and Summer of Code


You can hack the MySQL server in three ways:
  • Changing the server code. This is straightforward. Download the code, and make the changes you need. If you need the change only for a limited period, this is enough. If the change must be persistent, then you should try to make it accepted. The Code Contribution Program may be for you.
  • Creating a plugin or a UDF. Using the plugin interface or the UDF API you can extend the server in many ways. UDFs require that you (and the user) be able to compile the additional code. Plugins require compiling code only from the author.
  • Mixing the existing hooks. This means being creative by stretching the intended purpose of simple components of the server or external tools. This is how you can hack the server and create new features using stored routines, events, engines like Federated, Blackhole, and Archive, and external tools like MySQL Proxy. This is like composing chess puzzles, where you are given the basic rules, and you have to achieve clever results. This sort of jigsaw-puzzle hacking is my favorite.

I am considering some of these jigsaw-puzzle hacking ideas for the upcoming Summer of Code.
If you have any ideas in this department and would like to pursue them, please send a message to the SoC mailing list. I will be glad to discuss them.

Additional reasons to attend the MySQL UC2008


I was not totally sincere.
When I said that I had seven reasons to attend the UC2008, I was just using a cute title. In reality, the main reason to attend the Users Conference is because I will meet a bunch of friends, and probably make new ones.
Yes! Let's not forget that the Users Conference, in addition to the conference rooms, has many corridors where you can start casual conversations with most anyone, and learn things as amazing as the ones that are said from the podiums.
And the social events! There is lunch by the pool, with ample tables, where you can be seated between a system architect and a stellar developer, and spend the whole meal talking about the subtleties of threading or the latest trend in distributed storage.

I can't say that I look forward to any tutorial, since my tutorial day will be filled with my own, but there are plenty of sessions that I look forward to.
Since I was on the reviewing committee, I know all the sessions, and 98% of them were accepted with my enthusiastic approval (2% of the time my opinion was outvoted, but that happens to all of us), so the only problem is that there is too much choice, and every time slot involves tough decisions about what to attend and what to miss.

Perl myths dispelled

Tim Bunce, the author of the Perl DBI, the database library that has influenced similar works in most languages, has published a presentation about Perl myths. The essential list of myths goes:
  • Perl is dead. No, it's alive and kicking (look at the numbers)
  • Perl is difficult to read. So are Java and C++, when written by bad programmers.
  • Perl 6 is killing Perl 5. Actually, the opposite is true. While Perl 6 is taking way too long to be implemented, the work on its design has benefited Perl 5, which has gained new features and better performance in 5.10.
Highly educational stuff, as is this other article: Good Perl code is the best form of evangelism.
As for me, Perl is my main tool of the trade, not only for MySQL and other database-related work, but for all system tasks. I use it for almost any scripting task, and on the command line, to replace grep, awk, and sed. I also use it instead of shell scripting whenever a script grows longer than a few lines.

Monday, March 17, 2008

Updated Community Development Program and a new blogger



Welcome to Patrik Backman (with his Nordic Brainstorm [*] ) among the bloggers! Patrik is Director of Software Engineering at MySQL, and very attentive to community matters.
In his first blog entry, Patrik presents the updated Community Development Program, created by Georg Richter, the Development Manager of Connectors and Client Connectivity, well known for his active role in the community.

The new program aims to strengthen the relationship between the external community and the developers inside the company. Contributors will choose from a list of features to implement and bugs to fix, and will be assigned to a developer who will act as a mentor.
This way, the contributor can work closely with someone who can guide the coding in the right direction, saving time for both the coder and the reviewer.
If you have a knack for coding, give it a try!

[*]The name was suggested while brainstorming on IRC. Among the other choices was "the loud thought", which you will appreciate if you know Patrik. :)

Using the event scheduler with OS commands

One of the major additions to MySQL 5.1 is the event scheduler. It is an internal scheduler, which does not need any help from the operating system. As such, it works identically on every platform.
One drawback of this feature, though, is that it can't communicate with the operating system: the event scheduler can't read system files, send e-mail messages, or store data into log files. It can only work within the database server. This is convenient for security, but it is quite limiting. Time for hacking!

In getting started with MySQL Proxy I showed an example of how to run shell commands from any MySQL client. Unfortunately, this method can't be used with the events, because events can't send queries to the outside. Or can they? Let's be creative, and combine Federated tables and MySQL Proxy:
drop table if exists t1, t1f;
create server fed
    foreign data wrapper mysql
    options (
        host '127.0.0.1',
        port 4040,
        database 'test',
        user 'msandbox',
        password 'msandbox'
    );
create table t1 (
    id int not null, cmd varchar(250), primary key (id), key (cmd)
);
create table t1f (
    id int not null, cmd varchar(250), primary key (id), key (cmd)
) engine=federated connection = 'fed/t1';
Table t1f is Federated, and it is accessed through the Proxy port (4040). Thus, every query directed to this table will be intercepted by MySQL Proxy, and then we can do what we want.
The Lua script associated with this Federated table is the following. It intercepts UPDATE statements directed to table t1, and sends the contents of the cmd column to an echo statement, using an OS call. (It's just a proof of concept.)
function read_query(packet)
    if packet:byte() ~= proxy.COM_QUERY then return end
    local query = packet:sub(2)
    if query:match('UPDATE') then
        local tokens = proxy.tokenize(query)
        if tokens[10]['token_name'] == 'TK_INTEGER'
           and tokens[2]['text'] == 't1'
           and tokens[10]['text'] == '1' then
            print( " --- " .. query )
            os.execute('echo "executing ### ' .. tokens[6]['text'] .. ' ###"')
        end
    end
end
In short, this script checks if we are updating table t1 (not t1f! The Federated engine accesses the base table) with id = 1, and if yes, uses the content of cmd as argument for an OS command.
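Since the cmd value ends up on a shell command line, a slightly more defensive variation is worth considering. The helper below is my own sketch, not part of the original proof of concept: it refuses to run the echo if the argument contains characters the shell could interpret.

-- helper sketch: run the echo only if the argument is free of shell metacharacters
function run_echo_safely(cmd_text)
    if cmd_text:match("['\"`;|&$]") then
        print(' --- refusing suspicious command: ' .. cmd_text)
        return
    end
    os.execute('echo "executing ### ' .. cmd_text .. ' ###"')
end

-- inside read_query() above, the os.execute() line would become:
-- run_echo_safely(tokens[6]['text'])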
Now, let's create an event and see what happens:
set global event_scheduler=1;
create event e1
    on schedule every 10 second
    do
        update t1f set cmd = concat('do this ', now())
        where id = 1;
The result, as observed by the Proxy, is as follows:
--- UPDATE `t1` SET `cmd` = 'do this 2008-03-17 02:27:47' WHERE `id` = 1 LIMIT 1
executing ### do this 2008-03-17 02:27:47 ###
--- UPDATE `t1` SET `cmd` = 'do this 2008-03-17 02:27:57' WHERE `id` = 1 LIMIT 1
executing ### do this 2008-03-17 02:27:57 ###
QED!

Saturday, March 15, 2008

PBXT, or how to really exploit the plugin architecture

MySQL 5.1 introduces the plugin interface, a set of APIs that allow loading and unloading server components at runtime.
The idea behind this architecture is to make MySQL server really expandable.
Surprisingly, the latest engines being developed in house (Falcon and Maria) are not implemented using the plugin technology. The best example of how to take advantage of this interface comes from the outside. Paul McCullagh's PBXT is a transactional engine that can be loaded at run time.
To use it, you can either download the pre-compiled plugin library, and load it into your server, or get the source code and build the whole server from scratch.
Of course, the most interesting option is the pre-compiled binary. I downloaded the latest 5.1.23 binaries for Mac OS X, and the corresponding plugin library for my system.
I installed the server using MySQL Sandbox, and then ran this command:
show variables like 'plugin_dir';
+---------------+----------------------------------+
| Variable_name | Value                            |
+---------------+----------------------------------+
| plugin_dir    | /Users/gmax/opt/mysql/5.1.23/lib |
+---------------+----------------------------------+
This command tells me in which directory I should copy the library I just downloaded from the PrimeBase site.
$ cp ~/Downloads/libpbxt.so /Users/gmax/opt/mysql/5.1.23/lib
Now I can load the component.
INSTALL PLUGIN pbxt SONAME 'libpbxt.so';
Query OK, 0 rows affected (0.03 sec)

select * from information_schema.engines where engine='pbxt'\G
*************************** 1. row ***************************
      ENGINE: PBXT
     SUPPORT: YES
     COMMENT: High performance, multi-versioning transactional engine
TRANSACTIONS: YES
          XA: NO
  SAVEPOINTS: NO
That's it! The additional engine is up and running! No compilation needed.

A rather unorthodox test follows. With the help of Patrick Crews I created a test employees database with about 4 million records in 6 tables. Not huge, but not trivial either.
I loaded this database using MyISAM, InnoDB and PBXT, and compared the loading times. Not surprisingly, MyISAM and InnoDB are much faster than PBXT.
Engine       Loading time
------       ------------
MyISAM       1m08.546s
InnoDB       1m46.622s
PBXT         3m20.842s
However, the database includes a test suite (If you have been following my blog, you shouldn't be surprised about that), which calculates the number of records and a global CRC for all the tables. This check includes queries like the following:
SET @crc= '';
INSERT INTO tchecksum
SELECT @crc := SHA1(CONCAT_WS('#',@crc,
emp_no,birth_date,first_name,last_name,gender,hire_date))
FROM employees ORDER BY emp_no;
INSERT INTO found_values VALUES ('employees', (SELECT COUNT(*) FROM employees), @crc);
Heavy stuff. I don't know how relevant these queries are to real-world applications, but that is what I had at hand without squeezing my brains, and this is the result of running the whole test that calculates the global CRC of 4 million records:
Engine       Checking time
------       -------------
MyISAM       0m26.312s
InnoDB       0m31.627s
PBXT         0m26.230s
In this particular test (repeated three times with two different operating systems), PBXT is faster than both MyISAM and InnoDB.
The engine is still green, and perhaps my test is silly, but this start is promising!

Thursday, March 13, 2008

Looking for a MySQL Proxy guru


MySQL Proxy is the most exciting addition to the range of MySQL products since 5.0. Using Proxy you can convert your database server into an application server, or you can create new commands on the fly, fix bugs, filter queries, add load balancing to a set of servers, and a myriad of wonderful things.
The company itself is planning to do more with MySQL Proxy, and we have come to a point where we have more work on our hands than we can manage with the current manpower. So we are looking for a Proxy enthusiast to become a QA engineer. The job opening is online. Look it up. It's a challenging job, but I can promise you that it's really exciting! (Working with developers like Jan Kneschke, Kay Roepke, Mark Matthews, Eric Herman, and the rest of the Enterprise team is a rewarding experience.)

Reason #1 to attend the MySQL UC 2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#1 The lost art of the Self Join



When you work in the same field for several years, you risk becoming effective but unimaginative. You may be good at coding queries or designing tables, but sometimes you lose touch with your origins, when you were a creative programmer who used to tweak the intricacies of C++ or Perl to create marvelous, useless brilliancies.
If you recognize yourself in this picture, and wish you could have a spark of that enthusiastic force that made you learn new languages and idioms, despair not. Beat Vontobel's session will be like a fresh wind that will clear your mind of the dull tasks and reconcile you to the beauty of programming.
If you are a programmer, come to the Users Conference and don't miss this one!

Don't forget this: every speaker can give discount codes! Do you want one? Drop me a note by email. Do you know another speaker? Ask him/her for the discount code!


Wednesday, March 12, 2008

Reason # 2 to attend the MySQL UC2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#2 Astronomy, Petabytes, and MySQL



Petabytes! Have you heard that before? Perhaps you have, during some theoretical IT class. But have you used anything that could be remotely close to petabytes?
A petabyte is one thousand terabytes, or one million gigabytes.
If you struggle with databases of a few dozen gigabytes, and you get headaches when you think that you have to deal with terabytes of data, you can't possibly imagine how to store petabytes of data, let alone retrieve them!
This session is a challenge. At Stanford they are not morons, and if they are planning to use MySQL to store petabytes of data, there must be a good technical story behind it, which needs to be told. And you simply must not miss it!


Reason #3 to attend the MySQL UC2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#3 Testing PHP/MySQL Applications with PHPUnit/DbUnit



Old school technologists don't think kindly of PHP. Its adepts are believed to be sloppy programmers who create brittle applications. It takes programmers like Sebastian Bergmann to level the score and to show that a good programmer is recognized by best practices, not by the choice of language.
The best thing a good programmer can do while developing an application is testing. Sebastian shows why you should do it, and how. Not only that: he will show you how easy it is. At the end of this session you will wonder how you could have survived for years without unit testing.
PHP programmers, mark your calendars!



Disabling Mac OS X spotlight, or how to make your life complicated

Since my previous laptop is temporarily out of service, I am using a new one with Mac OS X Leopard. Which is cool, and it has great enhancements. However, after a few hours of usage, I noticed a sudden increase of CPU activity (+50%), for no apparent reason.
The resource monitor fingered a process named "mdworker", and Google took me to a page explaining that the real culprit is Spotlight, which was indexing the disk contents.
Since I needed all the CPU power of my laptop to compile the latest server, I asked Google again, and I found a page with a method to disable Spotlight permanently. The recommended method involves root access, changing attributes of a long list of files, and a reboot. (!!!)
I could not afford the luxury of a reboot at the moment (and the instructions looked too obscure for my taste), and therefore I used an alternative (and more Mac-ish) way. I opened the spotlight options in System Preferences, found the privacy section, and added the whole Macintosh HD to the list of places not to index.


Ten seconds later, the CPU activity was back to normal.
I re-enabled Spotlight indexing during the night, and all is well.

Tuesday, March 11, 2008

Want a discount to attend the UC? Call a speaker!



Every speaker at the Users conference has a 20% discount code for friends and relatives who want to attend the UC2008.
Would you like to attend the UC, and save 20%? Send me a message! (my_first_name AT mysql DOT com).
If you don't know me, but you know another speaker, ask him or her for the discount code! And then, when you register for the conference, enter the discount code in the registration form.

Notice that I disabled comments to this post, so nobody will be tempted to ask for the code here. If you need the discount code, don't post comments. Send an email!

Reason #4 to attend the MySQL UC2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#4 A Tour of External Language Stored Procedures for MySQL



Eric Herman and Antony Curtis are hard workers. You don't see them bragging about this and that. They do long and thorough research, and then they prototype. At that point, they claim victory, and usually with reason.
Their work on stored procedures using external languages is one of the most refreshing pieces of new technology applied to MySQL since the announcement of MySQL 5.0.
If you are dissatisfied with the cumbersome heaviness of the standard stored routines, come to this session. You will be surprised at the interesting things you can do NOW with a MySQL server.


Reason #5 to attend the MySQL UC 2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#5 Database Security Using White-Hat Google Hacking



Sheeri always amazes me. Whenever I think that I have seen all there is to be seen in the field of databases, she comes up with some innovative way of looking at old things.
This session is a series of aha! ideas. Some of them may have occurred to you. You may have heard of some others, but Sheeri does not stop at that. When she hears about a promising practice, she digs in until she gets all there is to be learned.
She's a captivating speaker. Once you attend one of her sessions (or if you just talk to her in a corridor about anything), you are addicted. Highly recommended!


Monday, March 10, 2008

Reason #6 to attend the MySQL UC 2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#6 Optimizing MySQL and InnoDB on Solaris 10 for World's Largest Photo Blogging Community



Farhan Mashraqi knows his job. He must be good at it, because he has on his hands a very busy and heavily loaded server. Fotolog is one of those sites that can eat up all your resources very quickly if you don't pay attention. If you want to stay online, you have to make sure that the server is always healthy and efficient.
The above is theory. Farhan tells you the practice behind this simple rule. He will show how to manage such a monster application and still ride the waves.
Don't miss it!


Did you back up your data today?

My laptop disk died on Friday.
Although this is a sad occurrence, I was very pleased to find out that my backup was in good shape, and I was back in working condition 30 minutes after the accident.
I don't regret buying a 2 TB firewire hard disk.

That made a huge difference. The last time I had a breakdown, restoring 50GB from a USB external disk took about 5 hours. Transferring the same amount of data through a FireWire interface now takes less than 25 minutes. I love technology!

Reason #7 to attend the MySQL UC 2008

MySQL Conference & Expo 2008
Disclaimer: Forget about my affiliation, this is my personal list of things that I am going to enjoy at the UC.

#7 MySQL Proxy : the complete tutorial


I shall start with a shameless plug, of course. I am going to enjoy this tutorial for several reasons.

For starters, it's the first tutorial I get to host at the Users Conference, and this is understandably satisfactory in itself.
Then, because it is going to give me the technical room that I was longing for. I have been writing and speaking about MySQL Proxy for 10 months, and on every occasion I felt that I had time or space limitations. There was not enough time to explain all that we can do with Proxy, or not enough space to write a longer article.
Now we can remedy that. Jan Kneschke and I will have 6 full hours to explain the Proxy's beauty and intricacies, ranging from simple wizardry to replication goodies.
We will cover the basics of the Lua language, with some internals, so that the attendees will leave the tutorial with the basic know-how to use the Proxy effectively.
There are still seats available. Hurry up and register!

Friday, March 07, 2008

Virtual attendance to Meetup-mashup in Boston - March 10

I love this company!
MySQL is still a virtual company. The Sun acquisition hasn't changed this very peculiar fact. Yesterday and the day before we held IRC meetings between the MySQL community and some Sun open source big shots.
virtual sea
On Monday evening there will be a more traditional user group meeting in Boston, hosted by Sheeri K. Cabral. Actually, not all that traditional. I mean, real people will physically attend the meetup, but the whole meetup will be recorded, and you can participate via IRC. And, even better, I will make a guest appearance via video-conference.
So, please meet me in Boston on March 10, at 7 PM EST, on a cold evening, while I talk from Sardinia, 6,400 km (4,000 miles) away. Thanks, Sheeri, for organizing this event!

Thursday, March 06, 2008

Logs of IRC Meetup session with Ian Murdock

As Kaj announced, there was an IRC session with Ian Murdock on March 5.
Since I have been working in a virtual company for 18 months, I should not be surprised by this kind of event. Actually, IRC has become part of my life. I could not conceive of working efficiently without it. When some new colleague asks me to have a simple phone call, without any IRC or other chat facilities, I start shaking at the idea of such a poor and antiquated way of communicating.
Even for a veteran, though, a public IRC meeting in a crowded #room is quite an experience. The "location" for the meeting was room #mysql on irc.freenode.net, a room that normally contains 400~500 people, where the level of noise is quite high.
Melbourne, Feb 2008
Having a public interview in such a room is like crashing a party of bikers and starting to talk philosophy. The beauty of the system is that nobody seems to care. The bikers continue talking about their stuff, and our interview on philosophy went quite smoothly, albeit intermingled with requests for help (promptly answered by a bunch of MySQL geeks) and some trollish shouting that added color to the picture.
For the curious ones, here are the full session logs.

Monday, March 03, 2008

DISTINCT? Don't be lazy!

Sometimes, the easy solution is not the best one. I have seen this problem happen several times. The query returns duplicates, and the first reaction of the developer is to fix it with DISTINCT.

Let's look at an example. Given the data below:

select * from people;
+-----------+-------+
| person_id | name  |
+-----------+-------+
|         1 | Joe   |
|         2 | Mary  |
|         3 | Frank |
+-----------+-------+
3 rows in set (0.00 sec)

select * from projects;
+------------+-------------+-----------+
| project_id | descr       | person_id |
+------------+-------------+-----------+
|          1 | Joe First   |         1 |
|          2 | Joe second  |         1 |
|          3 | Mary First  |         2 |
|          4 | Mary second |         2 |
|          5 | Frank first |         3 |
+------------+-------------+-----------+
5 rows in set (0.00 sec)

select * from jobs;
+--------+-----------+-----------+------------+
| job_id | job_descr | person_id | project_id |
+--------+-----------+-----------+------------+
|      1 | joe aaa   |         1 |          1 |
|      2 | joe bbb   |         1 |          1 |
|      3 | joe ccc   |         1 |          2 |
|      4 | Mary aaa  |         2 |          3 |
|      5 | Mary bbb  |         2 |          3 |
|      6 | Mary ccc  |         2 |          3 |
|      7 | Mary ddd  |         2 |          4 |
|      8 | Frank aaa |         3 |          5 |
|      9 | Frank bbb |         3 |          5 |
+--------+-----------+-----------+------------+
9 rows in set (0.01 sec)

The problem comes with this query:

SELECT p.name, COUNT(j.job_id) AS total, job_descr
FROM people p
INNER JOIN jobs j ON p.person_id = j.person_id
INNER JOIN projects pr ON pr.person_id = j.person_id
GROUP BY p.person_id ORDER BY total DESC, p.name;
+-------+-------+-----------+
| name  | total | job_descr |
+-------+-------+-----------+
| Mary  |     8 | Mary aaa  |
| Joe   |     6 | joe aaa   |
| Frank |     2 | Frank aaa |
+-------+-------+-----------+

As you can easily see, the query reports twice the number of jobs for Mary and Joe. The lazy solution is this:

SELECT p.name, COUNT(DISTINCT j.job_id) AS total, job_descr
FROM people p
INNER JOIN jobs j ON p.person_id = j.person_id
INNER JOIN projects pr ON pr.person_id = j.person_id
GROUP BY p.person_id ORDER BY total DESC, p.name;
+-------+-------+-----------+
| name  | total | job_descr |
+-------+-------+-----------+
| Mary  |     4 | Mary aaa  |
| Joe   |     3 | joe aaa   |
| Frank |     2 | Frank aaa |
+-------+-------+-----------+
However, this query does not tackle the real problem, which is that the query is joining two tables (projects and jobs) using a non-primary key column. And this "solution" also ignores the even more serious problem that the person_id is redundant, and should not be in the jobs table in the first place. Joining with a pair of primary/foreign key is the right cure:
SELECT p.name, COUNT(j.job_id) AS total, job_descr
FROM people p
INNER JOIN jobs j ON p.person_id = j.person_id
INNER JOIN projects pr ON pr.project_id = j.project_id
GROUP BY p.person_id ORDER BY total DESC, p.name;
+-------+-------+-----------+
| name  | total | job_descr |
+-------+-------+-----------+
| Mary  |     4 | Mary aaa  |
| Joe   |     3 | joe aaa   |
| Frank |     2 | Frank aaa |
+-------+-------+-----------+
The result is the same, but if you apply these queries to a couple of heavily populated tables, the first, lazy query can be 5 times slower than the second one. The reason is simple: since the join was done on a column that is not a primary key, the query builds a Cartesian product between each person's projects and jobs, and then needs a costly pass to remove the duplicates. The second query, instead, never generates the duplicates in the first place, thus delivering the wanted result faster.