Friday, March 28, 2008

MySQL Proxy recipes: tokenizing a query

Using regular expressions for query handling can become prohibitively complex after a while. If you need to manipulate parts of a query, either you are a regexp guru (I mean it, really, someone who speaks regular expressions more fluently than English) or you find some alternatives.

MySQL Proxy ships equipped with a tokenizer, a method that, given a query, returns its components as an array of tokens. Each token contains three elements:
  • name, which is a human readable name of the token (e.g. TK_SQL_SELECT)
  • id, which is the identifier of the token (e.g. 204)
  • text, which is the content of the token (e.g. "select").
For example, the query SELECT 1 FROM dual will be returned as the following tokens:
1:
text select
token_name TK_SQL_SELECT'
token_id 204
2:
text 1
token_name TK_INTEGER
token_id 11
3:
text from
token_name TK_SQL_FROM
token_id 105
4:
text dual
token_name TK_SQL_DUAL
token_id 87
It's an array of 4 elements, each one containing three items.

Armed with this new knowledge, we can try now to catch the UPDATE queries using a tokenizer.
    local tokens = proxy.tokenize(query)
if tokens[1]['token_name'] == 'TK_SQL_UPDATE' then
print ('this is an update of table ' .. tokens[2]['text'])

end
The tokenizer can do some more things, and there are some performance problems to be handled when using tokens. If you tokenize every query, it may take thrice as long as using regular expressions. With long queries, the difference can skyrocket. Tokenizing a query can cost you 10 times more than using a regular expression. The tutorial mentioned below will deal with this issue as well.

This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

4 comments:

eriam said...

Hello Giuseppe,

Talking about regular expressions and mysql queries there is now an alternative, still in alpha stage though :), to mysql proxy that will allow you to use Perl regex (and I think you like these don't you ?).

Based on the work done by Philip Stoev I've built a POE based mysql pseudo "server" that can be used to proxy queries (there is a barebones proxy example).

I am in the process of making it run faster but it's already a cool toy / tool (since it uses POE).

Just to let you know. It's called POE-Component-DBIx-MyServer and it's on CPAN.

Thanks

Eriam

eriam said...

Hello Giuseppe,

Talking about regular expressions and mysql queries there is now an alternative, still in alpha stage though :), to mysql proxy that will allow you to use Perl regex (and I think you like these don't you ?).

Based on the work done by Philip Stoev I've built a POE based mysql pseudo "server" that can be used to proxy queries (there is a barebones proxy example).

I am in the process of making it run faster but it's already a cool toy / tool (since it uses POE).

Just to let you know. It's called POE-Component-DBIx-MyServer and it's on CPAN.

Thanks

Eriam

אראל סגל said...

It seems that the tokenize function is not supported in version 0.8.2:

2011-11-24 11:24:55: (critical) (read_query) [string "/home/erelsgl/workspace/erelhome/mysql-prox..."]:226: attempt to call field 'tokenize' (a nil value)

It is also not documented in the MySQL official reference anymore.

Do you know about any replacements?

Anonymous said...

Including the tokenizer yourself should do it. In other words, do something like this:

local tokenizer = require("proxy.tokenizer")

Vote on Planet MySQL