The Data Charmer: MySQL Proxy recipes: tokenizing a query

Friday, March 28, 2008

MySQL Proxy recipes: tokenizing a query

Using regular expressions for query handling can become prohibitively complex after a while. If you need to manipulate parts of a query, either you are a regexp guru (I mean it, really, someone who speaks regular expressions more fluently than English) or you find some alternatives.

MySQL Proxy ships equipped with a tokenizer, a method that, given a query, returns its components as an array of tokens. Each token contains three elements:

name, which is a human readable name of the token (e.g. TK_SQL_SELECT)
id, which is the identifier of the token (e.g. 204)
text, which is the content of the token (e.g. "select").

For example, the query SELECT 1 FROM dual will be returned as the following tokens:

1:
   text          select
   token_name    TK_SQL_SELECT'
   token_id      204
2: 
   text          1
   token_name    TK_INTEGER
   token_id      11
3:
   text          from
   token_name    TK_SQL_FROM
   token_id      105
4: 
   text          dual
   token_name    TK_SQL_DUAL
   token_id      87

It's an array of 4 elements, each one containing three items.

Armed with this new knowledge, we can try now to catch the UPDATE queries using a tokenizer.

    local tokens = proxy.tokenize(query)
   if tokens[1]['token_name'] == 'TK_SQL_UPDATE' then
       print ('this is an update of table ' .. tokens[2]['text'])

   end

The tokenizer can do some more things, and there are some performance problems to be handled when using tokens. If you tokenize every query, it may take thrice as long as using regular expressions. With long queries, the difference can skyrocket. Tokenizing a query can cost you 10 times more than using a regular expression. The tutorial mentioned below will deal with this issue as well.

This post is part of a set of recipes that will eventually become a long article.
Want to hear more? Attend MySQL Proxy - The complete tutorial at the MySQL Users Conference 2008.

4 comments:

Unknown said...: Hello Giuseppe,

Talking about regular expressions and mysql queries there is now an alternative, still in alpha stage though :), to mysql proxy that will allow you to use Perl regex (and I think you like these don't you ?).

Based on the work done by Philip Stoev I've built a POE based mysql pseudo "server" that can be used to proxy queries (there is a barebones proxy example).

I am in the process of making it run faster but it's already a cool toy / tool (since it uses POE).

Just to let you know. It's called POE-Component-DBIx-MyServer and it's on CPAN.

Thanks

Eriam; March 31, 2008 at 2:13:00 PM GMT+2
Unknown said...: Hello Giuseppe,

Talking about regular expressions and mysql queries there is now an alternative, still in alpha stage though :), to mysql proxy that will allow you to use Perl regex (and I think you like these don't you ?).

Based on the work done by Philip Stoev I've built a POE based mysql pseudo "server" that can be used to proxy queries (there is a barebones proxy example).

I am in the process of making it run faster but it's already a cool toy / tool (since it uses POE).

Just to let you know. It's called POE-Component-DBIx-MyServer and it's on CPAN.

Thanks

Eriam; March 31, 2008 at 2:14:00 PM GMT+2
אראל סגל said...: It seems that the tokenize function is not supported in version 0.8.2:

2011-11-24 11:24:55: (critical) (read_query) [string "/home/erelsgl/workspace/erelhome/mysql-prox..."]:226: attempt to call field 'tokenize' (a nil value)

It is also not documented in the MySQL official reference anymore.

Do you know about any replacements?; November 24, 2011 at 10:23:00 AM GMT+1
Anonymous said...: Including the tokenizer yourself should do it. In other words, do something like this:

local tokenizer = require("proxy.tokenizer"); December 27, 2011 at 11:28:00 PM GMT+1