The Data Charmer: Quick benchmarking trick

Tuesday, March 15, 2011

I have been doing quite a lot of benchmarking recently.
I needed a safe way of measuring the time spent by the database on a long task, such as catching up on a huge backlog of accumulated replication updates. The problem with measuring this event is that I can record when it starts, but I can't easily detect when it finishes. My initial approach was to monitor the database and count the table rows to see when the task was done, but my additional queries ended up affecting the performance of the task itself. So I thought of another method.
Since I had control over what was sent from the master to the slave, I used the following:
The initial time is calculated as the minimum creation time of the databases that I know are created during the exercise. Let's say that I had 5 databases named from db1 to db5:

set @START = (select min(create_time) from information_schema.tables where table_schema like "db%");

Then, to make sure that I catch the exact moment when the task finishes, I added a command on the master for each database, to be sent after everything else:

create table db1.last_table (i int);
create table db2.last_table (i int);
create table db3.last_table (i int);
create table db4.last_table (i int);
create table db5.last_table (i int);
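
With more than a handful of databases, typing these statements by hand gets tedious. As a sketch, assuming the databases already exist on the master and that every schema matching db% belongs to the benchmark, the statements could be generated from information_schema.schemata:

```sql
-- Run on the master: produces one CREATE TABLE statement per matching database.
-- Assumption: every schema whose name matches 'db%' is part of the benchmark.
SELECT CONCAT('CREATE TABLE `', schema_name, '`.last_table (i INT);') AS stmt
  FROM information_schema.schemata
 WHERE schema_name LIKE 'db%'
 ORDER BY schema_name;
```

The generated statements still have to be executed on the master after the rest of the load, so that the marker tables are the last thing to reach the slave.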

To know if the task is done, I query the database as follows:

select count(*) from information_schema.tables where table_schema like "db%" and table_name="last_table";

If the count is less than 5 (the number of databases that were in my binary logs), I wait more.
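
The waiting can be automated. The following is a minimal sketch, not part of the original method: a stored procedure with the hypothetical name wait_for_last_tables, created on the slave in any schema outside the db% set, that repeats the count and sleeps between checks. The procedure name and both parameters are assumptions made for illustration.

```sql
DELIMITER //
-- Hypothetical helper: returns only when the expected number of
-- last_table markers has appeared in the db% schemas.
CREATE PROCEDURE wait_for_last_tables(IN expected INT, IN pause_seconds INT)
BEGIN
  DECLARE found INT DEFAULT 0;
  WHILE found < expected DO
    SELECT COUNT(*) INTO found
      FROM information_schema.tables
     WHERE table_schema LIKE 'db%'
       AND table_name = 'last_table';
    IF found < expected THEN
      -- Pause between checks so the polling stays out of the way of the task.
      DO SLEEP(pause_seconds);
    END IF;
  END WHILE;
END //
DELIMITER ;
```

For the five databases in this example, `CALL wait_for_last_tables(5, 30);` would check every 30 seconds. Since the end time comes from the table creation time and not from the moment the check succeeds, a long pause does not affect the measurement.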
Finally, when the count matches the expected one, I get the end time:

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table");

Now I have two values, @START and @END:

select timediff(@END,@START) as elapsed;
+----------+
| elapsed  |
+----------+
| 00:09:44 |
+----------+
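
TIMEDIFF returns a TIME value, which is easy to read but awkward to compare between runs. As a small variation, assuming @START and @END have been set as described, TIMESTAMPDIFF expresses the same interval in seconds:

```sql
-- Same interval as TIMEDIFF(@END, @START), expressed in whole seconds.
SELECT TIMESTAMPDIFF(SECOND, @START, @END) AS elapsed_seconds;
```

For the run shown here (00:09:44), this would return 584. It also avoids the upper limit of the TIME type (838:59:59), which matters only for very long tasks.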

It does not matter whether I query the database immediately or hours later, after coming back from my errands. Using the table creation times ensures that I get a clean start and finish time.
I put all the above in a script, and I can check the elapsed time without fear of mistakes.
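
The script itself is not shown here. As one possible condensed form of the same checks, and not necessarily what the script does, a single query can report the number of markers found together with the start time, the end time, and the elapsed time. The column aliases are illustrative.

```sql
-- Run on the slave. markers_found should equal the number of databases
-- (5 in this example) before the elapsed value can be trusted.
SELECT SUM(table_name = 'last_table')                         AS markers_found,
       MIN(create_time)                                       AS started,
       MAX(IF(table_name = 'last_table', create_time, NULL))  AS finished,
       TIMEDIFF(MAX(IF(table_name = 'last_table', create_time, NULL)),
                MIN(create_time))                             AS elapsed
  FROM information_schema.tables
 WHERE table_schema LIKE 'db%';
```

Reporting the marker count in the same row guards against reading an elapsed time while some databases are still catching up.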

1 comment:

Ilan said...: Hi,
If I understand:
I should add a create table command in the master at the beginning of the process and another one at the end of the process.
Then all I should do, when the replication is done, is to run the following queries:
set @START = (select min(create_time) from information_schema.tables where table_schema like "db%");

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table");

select timediff(@END,@START) as elapsed;

Am I right?
Thanks; March 15, 2011 at 8:41:00 AM GMT+1