The Data Charmer: tricks

Showing posts with label tricks. Show all posts

Thursday, December 12, 2013

Quick and dirty concurrent operations from the shell

Let’s say that you want to measure something in your database, and for that you need several operations to happen in parallel. If you have a capable programming language at your disposal (Perl, Python, Ruby, PHP, or Java would fit the bill) you can code a test that sends several transactions in parallel.

But if all you have is the shell and the mysql client, things can be trickier. Today I needed such a parallel result, and I only had mysql and bash to accomplish the task.

In the shell, it’s easy to run a loop:

for N in $(seq 1 10)
do
    mysql -h host1 -e "insert into sometable values($N)" 
done

But this does run queries sequentially, and each session will open and close before the next one starts. Therefore there is no concurrency at all.
Then I thought that the method for parallel execution in the shell is to run things in the background, and then collect the results. I just needed to find a way of keeping the first session open while the others are being started.

Here’s what I did: I ran a loop with a countdown, using the seq command, and I included a sleep statement in each query, waiting for a decreasing amount of seconds. If I start with 10 seconds, the first query will sleep for 10 seconds, the second one for 9 seconds, and so on. I will run each command in the background, so they will eat up the time independently.

#!/bin/bash
mysql -h host1 test -e 'drop table if exists t1'
mysql -h host1 test -e 'create table t1 (i int not null primary key, ts timestamp)'

for N in $(seq 10 -1 1)
do
    query1="set autocommit=0"
    query2="insert into test.t1 (i) values($N)"
    query3="select sleep($N) into @a; commit"
    mysql -h host1 -e "$query1;$query2;$query3" &
done

wait

mysql -h host1 test -e 'select * from t1'

The effect of this small script is that the commit for these 10 commands come at the same time, as you can see from the resulting table:

+----+---------------------+
| i  | ts                  |
+----+---------------------+
|  1 | 2013-12-12 18:08:00 |
|  2 | 2013-12-12 18:08:00 |
|  3 | 2013-12-12 18:08:00 |
|  4 | 2013-12-12 18:08:00 |
|  5 | 2013-12-12 18:08:00 |
|  6 | 2013-12-12 18:08:00 |
|  7 | 2013-12-12 18:08:00 |
|  8 | 2013-12-12 18:08:00 |
|  9 | 2013-12-12 18:08:00 |
| 10 | 2013-12-12 18:08:00 |
+----+---------------------+

This is a very good result, but what happens if I need to run 500 queries simultaneously, instead of 10? I don’t want to wait 500 seconds (8+ minutes). So I made an improvement:

for N in $(seq 5000 -10 1)
do
    echo $N
    query1="set autocommit=0"
    query2="insert into test.t1 (i) values($N)"
    query3="select sleep(concat('0.', lpad($N,4,'0'))) into @a; commit"
    mysql -h host1 -e "$query1;$query2;$query3" &
done

Now each SLEEP command is called with a fractional argument, starting at “0.5000”, and continuing with “0.4999,” and so on. You can try it. All 500 rows are committed at the same time.

However, the same time is a bit fuzzy. When we use timestamps with second granularity, it’s quite easy to show the same time. But with microseconds it’s a different story. Here’s what happens if I use MySQL 5.6 with timestamp columns using microseconds (TIMESTAMP(3)):

+----+-------------------------+
| i  | ts                      |
+----+-------------------------+
|  1 | 2013-12-12 18:27:24.070 |
|  2 | 2013-12-12 18:27:24.070 |
|  3 | 2013-12-12 18:27:24.069 |
|  4 | 2013-12-12 18:27:24.068 |
|  5 | 2013-12-12 18:27:24.065 |
|  6 | 2013-12-12 18:27:24.066 |
|  7 | 2013-12-12 18:27:24.062 |
|  8 | 2013-12-12 18:27:24.064 |
|  9 | 2013-12-12 18:27:24.064 |
| 10 | 2013-12-12 18:27:24.064 |
+----+-------------------------+

For the purpose of my tests (the actual queries were different) this is not an issue. Your mileage may vary.

Thursday, December 01, 2011

Never say "there is no way"

Reading a recent MySQL book, I saw an example of SHOW CREATE TABLE that comes with backticks (`) around the table and column names, and a comment:

Unfortunately, there is no way to remove this from generated syntax with this command.

(Emphasis mine).
Here's how it goes:

mysql> show create table mytest\G
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE `mytest` (
  `id` int(11) NOT NULL,
  `description` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Of course, there is a way!

mysql> pager tr -d '`'
PAGER set to 'tr -d '`''
mysql> show create table mytest\G
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE mytest (
  id int(11) NOT NULL,
  description varchar(50) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)

Never say "there is no way!"

Instead of "tr -d '`'", you can use "sed -e 's/`//g'", which does the same thing.

If you are running the query at the command line, you may use the pipe directly:

$ mysql -e 'show create table test.mytest\G' | tr -d '`'
*************************** 1. row ***************************
       Table: mytest
Create Table: CREATE TABLE mytest (
  id int(11) NOT NULL,
  description varchar(50) DEFAULT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB DEFAULT CHARSET=latin1

Wednesday, June 22, 2011

Less known facts about MySQL user grants

Reading MySQL security: inconsistencies I remembered a few related experiments that I did several years ago when I was studying for the MySQL certification. The first fact that came to mind is about the clause "WITH GRANT OPTION", which can only be given on the full set of options, not on a single grant. For example

GRANT INSERT,DELETE,UPDATE on world.* to myuser identified by 'mypass';
GRANT SELECT on world.* to myuser identified by 'mypass' WITH GRANT OPTION;
show grants for myuser\G
*************************** 1. row ***************************
Grants for myuser@%: GRANT USAGE ON *.* TO 'myuser'@'%' IDENTIFIED BY PASSWORD '*6C8989366EAF75BB670AD8EA7A7FC1176A95CEF4'
*************************** 2. row ***************************
Grants for myuser@%: GRANT SELECT, INSERT, UPDATE, DELETE ON `world`.* TO 'myuser'@'%' WITH GRANT OPTION

If you are surprised about the "WITH GRANT OPTION" clause applying to all grants instead of only applying to the SELECT, you forgot to consider how the grants are stored. All the grants for a given user (and a user is the combination of a name and a provenience) are stored in a single record in the mysql.user table. The GRANT OPTION is a column in that record. It is either set or not. You can't assign this option for only one attribute in the same record. It's either all the flagged grants or nothing. If you want to assign the "with grant option" on a single column, you must change either the provenience or the name of the user (thus opening another record). But also this addition may not be enough to reach your goal, as you can see in the next section. The other fact that came to mind about the "WITH GRANT OPTION" clause is that, in the examples given, it is ineffective. I dare say illusory. Let's start. As root user, we create this user:

root> grant all on granted.* to grantee identified by 'happyuser' with grant option;

The granted database exists, and now we have an user that can modify it, and, we think, delegate some functions to someone else.

grantee> grant select on granted.* to delegated identified by 'happy';
ERROR 1410 (42000): You are not allowed to create a user with GRANT
grantee> create user delegated;
ERROR 1227 (42000): Access denied; you need (at least one of) the CREATE USER privilege(s) for this operation

Right. I can't create a new user, but only transfer my superpowers to someone else. I will ask root to create the user, and then I will give it another try.

root>  create user delegated;
Query OK, 0 rows affected (0.00 sec)

grantee> grant select on granted.* to delegated identified by 'happy';
ERROR 1044 (42000): Access denied for user 'grantee'@'%' to database 'mysql'

Ouch! Since the grant tables are in the 'mysql' database, I don't have access. I will ask root to give me access to the mysql 'user' and 'db' tables.

root>  grant insert on mysql.user to grantee ;
Query OK, 0 rows affected (0.00 sec)
root>  grant insert on mysql.db to grantee ;
Query OK, 0 rows affected (0.00 sec)

grantee> grant select on granted.* to delegated identified by 'happy';
ERROR 1044 (42000): Access denied for user 'grantee'@'%' to database 'mysql'

Not good. I tried then to get SELECT,INSERT,UPDATE,DELETE for all the grant tables inside 'mysql'. Still, I could not exercise my grant options. Finally, the only solution was to get privilegs on the whole mysql database.

root> grant insert,select,delete,update on mysql.* to grantee;
Query OK, 0 rows affected (0.00 sec)

grantee> grant select on granted.* to delegated identified by 'happy';
Query OK, 0 rows affected (0.00 sec)

At last, I can grant something to someone.
But wait! Now that I can modify the 'mysql' database ...perhaps I could ...

grantee> update user set Select_priv ='Y',
 Insert_priv ='Y', Update_priv ='Y', Delete_priv ='Y',
 Create_priv ='Y', Drop_priv ='Y', Reload_priv ='Y',
 Shutdown_priv ='Y', Process_priv ='Y', File_priv ='Y',
 Grant_priv ='Y', References_priv ='Y', Index_priv ='Y',
 Alter_priv ='Y', Show_db_priv ='Y', Super_priv ='Y',
 Create_tmp_table_priv ='Y', Lock_tables_priv ='Y', Execute_priv ='Y',
 Repl_slave_priv ='Y', Repl_client_priv ='Y', Create_view_priv ='Y',
 Show_view_priv ='Y', Create_routine_priv ='Y', Alter_routine_priv ='Y',
 Create_user_priv ='Y', Event_priv ='Y', Trigger_priv ='Y',
 Create_tablespace_priv ='Y' where user = 'grantee';

This does not enhance my current grants, because I don't have the SUPER privilege (yet), but I can wait until the server restarts or until someone issues a 'flush privileges'. An then I will have full access to the server. Obviously, this situation is not what the DBA had in mind when the user 'grantee' was created.

Update The habit of always seeing the password set as integral part of the GRANT command has made me err on judgment.
As noted in one of the comments, the "grantee" user could have granted privileges to "delegated" without assigning a password. In this case,"grantee" does not need separate grants to the mysql database, which were apparently needed only if you wanted to set the password with the GRANT command.
All the above post is a miscalculation. The additional grants are not needed, provided that you don't include a password clause in your GRANT command.

Tuesday, March 15, 2011

Quick benchmarking trick

I have been doing quite a lot of benchmarking recently.
I needed to find a safe way of measuring the time spend by the database doing a long task, like catching up on a huge backlog of accumulated replication updates. The problem with measuring this event is that I can record when it starts, but I can't easily detect when it finishes. My initial approach was to monitor the database and count the tables rows to see when the task was done, but I ended up affecting the task performance with my additional queries. So I thought of another method.
Since I had control on what was sent from the master to the slave, I used the following:
The initial time is calculated as the minimum creation time of the databases that I know are created during the exercise. Let's say that I had 5 databases named from db1 to db5:

set @START = (select min(create_time) from information_schema.tables where table_schema like "db%")

Then, to make sure that I catch the exact moment that the task is finished, I added to the master a command for each database:

create table db1.last_table (i int);
create table db2.last_table (i int);
create table db3.last_table (i int);
create table db4.last_table (i int);
create table db5.last_table (i int);

To know if the task is done, I query the database as follows:

select count(*) from information_schema.tables where table_schema like "db%" and table_name="last_table";

If the count is less than 5 (the number of databases that were in my binary logs), I wait more.
Finally, when the count matches the expected one, I get the end time:

set @END = (select max(create_time) from information_schema.tables where table_schema like "db%" and table_name="last_table"');

Now I have two values, @START, and @END

select timediff(@END,@START) as elapsed;
+----------+
| elapsed  |
+----------+
| 00:09:44 |
+----------+

It does not matter if I query the database immediately, or hour after coming back from my errands. Using the table creation times makes sure that I get a clean start and finish time.
I put all the above in a script, and I can check the elapsed time without fear of mistakes.

Monday, March 07, 2011

implementing table quotas in MySQL

I have just seen Limiting table disk quota in MySQL by Shlomi Noach, and I could not resist.
You can actually implement a disk quota using an updatable view with the CHECK OPTION.
Instead of giving the user access to the table, you give access to the view (at least for inserting, see the caveat at the end), and you will get a genuine MySQL error when the limit is reached.

drop table if exists logs;
create table logs (t mediumtext) engine=innodb;

drop function if exists exceeded_logs_quota ;
create function exceeded_logs_quota() 
returns boolean
deterministic
return (
    select CASE 
           WHEN (DATA_LENGTH + INDEX_LENGTH) > (25*1024) 
           THEN TRUE ELSE FALSE 
           END
    FROM 
        information_schema.tables 
    WHERE 
        table_schema=schema() 
        and table_name='logs'
    );

create or replace view logsview as 
    SELECT * FROM logs 
    WHERE NOT exceeded_logs_quota()
    WITH CHECK OPTION;

Here's a test run:

mysql [localhost] {msandbox} (test) > insert into logsview values ('a');
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     0 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values (repeat('a', (25 * 1024) - 1));
Query OK, 1 row affected (0.00 sec)

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)

mysql [localhost] {msandbox} (test) > insert into logsview values ('b');
ERROR 1369 (HY000): CHECK OPTION failed 'test.logsview'

You will need to twist the limit to adapt to InnoDB habits of allocating pages rather than bytes, but if you measure the limit in MB the method should work fine.

CAVEAT: You should give your users separate privileges: SELECT on logs, and INSERT on logsview. The view will only return records while exceeded_logs_quota() returns false.

mysql [localhost] {msandbox} (test) > select exceeded_logs_quota();
+-----------------------+
| exceeded_logs_quota() |
+-----------------------+
|                     1 |
+-----------------------+
1 row in set (0.00 sec)
mysql [localhost] {msandbox} (test) > select count(*) from logsview;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.01 sec)

mysql [localhost] {msandbox} (test) > select count(*) from logs;
+----------+
| count(*) |
+----------+
|        2 |
+----------+
1 row in set (0.01 sec)

Saturday, March 05, 2011

A hidden options file trick

I was listening today to the OurSQL Episode 36: It's Not Our (De)fault! Part 1. As usual, Sheeri and Sarah are very informational and entertaining while explaining the innards of MySQL and their best practices.
Being a DBA oriented show, there was an omission in this podcast. There was no mention of custom groups that you can have for your my.cnf. This is mostly useful for developers. If your application requires some specific settings, instead of using a separated configuration file, you can use a different group, and then instruct your client applications to use that group.
By default, all client applications read the "[client]" group.
But you can tell your client to read a group that you can call whatever you like.
For example, with this configuration file,

[client]
user=common_user
password=common_password

[logrotation]
user=log_rotation_daemon
password=specific_password

You can have a Perl script that takes care of your particular log rotation needs. Instead of the normal credentials, it will use the ones listed in the [logrotation] group.

use strict;
use warnings;
use DBI;

my $dsn =   "DBI:mysql:test;"
            . "mysql_read_default_group=logrotation;"
            . "mysql_read_default_file=$ENV{HOME}/./my.cnf";
my $dbh = DBI->connect($dsn);

Notice that, for this option to work, the [logrotation] group must come after the [client] group, or the directives in the [client] group will override the ones in [logrotation]. That's why, in the options file, you find the directives for [mysqldump] at the bottom of the file.

So far, so good. This was a trick for developers, and probably many developers know it already. But there is another, related trick, that can be used by non-developers as well.
If you knew about these customized groups, you may have realized that you can't use them with the mysql standard command line client. Or, to say it better, there is no clearly documented way of doing so. There is, in fact, a cute trick that you can use.
Let's say that, from time to time, you want to use a different prompt, but you don't want to edit your $HOME/.my.cnf to change it. You just want your prompt to be there in the option file, and be able to recall it when the situation calls for it.
The mysql internal help does not tell anything about groups. However, a careful search of the manual gives this cryptic entry:

--defaults-group-suffix=suffix, -g suffix

In addition to the groups named on the command line, read groups that have the given suffix.

When I found it, I stared at this puzzling statement for a while. I could not understand which are the groups that are named in the command line.
Eventually, I figured out why there is a group-suffix and not simply a group. It means that if you add a suffix to a default group name, and you tell mysql to look for this suffix, then you will be able to use the appropriate group on demand.
For example, this options file will not work.

# wrong
[pinocchio]
prompt='I have a long nose  =======> '

[master]
prompt='master [\h] {\u} (\d) > '

[slave]
prompt='slave [\h] {\u} (\d) > '

But this one will work:

[mysqlpinocchio]
prompt='I have a long nose  =======> '

[mysqlmaster]
prompt='master [\h] {\u} (\d) > '

[mysqlslave]
prompt='slave [\h] {\u} (\d) > '

Here is a test run:

$ mysql --defaults-group-suffix=pinocchio
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 22
Server version: 5.1.54 MySQL Community Server (GPL)

Copyright (c) 2000, 2010, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

I have a long nose  =======>

The meaning of the suffix part is that mysql will read the default groups (which are [client], and [mysql]), and it will also read any groups that are named "mysqlSUFFIX" or "clientSUFFIX". I have named the group "mysqlpinocchio" and therefore it has been used. It would have worked the same if I had called it "clientpinocchio".

Monday, May 17, 2010

LOAD DATA: a tricky replication issue

When you are importing large amounts of data from other sources LOAD DATA is a common method of inserting data into a table.
It is one of the old commands implemented in MySQL. As such it is very fast, and it has been optimized for both MyISAM and InnoDB.
All is well when you are loading data into a standalone server. All is almost well when you are using replication. LOAD DATA used to be a problem in old versions of MYSQL, prior to 4.1. With recent versions of MySQL, it is replicated correctly, and sometimes efficiently.
The trouble starts when the data file is big. The definition of big and the amount of trouble that you can get depends on many factors. That's why users may not realize that this problem exists, even with fairly large files, and then being hit by this disaster when the file is only a little larger than the previous ones.
First, let me explain what happens when you replicate LOAD DATA.

The LOAD DATA query runs in the master.
When the query is finished, the master starts pumping data to the binary log.
The slave receives the binary log, and it will create a copy of the data file in the default temporary directory.
The slave executes the LOAD DATA query using the temporary file.
When the slave is done loading the data, the temporary file is deleted
The data from the relay log is deleted

At the end of the exercise, your data is only in the database table, both in the master and in the slaves. However, during the loading, each slave needs THREE TIMES THE STORAGE of the initial data file size, not counting the indexes.
If your data is 10 GB, then you will need 20 GB on the master (10 for the table, 10 for the binary log, and eventually 10 more for the indexes).
On the slave, you will need 30 GB: 10 for the table (+ indexes if requested), 10 for the relay logs, and 10 for the file in the temporary directory. The last part is the tricky one. The temporary directory is whatever is indicated in the TMPDIR system variable. If that directory is in a partition with less than 10 GB free, your replication will break, even if your data directory has terabytes of free space.

The Data Charmer

Thursday, December 12, 2013

Quick and dirty concurrent operations from the shell

Thursday, December 01, 2011

Never say "there is no way"

Wednesday, June 22, 2011

Less known facts about MySQL user grants

Tuesday, March 15, 2011

Quick benchmarking trick

Monday, March 07, 2011

implementing table quotas in MySQL

Saturday, March 05, 2011

A hidden options file trick

Monday, May 17, 2010

LOAD DATA: a tricky replication issue

Pages

Giuseppe Maxia

Like my posts? Buy me a book!

Julian's force

Yes, I am one of those

Sapere, saper fare, fare, far sapere

Blog Archive

Planet MySQL