Wednesday, April 29, 2009

Test driving the Spider storage engine - sharding for the masses


MySQL Conference and Expo 2009
At the MySQL Conference 2009 I attended a session about the Spider storage engine, an engine with built-in sharding features.
The talk was notable for the speaker wearing a spiderman costume, and for some language barrier that made the talk less enjoyable than it should be. That's a pity, because the engine is very intriguing, and deserves some exploration.

What is the Spider engine, then? In short, it's an extension to the partitioning engine with the ability of connecting to remote servers. Basically, partitions + federated, except that Federated is explicitly removed during the compilation. Additionally, the spider engine should remove current limitations, such as single thread for replication and single source replication, but due to lack of specific documentation, I will limit my current experiment to the sharding features.
The only documentation available is the slide deck from the presentation, and some very reference parameters that come with the source code. I show here what I found by practical inspection.

building the engine

To compile the engine, you need the source code for MySQL 5.1.31 (as required by the documentation, but it works fine with later versions as well).
Download the source code from the launchpad repository and expand it. You will get a ./spider directory, which you need to move under the ./storage directory in the source. Then you compile, with these instructions:
autoconf
automake
./configure \
--prefix=/usr/local/mysql \
--localstatedir=/usr/local/mysql/data \
--libexecdir=/usr/local/mysql/bin \
--enable-thread-safe-client \
--enable-local-infile --with-pic \
--with-fast-mutexes \
--with-client-ldflags=-static \
--with-mysqld-ldflags=-static \
--with-zlib-dir=bundled \
--with-big-tables --with-ssl \
--with-readline --with-embedded-server \
--with-partition --without-innodb \
--without-ndbcluster \
--without-archive-storage-engine \
--without-blackhole-storage-engine \
--with-csv-storage-engine \
--without-example-storage-engine \
--without-federated-storage-engine \
--with-extra-charsets=complex
make
./scripts/make_binary_distribution

Now we will use the MySQL Sandbox to create a testing environment. Let's start with a simple case: 1 main server and 4 remote ones.
make_sandbox $PWD/mysql-5.1.31-osx10.5-i386.tar.gz

This creates a sandbox under $HOME/sandboxes/msb_5_1_31, which is our main server. Before using it, we need to create some additional tables and load the plugin. (These queries are in the spider instructions, but they are hard to cut and paste. This is much easier for that purpose.)

create table if not exists mysql.spider_xa(
format_id int not null default 0,
gtrid_length int not null default 0,
bqual_length int not null default 0,
data char(128) not null default '',
status char(8) not null default '',
primary key (data, format_id, gtrid_length),
key idx1 (status)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_xa_member(
format_id int not null default 0,
gtrid_length int not null default 0,
bqual_length int not null default 0,
data char(128) not null default '',
scheme char(64) not null default '',
host char(64) not null default '',
port char(5) not null default '',
socket char(64) not null default '',
username char(64) not null default '',
password char(64) not null default '',
primary key (data, format_id, gtrid_length, host, port, socket)
) engine=MyISAM default charset=utf8 collate=utf8_bin;
create table if not exists mysql.spider_tables(
db_name char(64) not null default '',
table_name char(64) not null default '',
priority bigint not null default 0,
server char(64) default null,
scheme char(64) default null,
host char(64) default null,
port char(5) default null,
socket char(64) default null,
username char(64) default null,
password char(64) default null,
tgt_db_name char(64) default null,
tgt_table_name char(64) default null,
primary key (db_name, table_name),
key idx1 (priority)
) engine=MyISAM default charset=utf8 collate=utf8_bin;

install plugin spider soname 'ha_spider.so';
select engine,support,transactions,xa
from information_schema.engines;
+------------+---------+--------------+------+
| engine | support | transactions | xa |
+------------+---------+--------------+------+
| SPIDER | YES | YES | YES |
| MRG_MYISAM | YES | NO | NO |
| CSV | YES | NO | NO |
| MyISAM | DEFAULT | NO | NO |
| MEMORY | YES | NO | NO |
+------------+---------+--------------+------+

Spider is now enabled

preparing the remote servers


The servers used as remote shards can be conveniently replaced, for this experiment, by sandboxes. Let's create three of them, to illustrate the concept.
make_multiple_sandbox --group_directory=spider_dir \
--sandbox_base_port=6000 --check_base_port 5.1.31

Now we have three sandboxes under $HOME/sandboxes/spider_dir, with ports ranging from 6001 to 6003.
What we need to do is to create, in each server, a table with the same structure as the one that is being sharded in the main server.
$ cd $HOME/sandboxes/spider_dir
$ cat tablea.sql
drop schema if exists myspider;
create schema myspider;
use myspider;

Create table tbl_a(
col_a int,
col_b int,
primary key(col_a)
);

$ ./use_all "source tablea.sql"

The main server


Finally, we have all the components in place, we can create the table for the main server.
drop schema if exists   myspider;
create schema myspider;
use myspider;

Create table tbl_a(
col_a int,
col_b int,
primary key(col_a)
) engine = Spider
Connection ' table "tbl_a", user "msandbox", password "msandbox" '
partition by range( col_a )
(
partition pt1 values less than (1000)
comment 'host "127.0.0.1", port "6001"',
partition pt2 values less than (2000)
comment 'host "127.0.0.1", port "6002"',
partition pt3 values less than (MAXVALUE)
comment 'host "127.0.0.1", port "6003"'
);

Compared to classic partitions, there is some new ground to cover. The "CONNECTION" clause is used to define the table name in the remote server. The schema name is assumed to be the same as the one in the main server.
For each partition, we can add a "COMMENT" clause, with the connection parameters. Since we are using sandboxes in the same host, we connect to 127.0.0.1, and use the port corresponding to each sandbox.
From now on, we can use this table almost transparently.

Using the spider engine



# on the main server
./use myspider

insert into tbl_a values (1,1);
Query OK, 1 row affected (0.00 sec)

insert into tbl_a values (1001,2);
Query OK, 1 row affected (0.01 sec)

insert into tbl_a values (2001,3);
Query OK, 1 row affected (0.00 sec)

select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
| 1 | 1 |
| 1001 | 2 |
| 2001 | 3 |
+-------+-------+
3 rows in set (0.01 sec)

So far, so good. No surprises, at least in the surface.
Now, where is the data? A close inspection to the files in the data directory shows that the data is not in the main server.
The data is stored in the "remote" servers, as we can check easily:

# in the spider_dir path
./use_all "select * from myspider.tbl_a"

# server: 1:
col_a col_b
1 1
# server: 2:
col_a col_b
1001 2
# server: 3:
col_a col_b
2001 3

Now, let's apply some curiosity. What happens in the remote server when I insert a row in the main server? Probably the general log can give me an answer.
# spider_dir
./use_all 'set global general_log=1'

# main server
insert into tbl_a values (2002,3);
Query OK, 1 row affected (0.00 sec)

# spider_dir
$ tail node3/data/mysql_sandbox6003.log
090429 17:27:28 299 Connect msandbox@localhost on
090429 17:27:42 299 Query set session sql_log_off = 1
Ah! No curious people allowed.
Well. This can stop a common user, but not a determined one.
MySQL Proxy to the rescue! There is a Lua script that handles logs.
./node2/proxy_start --proxy-lua-script=$PDW/logs.lua --log-level=warning

Let's change the main table definition:
...
partition pt3 values less than (MAXVALUE)
comment 'host "127.0.0.1", port "4040"'

And now we can see what happens.
# main server
insert into tbl_a values (2004,3);
Query OK, 1 row affected (0.00 sec)

#tail /tmp/mysql/log
2009-04-29 18:03:04 305 -- set session sql_log_off = 1 >{0}
2009-04-29 18:03:04 305 -- set session autocommit = 1 >{0}
2009-04-29 18:03:04 305 -- start transaction >{0}
2009-04-29 18:03:04 305 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:03:04 305 -- insert into `myspider`.`tbl_a`(`col_a`,`col_b`)values(2004,3) >{0}
2009-04-29 18:03:04 305 -- commit >{0}

Hmmm. I don't like the sight of it. autocommit=1 and then start transaction, set session transaction and commit. At the very least, it's wasting three queries. This needs some explanation from the author, I guess. Let's try some data retrieval.

# main server
select * from tbl_a;
+-------+-------+
| col_a | col_b |
+-------+-------+
| 1 | 1 |
| 1001 | 2 |
| 2003 | 3 |
| 2004 | 3 |
| 1001 | 2 |
| 2003 | 3 |
| 2004 | 3 |
+-------+-------+
7 rows in set (0.01 sec)

$tail /tmp/mysql.log
2009-04-29 18:01:07 303 -- set session sql_log_off = 1 >{0}
2009-04-29 18:01:07 303 -- set session autocommit = 1 >{0}
2009-04-29 18:01:07 303 -- start transaction >{0}
2009-04-29 18:01:07 303 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:01:07 303 -- show table status from `myspider` like 'tbl_a' >{0}
2009-04-29 18:01:07 303 -- select `col_a`,`col_b` from `myspider`.`tbl_a` limit 0,9223372036854775807 >{0}
2009-04-29 18:01:07 303 -- commit >{0}

Scarier than the previous one. The LIMIT clause spells trouble.
And this latest experiment made me try something more ambitious.
I installed a group of 20 sandboxes and loaded the employees test database (4.2 million records), spreading two partitioned tables across the backend servers.
Performance is better than using a single table, but slower than a normal partitioned table on a single server. And I think I know why.
# main server
select count(*) from salaries where from_date between '1995-01-01' and '1995-12-31';
+----------+
| count(*) |
+----------+
| 201637 |
+----------+
1 row in set (0.76 sec)

$ tail /tmp/mysql.log
2009-04-29 18:09:51 307 -- set session sql_log_off = 1 >{0}
2009-04-29 18:09:51 307 -- set session autocommit = 1 >{0}
2009-04-29 18:09:51 307 -- start transaction >{0}
2009-04-29 18:09:51 307 -- set session transaction isolation level repeatable read >{0}
2009-04-29 18:09:51 307 -- show table status from `employees` like 'salaries' >{0}
2009-04-29 18:09:51 307 -- select `emp_no`,`from_date` from `employees`.`salaries` order by `emp_no`,`from_date` limit 0,9223372036854775807 >{0}

This is definitely a waste. It's a problem that is similar to what is happening when using the Federated engine. But here, we get a "ORDER BY" clause that is unnecessary to say the least.

Bugs


During the tests, I spot at least two serious bugs.
When you drop a partition, the data in the remote server is not removed. If you recreate the partition and insert something, you get a "duplicate key" error.

When you drop a database, the table mysql.spider_tables does not get updated, with the result that you can't recreate the table. with the same name, unless you remove the corresponding entries manually.

That was a lot of information for one session. Please try it and comment. Don't expect me to provide answers to the reasons of the spider engine. I hope the author chimes in and clarifies the muddy matter.

21 comments:

CaptTofu said...

I should really work with these guys and we should come up with a solution. I'm done with the book, and have federatedx and a patch for partitioning.

Now, the big trick would be to deal with two major things: pushdown conditions and how to deal with big result sets

gmax said...

@CaptTofu,
Actually, I was hoping you would say so!
I believe that the author could benefit from your experience in this matter.
Unlike Federated, where a sensible user could avoid the pitfalls, sharding will put the user in contact with way too much data, and performance will suffer greatly if we want to keep transparent usage. Pushdown conditions are high priority.

Kentoku said...

Hi Giuseppe and CaptTofu,

Thank you for test driving Spider!!

I am a original developer of Spider Storage Engine.
I answer your questions anyway.


### point1
> 090429 17:27:42 299 Query set session sql_log_off = 1
> Ah! No curious people allowed.
> Well. This can stop a common user, but not a determined one.
For debugging,
You can use "spider_internal_sql_log_off = 0" option for outputting general logs.


### point2
> Hmmm. I don't like the sight of it. autocommit=1 and then start transaction,
> set session transaction and commit. At the very least, it's wasting three
> queries. This needs some explanation from the author, I guess. Let's try
> some data retrieval.
"set session autocommit = 1" is generated by "spider_sync_autocommit = 1"(default) option.
This option is used for synchronous autocommit status between local database and remote databases.
"set session transaction isolation level repeatable read" is generated by "spider_sync_trx_isolation = 1"(default) option.
This option is used for synchronous transaction isolation level between local database and remote databases.
When you use "spider_conn_recycle_mode = 1 or 2", "set session autocommit = 1" and "set session transaction isolation level repeatable read" are executed only once at each creation of the link.

"start transaction" and "commit" are generated by "spider_semi_trx = 1"(default) option.
This option is for execution transaction during one SQL execution at "autocommit = 1".
Let's explain the reason for which this option is necessary.
For example, when execute following SQL.
"select a.col_a, b.col_a from tbl_a a, tbl_a b where a.col_b = b.col_c and a.col_d = 1"
This SQL is resolved following by MySQL and Spider.
SQL1."select ... from tbl_a where col_d = 1"
SQL2."select ... from tbl_a where col_c = ..."
If another connection is updating tbl_a between SQL1 and SQL2, response is unexpected response.
It is a problem.
In this case, if execute "start transaction" with "repeatable read" before SQL1, we can resolve this problem.
Then this option is necessary.
If you do the usage in which this problem doesn't happen, you can use "spider_semi_trx = 0" option.


### point3
> This is definitely a waste. It's a problem that is similar to what is
> happening when using the Federated engine. But here, we get a "ORDER BY"
> clause that is unnecessary to say the least.
In this case, "ORDER BY" means "using index" and sort cost is 0.
What was this explain plan?
In many cases, this operation is important because MySQL itself using index scan expects to get sorted records by index.


### point4
> When you drop a partition, the data in the remote server is not removed.
> If you recreate the partition and insert something, you get a
> "duplicate key" error.
It's not bug.
Spider doesn't delete remote table data because Spider would not allow other Spider's table link
with same remote table data when you drop a partition and the data in the remote server is removed.
Any MySQL servers in which is with Spider, can link same remote table.

In additional to say, Spider works as a table link storage engine.
"Create table" means "create link".
"Drop table" and "Drop partition" mean "drop link".

If you needed, I can probably create a new feature for dropping and creating.


### point5
> When you drop a database, the table mysql.spider_tables does not get updated,
> with the result that you can't recreate the table. with the same name, unless
> you remove the corresponding entries manually.
This is a bug.
I will fix this bug after returning to Japan.


### point6
> Now, the big trick would be to deal with two major things: pushdown
> conditions and how to deal with big result sets
Spider will be available for "engine-condition-pushdown" in Summer 2009.
You currently can use "spider_split_read" option.
"spider_split_read" option responds following SQL fast.
"select ... from tbl_a limit 100"

For big result sets, you should better to use "spider_low_mem_read" option with "spider_split_read" option.

Thanks.

gmax said...

Kentoku,
thanks for your comments.
About point #3, notice that this ORDER BY is generated in asnwer to a COUNT(*) request. That's why it is a waste. Compared to normal partitions, Spider performance is 60% slower, although it is faster than unpartitioned tables.
Good job so far, nonetheless!
I look forward to the improvements that you have promised for the next versions.

Giuseppe

dledwards said...

"Giuseppe Maxia, the Data Charmer, was road testing too, test-driving the Spider storage engine."

Log Buffer #145

Kentoku said...

Hi Giuseppe,

> About point #3, notice that this ORDER BY is generated in asnwer to a
> COUNT(*) request. That's why it is a waste. Compared to normal partitions,
> Spider performance is 60% slower, although it is faster than unpartitioned
> tables.
You can use "spider_max_order=0" option for suppressing to generate "ORDER BY".

> When you drop a database, the table mysql.spider_tables does not get updated,
> with the result that you can't recreate the table. with the same name,
> unless you remove the corresponding entries manually.
Would you tell me more detail when you got a reproduction procedure?
Seems like to me, I did not get same bug.

Spider is now available for "engine-condition-pushdown"!!

Thanks,
Kentoku

arjenAU said...

> start transaction
> set session transaction isolation level repeatable read

I think that flow amounts to a bug. The SET specifies the isolation level to be used from the next transaction started, not one that's already running.

Kentoku said...

It is not a bug.
At reference manual for "set transaction isolation level".
---------------------------------------------------------
With the GLOBAL keyword, the statement sets the default transaction level globally for all subsequent sessions. Existing sessions are unaffected.

With the SESSION keyword, the statement sets the default transaction level for all subsequent transactions performed within the current session.

Without any SESSION or GLOBAL keyword, the statement sets the isolation level for the next (not started) transaction performed within the current session.
---------------------------------------------------------

It means "set transaction isolation level ..." is "perform transaction level from next transaction", "set session transaction isolation level ..." is "perform transaction level immediately".

Let's test it.

-- client 1
mysql> create table trx_isolation_test(a int primary key)engine = innodb;
Query OK, 0 rows affected (0.05 sec)

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

-- client 2
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> set session transaction isolation level READ COMMITTED;
Query OK, 0 rows affected (0.00 sec)

-- client 1
mysql> insert into trx_isolation_test(a) values(1);
Query OK, 1 row affected (0.00 sec)

-- client 2
mysql> select a from trx_isolation_test;
Empty set (0.00 sec)

-- client 1
mysql> commit;
Query OK, 0 rows affected (0.00 sec)

-- client 2
mysql> select a from trx_isolation_test;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

-- client 1
mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into trx_isolation_test(a) values(2);
Query OK, 1 row affected (0.00 sec)

-- client 2
mysql> set session transaction isolation level REPEATABLE READ;
Query OK, 0 rows affected (0.00 sec)

mysql> select a from trx_isolation_test;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.01 sec)

-- client 1
mysql> commit;
Query OK, 0 rows affected (0.00 sec)

-- client 2
mysql> select a from trx_isolation_test;
+---+
| a |
+---+
| 1 |
+---+
1 row in set (0.00 sec)

mysql> set session transaction isolation level READ COMMITTED;
Query OK, 0 rows affected (0.00 sec)

mysql> select a from trx_isolation_test;
+---+
| a |
+---+
| 1 |
| 2 |
+---+
2 rows in set (0.01 sec)

This feature is important for Spider storage engine.
I intoroduced
> For example, when execute following SQL.
> "select a.col_a, b.col_a from tbl_a a, tbl_a b where a.col_b = b.col_c and a.col_d = 1"
> This SQL is resolved following by MySQL and Spider.
> SQL1."select ... from tbl_a where col_d = 1"
> SQL2."select ... from tbl_a where col_c = ..."
> If another connection is updating tbl_a between SQL1 and SQL2, response is unexpected response.
> It is a problem.
> In this case, if execute "start transaction" with "repeatable read" before SQL1, we can resolve this problem.

Spider has "spider_semi_trx_isolation" option for changing transaction level during one SQL execution.
You can keep repeatable reading from remote MySQL servers at read committed by using "spider_semi_trx_isolation = 2 or 3" option.

naclosagc said...

I know this is silly, but there is no create database statement, or a use statement, for the spider management tables. What database is supposed to be used for those? Thanks, and sorry to ask here, but I couldn't find it.

naclosagc said...

Never mind, my bad. I didn't see the database qualifer on the table names. Thanks.

Martin said...

Hi,

Thanks for writing the howto.

When I try it on a Fedora 13, I can't see the nodes.

# /root/sandboxes/use_all "show variables like 'port'"
Variable_name Value
port 5144

and for some reason it says port 5.1.44, which is the version of MySQL I am using.

When I do:

ps ax|grep -i mysql
1572 ? S 0:00 /bin/sh /root/5144/mysql-5.1.44/5.1.44/bin/mysqld_safe --defaults-file=/root/sandboxes/spider_main/my.sandbox.cnf
1650 ? Sl 0:31 /root/5144/mysql-5.1.44/5.1.44/libexec/mysqld --defaults-file=/root/sandboxes/spider_main/my.sandbox.cnf --basedir=/root/5144/mysql-5.1.44/5.1.44 --datadir=/root/sandboxes/spider_main/data --user=root --log-error=/root/sandboxes/spider_main/data/msandbox.err --pid-file=/root/sandboxes/spider_main/data/mysql_sandbox5144.pid --socket=/tmp/mysql_sandbox5144.sock --port=5144

In your slides on PDF page 54, you get a list of the 3 nodes.

How do you do that?

Best regards,
Martin

datacharmer said...

@Martin,
It seems that either you have not installed the sandbox correctly or you are running the command in the wrong directory.
The command to install the backend is

"make_multiple_sandbox --group_directory=spider_dir \
--sandbox_base_port=6000 --check_base_port 5.1.44 "
and then
$ cd $HOME/sandboxes/spider_dir
$ ./use_all ...

Moreover, I see that you are using the sandbox as root. This is not recommended, or tested.

Martin said...

@datacharmer

Thanks for the quick reply.

Ok, so now I will do it over as the mysql user. I guess the first question I then have, is this expected behavior?

config.status: executing libtool commands
/bin/rm: cannot remove `libtoolT': No such file or directory
config.status: executing default commands
configure: WARNING: unrecognized options: --with-partition, --with-innodb, --without-ndbcluster, --without-archive-storage-engine, --without-blackhole-storage-engine, --with-csv-storage-engine, --without-example-storage-engine, --without-federated-storage-engine

I have libtool installed

rpm -qa|grep -i libtool
libtool-ltdl-2.2.6-20.fc13.x86_64
libtool-ltdl-devel-2.2.6-20.fc13.x86_64
libtool-2.2.6-20.fc13.x86_64

Using mysql-5.1.44.

datacharmer said...

@Martin,
Why are you using the "mysql" user? This user should not have a login.
The sandbox should be used as a normal user.
If the regular user variable $HOME points to a system directory, you may end up in trouble.
Please, look at the docs. http://mysqlsandbox.net

Martin said...

@datacharmer

I have now read the documentation for make_sandbox, and hopefully have a better understanding now =)

Starting over again as a regular user. After the make_multiple_sandbox step I see:

$ cd /home/maj/sandboxes/spider_dir
$ ls
node1
$ /home/maj/sandboxes/use_all "show variables like 'port'"
Variable_name Value
port 5144
$ /home/maj/sandboxes/status_all
spider_main on

Have you an idea what is wrong? I would have expected to see the 3 nodes.

datacharmer said...

@Martin,
If you don't see three nodes, you should have received an error message.
Try repeating the make_multiple_sandbox command, after setting the user variable SBDEBUG=1

$ SBDEBUG=1 make_multiple_sandbox ...

Martin said...

@datacharmer

I didn't get errors, but with your debug trick I do see the errors. Output at:

http://pastebin.com/iC5h4TmH

So it seams that the basedir is set wrong.

Strangely enough it expects mysql_install_db to be in "bin/", where mysql_install_db that I compiled is in "scripts/". See the last 3 lines from the output.

What would you recommend I do here?

datacharmer said...

@Martin,
You seem to be complicating your life unnecessarily.
The program is telling you that mysql_install_db was not found in $HOME/opt/mysql/5.1.44, while you are searching "$HOME/spider/mysql-5.1.44/5.1.44/"
If your binaries are in another directory, either use a symlink to get them to $HOME/opt/mysql/5.1.44 or define
$SANDBOX_BINARY to $HOME/spider/mysql-5.1.44/

Martin said...

Great. Now I am getting even closer =)

http://pastebin.com/nv1bzqnW

The "Lost connection to MySQL server during query" I get. Have you seen that before, and knows what I could do here?

datacharmer said...

@Martin,
Congratulations!
Now you need to inspect the contents of the error log about the crash and ask the Spider developers for help. :-)

Martin said...

Thanks =) I have just written to him.

Vote on Planet MySQL