There was an interesting development. A colleague called and asked me for advice on how to insert 4 billion rows in a table with a simple structure.
create table t1 (
id tinyint not null
);
Very simple. No primary key, no indexes. It is needed to perform some specific tests.Actually, not 4 billion, but 2^32 records are needed, i.e. 4,294,967,296.
The classical method used in these cases is doubling the table contents:
insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1);
insert into t1 select * from t1;
insert into t1 select * from t1; # and so on
My solution was similar to the one from my quiz.CREATE VIEW `v4` AS
select NULL
union all select NULL
union all select NULL
union all select NULL;
CREATE VIEW v1024 AS
select null from v4 a, v4 b, v4 c, v4 d, v4 e;
INSERT INTO t1 select 1 from v1024 a, v1024 b, v1024 c, v4 ;
This one is faster than the doubling method, but it still requires from 40 to 65 minutes, depending on how fast is your server.So, the challenge, for which we will give away 1 MySQL T-shirt to the winner, is as following:
- Generate 2^32 records for table t1;
- no limits to the length of code to use;
- you can use stored routines, temporary tables, views, events, whatever makes the insertion fast;
- the method must be portable across operating systems. If it is not portable, a Unix-only method may be accepted if it is exceptionally faster than SQL-only solutions;
- methods relying on external applications cannot be accepted;
- If an programming language is needed, for compatibility with the test suite we can only accept Bash shell or Perl scripts;
- If you devise a fast method to insert data using MySQL Proxy, a Lua script can be accepted.
- what matters is the final insertion speed and ease of use.
- If a method uses an external script, its speed must be more than 20% faster than the fastest method using only SQL.
- The speed will be calculated on my server, using MySQL 5.1.23.
Solutions so far
- Dpin. 20 minutes for a portable solution is very good.
- Jedy. Very nice, but not that fast. 32 minutes.
- Todd. Brilliant solution (10 minutes for 4 billion rows!), but really impractical. We need something that works for any engine. This one is a dirty trick that is fun to use once, but in the long run it won't stand.