There was an interesting development. A colleague called and asked me for advice on how to insert 4 billion rows in a table with a simple structure.
Very simple. No primary key, no indexes. It is needed to perform some specific tests.
create table t1 (
id tinyint not null
Actually, not 4 billion, but 2^32 records are needed, i.e. 4,294,967,296.
The classical method used in these cases is doubling the table contents:
My solution was similar to the one from my quiz.
insert into t1 values (1),(1),(1),(1),(1),(1),(1),(1);
insert into t1 select * from t1;
insert into t1 select * from t1; # and so on
This one is faster than the doubling method, but it still requires from 40 to 65 minutes, depending on how fast is your server.
CREATE VIEW `v4` AS
union all select NULL
union all select NULL
union all select NULL;
CREATE VIEW v1024 AS
select null from v4 a, v4 b, v4 c, v4 d, v4 e;
INSERT INTO t1 select 1 from v1024 a, v1024 b, v1024 c, v4 ;
So, the challenge, for which we will give away 1 MySQL T-shirt to the winner, is as following:
- Generate 2^32 records for table t1;
- no limits to the length of code to use;
- you can use stored routines, temporary tables, views, events, whatever makes the insertion fast;
- the method must be portable across operating systems. If it is not portable, a Unix-only method may be accepted if it is exceptionally faster than SQL-only solutions;
- methods relying on external applications cannot be accepted;
- If an programming language is needed, for compatibility with the test suite we can only accept Bash shell or Perl scripts;
- If you devise a fast method to insert data using MySQL Proxy, a Lua script can be accepted.
- what matters is the final insertion speed and ease of use.
- If a method uses an external script, its speed must be more than 20% faster than the fastest method using only SQL.
- The speed will be calculated on my server, using MySQL 5.1.23.
Solutions so far
- Dpin. 20 minutes for a portable solution is very good.
- Jedy. Very nice, but not that fast. 32 minutes.
- Todd. Brilliant solution (10 minutes for 4 billion rows!), but really impractical. We need something that works for any engine. This one is a dirty trick that is fun to use once, but in the long run it won't stand.