Monday, December 17, 2007

Pop quiz: generate 1 million records

This is a quiz that has a aha! solution. Not so trivial, though. It requires some thinking.

Given this table:
create table t1 (
dt datetime not null,
primary key (dt)
);
Task: insert exactly 1 million records in table t1, with the following constraints:
  • Use a maximum of 5 (five) SQL statements;
  • Use only the MySQL interactive command line client;
  • No shell commands;
  • No loading of scripts;
  • No inserts from existing tables in your system. You must assume that t1 is the only table in your entire database;
  • No MySQL Proxy;
  • No stored routines, triggers or events;
  • Each statement must be not longer than 75 characters;
  • UPDATE. No modification of table t1;
  • No LOAD DATA.
Prize: fame and fortune (i.e. your name quoted in these columns).
I will publish the solution at the end of the week.

To make sure that I am not cheating, here is the MD5 signature of the solution file that I will publish this week.
fc6d32faf19b5ac1064093a6d7969f7c  solution.txt
If you are paranoid and believe that I can create an arbitrary file and make its contents match with the above MD5 signature, you have until Friday to get the solution from that. :)

Update: To keep the challenge interesting, I won't publish the comments with the right solutions for a few days. If you have sent a comment, don't worry if it does not show up immediately. I will publish it soon before the final solution.

Solutions so far

  • Shane. Three lines. Less than 6 seconds. And close to the original solution! (Your third solution is almost the same as the intended one)
  • Jedy. Three lines! and less than 6 seconds to execute! on top.
  • Dipin. Your latest solution (3 lines) tops the list.
  • Kai Voigt. one solution with the crowd, and a wicked one like Roland's, but much faster!
  • Roland Bouman. This is the wickedest solution so far. Not the shortest, but it's the one that is totally different from the intended one (and the other solutions). It will crash most weak servers, though.
  • Carsten. In the same league as Kai and Roland for the wicked solution.
  • Dirk1231. The challenge requires not to use other tables. (Also your second solution does). The third attempt put you finally in the list! Your fourth solution is nice, but not enough to climb to the top.
  • Sergey Zhuravlev. Excellent! Can you do it without creating a table? (your second solution is very nice and imaginative)
  • WSchwach. Your solution qualifies as cheating. Altering the given table to insert duplicate records is not a valid solution. Good shot, nonetheless!
  • Morgan Tocker. Very creative! That's very good.
  • Matthias. Good use of the allotted characters.
  • Ephes. Not bad. But you are not assuming that t1 is the only table in your system. Will you try without creating tables?
  • Hubert. Nice try. One of the elements you mentioned will lead you to the right track. Keep trying.
  • Bill Karwin. I did not specify that the numbers should be contiguous, and you took advantage of it! Well done!
  • Domas. Good solution. I expected more from you.
  • Tobias. Your first solution runs forever. The second one is cheating (using a information_schema table)
  • Erik. Nope. No other tables allowed. Not even information_schema tables.
  • William: Nope. No stored routines allowed. And the instruction to create the routine is longer than 75 characters. Your second solution has a command longer than 75 chars, and it's using other tables, and it does not work either!

Some hints

Keep trying, and consider that my intended solution does the following:
  • Inserts contiguous records;
  • Uses all dates in this century;
  • Does not insert from ANY table, not even t1;
  • Does not use LIMIT;
  • The total execution time is below 7 seconds.
UPDATE: Solution online!

28 comments:

zhur said...

create table t2 (i int primary key auto_increment);
insert into t2 () values (),(),(),(),(),(),(),(),(),(),(),(),(),(),(),();
insert into t2 () select null from t2 a1,t2 a2,t2 a3,t2 a4,t2 a5;
insert into t1 select now()+interval i second from t2 limit 1000000;
drop table t2;

wSchwach said...

Four statements are enough:
ALTER TABLE t1 DROP PRIMARY KEY;
CREATE TABLE a (a int(3));
INSERT INTO TABLE a VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10);
INSERT INTO t1 SELECT now() from a b,a c,a d,a e,a f,a g;

Mstar said...

And what is the prize?

gmax said...

Mstar,
the prize, as mentioned in the post, is "fame and fortune", i.e. just the glory. :)

Morgan said...

drop table if exists t1;
create table t1 (
dt datetime not null,
primary key (dt)
);
SET @a=0;
INSERT INTO t1 VALUES (ADDDATE(NOW(),@a:=@a+1)),(ADDDATE(NOW(),@a:=@a+1));
INSERT INTO t1 SELECT ADDDATE(NOW(),@a:=@a+1) FROM t1 a, t1 b, t1 c, t1;
INSERT INTO t1 SELECT ADDDATE(NOW(),@a:=@a+1) FROM t1 a, t1 b, t1 c, t1;
INSERT INTO t1 SELECT ADDDATE(NOW(),@a:=@a+1) FROM t1 a, t1 LIMIT 895006;

- Morgan

ephes said...

well,

i have a rather boring brute-force solution:

create table a select 1 id union select 2 id union select 3 id;
create table b (id int auto_increment primary key);
insert into b select 0 from a b,a c,a d,a e,a f,a g,a h;
insert into b select 0 from b a, b b limit 997813;
insert into t1 select from_unixtime(id) from b;

regards,
ephes

zhur said...

another solution ;-)

set @n:=1;
insert t1 values (990101),(990102),(990103),(990104),(990105),(990106);
insert t1 select 990109+interval @n:=@n+1 second from t1,t1 a,t1 b,t1 c;
insert t1 select 990109+interval @n:=@n+1 second from t1,t1 a limit 998698;

Hubert said...

I felt so close to a solution for a moment. I had that great idea about a 6-way cartesian join inserting bulks of SLEEP(1)+SYSDATE() into t1 which would only take a week and a half to complete! Too bad it was over the 75-chars limit... ;)

William said...

My version:

DELIMITER $$
CREATE FUNCTION `f`() RETURNS INT BEGIN
REPEAT INSERT INTO t1 SELECT @d:=DATE_ADD(@d,INTERVAL 1 SECOND);
SET @c=@c+1; UNTIL @c>100000 END REPEAT; RETURN 1; END$$
SET @c=1,@d=NOW()$$ SELECT f()$$

Dipin said...

insert into t1 values(from_days(366)),(from_days(367)),(from_days(368));
set @cnt = 369;
insert into t1 select from_days(@cnt:=@cnt+1) from t1,t1 b,t1 c,t1 d,t1 e;
insert into t1 select from_days(@cnt:=@cnt+1) from t1,t1 b,t1 c limit 9754;
insert into t1 select from_days(@cnt:=@cnt+1) from t1,t1 b limit 90000;

Bill Karwin said...

-- Puzzle: insert 1 million unique rows to the following table
drop table if exists t1;
create table if not exists t1 (
dt datetime not null,
primary key (dt)
);

-- Five-line solution with lines =< 72 characters.
insert into t1 values (71201),(71202),(71203),(71204),(71205);
insert into t1 values (71206),(71207),(71208),(71209),(71210);
set @i=to_days('2007-08-22');
insert ignore into t1 select from_days(100+(@i:=@i+1)) from t1 a, t1 b;
insert ignore into t1 select from_days(@i:=@i+1) from t1 a,t1 b,t1 c;

-- Simpler three-line solution if lines can be up to 102 characters
insert into t1 values (71201),(71202),(71203),(71204),(71205),(71206),(71207),(71208),(71209),(71210);
set @i=to_days('2007-11-30');
insert ignore into t1 select from_days(@i:=@i+1) from t1 a,t1 b,t1 c,t1 d,t1 e,t1 f;

-- Fun puzzle! Bill Karwin

Dipin said...

Oops. Had a typo... the last line should have been:

insert into t1 select from_days(@cnt:=@cnt+1) from t1,t1 b limit 990000;

-Dipin

Dirk1231 said...

create table t ( `id` mediumint unsigned );

insert into t values(1),(1),(1),(1),(1),(1),(1),(1),(1),(1);

SET @x = 0;

insert into t1
SELECT FROM_UNIXTIME(@x:=@x+1)
FROM t a,t b,t c,t d,t e,t f;

Dirk1231 said...

# here is the solution with a single table
insert into t1 values(NOW()-10),(NOW()-9),(NOW()-8),(NOW()-7),(NOW()-6);
insert into t1 values(NOW()-5),(NOW()-4),(NOW()-3),(NOW()-2),(NOW()-1);
SET @x = 0;
insert into t1 SELECT FROM_UNIXTIME(@x:=@x+1) FROM t a,t b,t c,t d,t e,t f;
delete from t1 limit 10;

Dirk1231 said...

# sorry about the last submit... i thought i had removed the reference
# to the second table
insert into t1 values(NOW()-10),(NOW()-9),(NOW()-8),(NOW()-7),(NOW()-6);
insert into t1 values(NOW()-5),(NOW()-4),(NOW()-3),(NOW()-2),(NOW()-1);
SET @x = 729669;
insert into t1 SELECT FROM_DAYS(@x:=@x+1) FROM t1,t1 a,t1 b,t1 c,t1 d,t1 e;
delete from t1 limit 10;

William said...

My vertion:

set @c=null,@f=now();INSERT INTO t1 SELECT @f:=DATE_ADD(@f,INTERVAL
1 SECOND) FROM (SELECT c FROM (SELECT @c:=IF(IFNULL(@c,0)=0,
1,@c+1) c FROM mysql.help_relation x CROSS JOIN mysql.help_relation y
CROSS JOIN mysql.help_category z) d WHERE c<=1000000) d1

ryan said...

Anyone else get it with one sql statement yet? and if you really think using information_schema is cheating the rule about no other tables (which it isn't because everyone has those even on "empty" servers), you can do the same exact thing here by creating 100 records via a 10x10 union, cross joining that 3 times, and deleting the original 100.

insert into t1 select date_add('2000-01-01', interval @c0:=@c0+1 second) from (select @c0:=0, @c1:=0, @c2:=0, @c3:=0) dummy cross join (select @c1:=@c1+1 from information_schema.global_variables limit 100) q1 cross join (select @c2:=@c2+1 from information_schema.global_variables limit 100) as q0 cross join (select @c3:=@c3+1 from information_schema.global_variables limit 100) as q3;

Dipin said...

Competitive side got the better of me...

insert t1 values(101),(102),(103),(104),(105),(106),(107),(@c:=now());

insert t1 select @c:=@c+interval 1 second from t1,t1 b,t1 c,t1 d,t1 e,t1 f;

insert t1 select @c:=@c+interval 1 second from t1, t1 b limit 737848;

-Dipin

Dirk1231 said...

# here is my solution in 4 lines instead of 5
SET @x = 366;
INSERT INTO t1 VALUES(NOW()-9),(NOW()-8),(NOW()-7),(NOW()-6);
INSERT INTO t1 SELECT FROM_DAYS(@x:=@x+1) from t1,t1 b,t1 c,t1 d, t1 e;
INSERT INTO t1 SELECT FROM_DAYS(@x:=@x+1) from t1,t1 a limit 998972;

Roland Bouman said...

set max_allowed_packet := 1073741824;
set @s=concat('insert into t1 values(@d:=10000101+@i:=1)');
set @s=concat(@s,repeat(',(date_add(@d,interval @i:=@i+1 day))',999999));
prepare s from @s;
execute s;

Roland Bouman said...

insert into t1 values(@d:=10000104),(@d+@i:=1),(@d-1),(@d-2),(@d-3);
insert into t1 select date_add(@d,interval(@i:=@i+1)day) from t1,t1 a,t1 b;
insert into t1 select date_add(@d,interval(@i:=@i+1)day) from t1,t1 a,t1 b;
delete from t1 limit 1197130;

Tobias said...

I'm not sure if this is ok with the rules.

I'm using t1, but no other existing tables (not sure if it says that not inserting from tables include t1?)

Otherwise this should give you 1.86320859201e+15 rows with three commands (unless it runs out of range for DATE before then - I couldn't finish this on my laptop within a reasonable time, but even if it runs out of range, you could use MyISAM and still have > 1M rows! engine isn't specified in the rules).

INSERT t1 VALUES(101),(102),(103),(104),(105),(106),(107),(108),(@a:=109);

INSERT t1 SELECT CURDATE()+INTERVAL @a:=@a+1 DAY FROM t1,t1 t2,t1 t3,t1 t4;

INSERT t1 SELECT CURDATE()+INTERVAL @a:=@a+1 DAY FROM t1,t1 t2,t1 t3,t1 t4;

it's probably not as fancy as you're looking for, more brute force...

I'll see if I can find a more clever solution later ;)

Domas Mituzas said...

my first attempt:

insert into t1 values (now()),(now()-1),(now()-2),(now()-3),(now()-4);
insert into t1 select from_unixtime(if(@a,@a:=@a+1,@a:=0)) from t1;
insert into t1 select from_unixtime(@a:=@a+1) from t1 a,t1 b, t1 c;
insert into t1 select from_unixtime(@a:=@a+1) from t1 a,t1 b limit 998990;

Matthias said...

Hey gmax,

Another solution for your quiz! quite a hack though, but still hoping for some fame and fortune ;)

Matthias


TRUNCATE t1;
/* the solution */
/* dates with these numbers are valid and are converted to 2000-04-XX */
SET @a=420,@b=421,@c=422,@d=423,@e=424,@f=425,@g=426,@h=427,@i=428,@j=429;
INSERT INTO t1 VALUES(@a),(@b),(@c),(@d),(@e),(@f),(@g),(@h),(@i),(@j);
/* add 10*10*10 rows = 1010 rows in the table after that */
INSERT INTO t1(dt) SELECT FROM_UNIXTIME(@j:=@j+1)FROM t1 JOIN(t1 a,t1 b);
/* remove the first 10, reset the incrementing var to the value before inserting last 1000 (these records need to be replaced) */
DELETE FROM t1 WHERE dt IN(@a,@b,@c,@d,@e,@f,@g,@h,@i,@j:=429);
/* start inserting all over again, insert 1000*1000 new rows, the first 1000 will be replaced */
REPLACE INTO t1(dt) SELECT FROM_UNIXTIME(@j:=@j+1)FROM t1 JOIN (t1 a);
/* end solution */
SELECT COUNT(*) from t1;

jedy said...

insert into t1 values (now()-2),(now()-1),(now()),(now()+(@a:=1));
insert into t1 select adddate(now(),@a:=@a+1) from t1 a,t1 b,t1 c,t1 d,t1;
insert into t1 select adddate(now(),@a:=@a+1) from t1,t1 b limit 998972;

Erik said...

Hi!
Here's an easy solution. It does the job whith just one SQL command, plus a SET command. Yes, the command is more than 76 chars, I know that. You could make it shorter by :

use information_schema

but still it is more than 75 chars.

SET @d = TO_DAYS('2000-01-01');
insert t1 select from_days(@d:=@d+1) from information_schema.GLOBAL_VARIABLES,
information_schema.GLOBAL_VARIABLES b,
information_schema.GLOBAL_VARIABLES c LIMIT 1000000

GLOBAL_VARIABLES is of course not a table, but a view, so using it is not against the rules. Don't know the execution time, on my laptop it takes more than a minute, but I guess that is mostly writing to t1 table.

Carsten said...

use test;
drop table if exists t1;
create table t1 (
dt datetime not null,
primary key (dt)
) engine=myisam;

set @@max_allowed_packet=124000000, @a:=729669, @v="(FROM_DAYS(@a:=@a+1))";
set @s=concat("INSERT INTO t1 VALUES ",repeat(concat(@v,','),499999), @v);
prepare p from @s;
execute p;
execute p;

Tobias said...

create view v as select 1 from information_schema.global_status;
INSERT t1 SELECT CURDATE()+INTERVAL @a:=IFNULL(@a,0)+1 DAY FROM v,v a,v b LIMIT 1000000;


/ flupps

Vote on Planet MySQL