tag:blogger.com,1999:blog-16959946.post3845963731254711700..comments2023-12-09T16:44:47.897+01:00Comments on The Data Charmer: Pop quiz (with prize): generate 4 billion recordsGiuseppe Maxiahttp://www.blogger.com/profile/15801583338057324813noreply@blogger.comBlogger10125tag:blogger.com,1999:blog-16959946.post-84237616946963269262007-12-24T10:26:00.000+01:002007-12-24T10:26:00.000+01:00dirk1231,Don't bother with the dd solution.It is o...dirk1231,<BR/>Don't bother with the dd solution.<BR/>It is only valid for MyISAM tables, and thus not very valuable.<BR/>Besides, as for Todd's solution, there is risk of data corruption.<BR/>So, I must rule out this one.Giuseppe Maxiahttps://www.blogger.com/profile/15801583338057324813noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-3898892129411958592007-12-24T01:10:00.000+01:002007-12-24T01:10:00.000+01:00I think the fastest way is going to be to create t...I think the fastest way is going to be to create the MYD file using the dd command to repeat a hex pattern. Then take the existing MYI file and modify the needed pieces in a hex editor. I have part of the solution for this, but not enough time to complete it this week.Dirk1231https://www.blogger.com/profile/01604966231936807391noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-54648460904894656052007-12-21T19:40:00.000+01:002007-12-21T19:40:00.000+01:00Your sample solution takes about 1 hour and 18 min...Your sample solution takes about 1 hour and 18 minutes on my machine.<BR/><BR/>The following takes about 22 minutes on the same machine.<BR/><BR/>-Dipin<BR/><BR/>create table t2 (<BR/>id tinyint not null<BR/>) engine=memory;<BR/><BR/>insert into t2 values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1);<BR/><BR/>insert into t2 select 1 from t2 a, t2 b, t2 c, t2 d limit 65520;<BR/><BR/>insert into t1 select 1 from t2 a, t2 b;<BR/><BR/>drop table t2;Anonymoushttps://www.blogger.com/profile/17903278771915130754noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-25422364753335413322007-12-21T08:32:00.000+01:002007-12-21T08:32:00.000+01:00It need to create temporary table to insert from v...It need to create temporary table to insert from view. We can create a table to avoid this. It may save about half of the time.<BR/><BR/>create table t2 (<BR/>id tinyint not null<BR/>) engine=memory;<BR/>insert into t2 select 1 from v1024 a, v1024 b;<BR/>delimiter //<BR/>create procedure insert_t1 (n int)<BR/>begin declare i int default 0;<BR/>prepare sth from "insert into t1 select * from t2";<BR/>while i < n do<BR/> execute sth;<BR/> set i = i + 1;<BR/>end while;<BR/>end//<BR/>delimiter ;<BR/>call insert_t1(1024);<BR/><BR/>On my server, this took about 6.5 minutes to insert 1 billion records. So maybe half an hour for 4 billion.jedyhttps://www.blogger.com/profile/04289869955957083884noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-12017573827185955012007-12-21T05:24:00.000+01:002007-12-21T05:24:00.000+01:00More general method: my above method + "ALTER TABL...More general method: my above method + "ALTER TABLE"?<BR/><BR/>Or is that also cheating? :)<BR/><BR/>Another idea is to use a script to write into a fifo and use LOAD DATA INFILE on the fifo, but I bet it's not as fast.Toddhttps://www.blogger.com/profile/12678240542771820612noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-28415578912984677102007-12-20T15:43:00.000+01:002007-12-20T15:43:00.000+01:00I suspect that generating a large input file then ...I suspect that generating a large input file then using "load data infile" to import it is going to be the fastest method.<BR/><BR/>If you want pure SQL (well, pure MySQL anyway), stored function performing a bulk insert seems to be fairly quick, but it then becomes a trade off between generating such a function, and time to run it.<BR/><BR/>Something like:<BR/>CREATE FUNCTION `repeats`(x int) RETURNS int(11)<BR/>begin <BR/>set @i = 0;<BR/>repeat<BR/> insert into t1(id) values (1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4),(1),(2),(3),(4); <BR/> set @i = @i+1;<BR/>until @i >= x<BR/>end repeat;<BR/>return 0; <BR/>end <BR/><BR/>(256 values being inserted for each iteration) managed 256000 values in between 2 and 3 seconds (my MySQL install is in a virtual machine with limited memory, so I didn't want to go for the full 4billion!).<BR/><BR/>Of course, you could combine that with inserting a cross product: insert into t1(id) select a.id from t2 as a, t2 as b, t2 as c, t2 as d after a single 256 element insert to t2 (identical structure to t1) should give you the right number of records, but I'm not running that on my server...Unknownhttps://www.blogger.com/profile/13308312380087542027noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-19855899842875465302007-12-20T13:17:00.000+01:002007-12-20T13:17:00.000+01:00Todd,Thanks.Your solution is amazing, because it f...Todd,<BR/>Thanks.<BR/>Your solution is amazing, because it fills a table in 1 fourth of the time. However, this is not really useful for me. The fact that it is performed outside the server makes this solution difficult to integrate with the rest of the work that requires this task.<BR/>Moreover, this is only applicable to MyISAM tables, while I would need to insert these records in different engines.<BR/>Can you find a more general method?Giuseppe Maxiahttps://www.blogger.com/profile/15801583338057324813noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-83238618857196383982007-12-19T22:13:00.000+01:002007-12-19T22:13:00.000+01:00The table is MyISAM in this particular case, but t...The table is MyISAM in this particular case, but the test should be available for any engine.Giuseppe Maxiahttps://www.blogger.com/profile/15801583338057324813noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-15508561824915320722007-12-19T22:08:00.000+01:002007-12-19T22:08:00.000+01:00Is the table InnoDb or MyIsam?Is the table InnoDb or MyIsam?Dirk1231https://www.blogger.com/profile/01604966231936807391noreply@blogger.comtag:blogger.com,1999:blog-16959946.post-79997153329508553242007-12-19T18:47:00.000+01:002007-12-19T18:47:00.000+01:00Here's a Perl script to generate a big table -- us...Here's a Perl script to generate a big table -- usage example:<BR/><BR/>mysql> create table t1 (id tinyint not null);<BR/>mysql> flush tables;<BR/><BR/><BR/>then run the following script with the first argument being 2^32 and the second argument being the path to mysql/data/t1.frm.<BR/><BR/>After it completes, the table will have 2^32 "1"s in it, and it should be essentially limited by IO speed. Changing the CHUNK_REPEAT variable may result in increased performance - I didn't tune it.<BR/><BR/>#!/usr/bin/perl<BR/>use strict;<BR/>use warnings;<BR/>use Math::BigInt qw/:constant/;<BR/>use bignum;<BR/><BR/>my $SIZE_PER_ROW = 7;<BR/>my $CHUNK_REPEAT = 1024;<BR/>my $MYI_DATA = "<BR/>0000000 fefe 0701 0000 0122 00b0 0064 00b0 0000<BR/>0000010 0000 0000 0800 0000 0000 19ff {COUNT}<BR/>0000020 0000 0000 0000 0000 {COUNT}<BR/>0000030 ffff ffff ffff ffff 0000 0000<BR/>0000040 0000 0400 {MYD_SIZE} 0000 0000<BR/>0000050 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000060 0000 0000 0000 0000 0000 0000 0000 022d<BR/>0000070 0000 0016 0000 0000 0000 0038 0000 0000<BR/>0000080 0000 0000 4769 466d 0000 0000 0000 0000<BR/>0000090 0000 0000 4769 466d 0000 0000 0000 0000<BR/>00000a0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00000b0 0000 0000 0000 0400 0000 0000 0000 0000<BR/>00000c0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00000d0 0000 0000 0000 0000 0000 0000 0000 0002<BR/>00000e0 0000 0007 0000 0002 0000 0002 0000 0014<BR/>00000f0 0000 0002 0000 0000 0603 0000 0000 0000<BR/>0000100 0000 0008 0000 0000 0000 0000 0000 0000<BR/>0000110 0000 0000 0000 0001 0000 0000 0000 0100<BR/>0000120 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000130 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000140 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000150 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000160 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000170 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000180 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000190 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001a0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001b0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001c0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001d0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001e0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00001f0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000200 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000210 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000220 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000230 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000240 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000250 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000260 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000270 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000280 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000290 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002a0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002b0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002c0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002d0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002e0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00002f0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000300 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000310 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000320 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000330 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000340 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000350 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000360 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000370 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000380 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000390 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003a0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003b0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003c0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003d0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003e0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>00003f0 0000 0000 0000 0000 0000 0000 0000 0000<BR/>0000400<BR/>";<BR/><BR/>my $MYD_DATA = "ff010000000000";<BR/><BR/>if (scalar @ARGV != 2) {<BR/> die "usage: $0 num frmpath";<BR/>}<BR/><BR/>my ($num_str, $frm_path) = @ARGV;<BR/><BR/>my $num = Math::BigInt->new($num_str);<BR/><BR/>die "frm not found" unless -e $frm_path;<BR/>die "bad extension on $frm_path" unless $frm_path =~ /\.frm$/;<BR/><BR/>my $myd_path = $frm_path;<BR/>$myd_path =~ s/frm$/MYD/;<BR/>die "myd not found" unless -e $myd_path;<BR/><BR/>my $myi_path = $frm_path;<BR/>$myi_path =~ s/frm$/MYI/;<BR/>die "myi not found" unless -e $myi_path;<BR/><BR/>&make_myd($myd_path, $num);<BR/>&make_myi($myi_path, $num);<BR/><BR/>sub make_myi {<BR/> my ($out, $num_records) = @_;<BR/><BR/> open(OUT, ">$out") or die "Couldnt open $out: $!";<BR/> binmode OUT;<BR/><BR/> my $num_hex = $num_records->as_hex();<BR/> $num_hex =~ s/^0x//;<BR/> $num_hex = sprintf("%016s", $num_hex);<BR/><BR/> my $size_hex = ($num_records * $SIZE_PER_ROW)->as_hex();<BR/> $size_hex =~ s/^0x//;<BR/> $size_hex = sprintf("%016s", $size_hex);<BR/> <BR/> my @lines = split("\n", $MYI_DATA);<BR/> s/^\S+// for @lines;<BR/> chomp(@lines);<BR/> s/\s//g for @lines;<BR/> my $data = join("", @lines);<BR/><BR/> $data =~ s/{COUNT}/$num_hex/g;<BR/> $data =~ s/{MYD_SIZE}/$size_hex/g;<BR/> print OUT pack("H*", $data);<BR/> close(OUT);<BR/>}<BR/><BR/>sub make_myd {<BR/> my ($out, $num_records) = @_;<BR/> open(OUT, ">$out") or die "Couldnt open $out: $!";<BR/> binmode OUT;<BR/><BR/> my ($chunks, $remainder) = $num_records->bdiv($CHUNK_REPEAT);<BR/> $chunks = $chunks->bfloor();<BR/> print STDERR "chunks: $chunks rem: $remainder\n";<BR/><BR/> my $chunk = pack("H*", $MYD_DATA) x $CHUNK_REPEAT;<BR/><BR/> for (1..$chunks) { print OUT $chunk; }<BR/> for (1..$remainder) { print OUT pack("H*", $MYD_DATA); }<BR/> close(OUT);<BR/>}Toddhttps://www.blogger.com/profile/12678240542771820612noreply@blogger.com