The examples given are quite slow to create. For those willing to try out the same tables with a filling function a bit faster, here goes:
drop table if exists no_part_tab;Using this data, the timings for my comparison are:
create table no_part_tab
(c1 int(11) default NULL,
c2 varchar(30) default NULL,
c3 date default NULL) engine=myisam;
drop table if exists part_tab ;
CREATE TABLE part_tab
( c1 int default NULL,
c2 varchar(30) default NULL,
c3 date default NULL
) engine=myisam
PARTITION BY RANGE (year(c3)) (PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996) , PARTITION p2 VALUES LESS THAN (1997) ,
PARTITION p3 VALUES LESS THAN (1998) , PARTITION p4 VALUES LESS THAN (1999) ,
PARTITION p5 VALUES LESS THAN (2000) , PARTITION p6 VALUES LESS THAN (2001) ,
PARTITION p7 VALUES LESS THAN (2002) , PARTITION p8 VALUES LESS THAN (2003) ,
PARTITION p9 VALUES LESS THAN (2004) , PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE );
delimiter //
drop procedure if exists load_part_tab //
CREATE PROCEDURE load_part_tab( max_recs int, rows_per_query int)
begin
declare counter int default 0;
declare step int default 0;
declare base_query varchar(100) default 'insert into part_tab values ';
declare first_loop boolean default true;
declare v int default 0;
set @query = base_query;
while v < max_recs
do
if (counter = rows_per_query) then
set first_loop = true;
set counter = 0;
prepare q from @query;
execute q;
deallocate prepare q;
set @query = base_query;
set step = step + 1;
select step, v, now();
end if;
if (first_loop) then
set first_loop = false;
else
set @query = concat(@query, ',');
end if;
set @query = concat( @query,
'(', v, ',',
'"testing partitions"',',"',
adddate('1995-01-01',(rand(v)*36520) mod 3652), '")'
);
set v = v + 1; set counter = counter + 1;
end while;
if (counter) then
prepare q from @query;
execute q;
deallocate prepare q;
end if;
end
//
delimiter ;
call load_part_tab(8000000,1000);
insert into no_part_tab select * from part_tab;
select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';Now, coming to the purpose of this post, I created a further table using the ARCHIVE engine, to check if I could couple the gains in size reduction with the speed improvements allowed by partitioning.
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (9.05 sec)
select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (1.04 sec)
The immediate transformation from MyISAM to ARCHIVE does not work (
drop table if exists part_archive_tab;And I repeated the same search:
CREATE TABLE `part_archive_tab` (
`c1` int(11) not null ,
`c2` varchar(30) default NULL,
`c3` date default NULL
-- unique key (c1)
) ENGINE=ARCHIVE DEFAULT CHARSET=latin1
PARTITION BY RANGE (year(c3) )
(
PARTITION p0 VALUES LESS THAN (1995),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (1997),
PARTITION p3 VALUES LESS THAN (1998),
PARTITION p4 VALUES LESS THAN (1999),
PARTITION p5 VALUES LESS THAN (2000),
PARTITION p6 VALUES LESS THAN (2001),
PARTITION p7 VALUES LESS THAN (2002),
PARTITION p8 VALUES LESS THAN (2003),
PARTITION p9 VALUES LESS THAN (2004),
PARTITION p10 VALUES LESS THAN (2010),
PARTITION p11 VALUES LESS THAN MAXVALUE ENGINE = ARCHIVE
);
insert into part_archive_tab select * from part_tab;
select count(*) from part_archive_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';That's very interesting. As I expected, a COUNT without a WHERE clause to an ARCHIVE table means a full table scan. But when we use a WHERE clause involving a range, the performance is really impressive. Using the new PARTITIONS extension of the EXPLAIN command, we get this confirmed:
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (1.29 sec)
select count(*) from part_archive_tab;
+----------+
| count(*) |
+----------+
| 8000000 |
+----------+
1 row in set (9.24 sec)
explain partitions select count(*) from part_archive_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'\GUnfortunately, it seems that there are still some problems (
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: part_archive_tab
partitions: p1
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 798458
Extra: Using where
1 comment:
I understand that archiving of data means moving the data from operational table to another table whose engine type is archive. However my doubts is still the load on the operational database due to the historical data is not reduced by 100%. I am looking for any way by which user could archive the data in a remove mysql process? Is this a valid problem, if so how it is approached in real time
Post a Comment