MySQL's success is due not only to its efficiency, but also to its ease of administration.
And despite what its detractors say about a lack of features (that was before
5.0, of course), there is one feature that is crucial for enterprise database
management, and MySQL has had it built in since its early versions. I am talking
about data replication, which can transform a single server into a scalable and
highly reliable system with minimum effort and expense.
MySQL data transfer methods
There are differences between these two approaches, and choosing between standard replication and MySQL Cluster is a trade-off:

method | description | the good | the bad |
---|---|---|---|
replication | asynchronous transfer, based on one master and N slaves | fast, reliable, robust, easy to set up, no special hardware required, transaction-safe | needs some care in the code |
MySQL Cluster | synchronous transfer, based on a cluster engine | fast, no code modification needed | requires huge amounts of RAM, not easy to set up, minimal transaction support |
On the other hand, if you have an application that is already established (and perhaps large and complex), adapting it to the one-master-many-slaves architecture could be a problem.
There are external tools that promise to fill the gap and offer clever solutions, mostly based on in-between filtering layers that give your application the illusion that there is just one server, while in truth there are several. These solutions cost you something too, not only in mere money (since they are, with limited exceptions, commercial tools), but also in features, because the filters are mostly not transaction-safe.
Different needs for data transfers
Aside from the mere method of data movement, the enterprise has other needs. Notice that this is not some stretch of my imagination, but the practical result of several years of consulting in various database-related matters.

One thing that most medium to large companies want is the reverse of replication, i.e. they want the master to be updated by many slaves. The need is clear when you think of several branches of a company that must send updates about sales records and stock levels. Unfortunately, simple replication does not allow this, and even the cluster cannot easily be bent to this need.
Another thing that companies want is conditional replication, based on rules that are far beyond the limited configuration range of both replication and the cluster. For example, some companies want a table migrated from server A to server B, but into different tables, depending on the time frame.
New data transfer paradigms: FEDERATED tables
Reverse replication can be achieved with tables using the FEDERATED engine. For each table you need to replicate from the branch database to the central one, you just need to establish a FEDERATED table and a couple of triggers.
Let's say, just for the sake of it, that you have the main server at big.central.mycompany.com, and the branches at far.away.mycompany.com (code name: FAW) and at fairly.distant.mycompany.com (code name: FDI). The main table you need to update at big.central lives in the "biz" database and looks like this:
CREATE TABLE all_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=innodb;
In each branch, you have one table for your branch sales, which you update every day with the consolidated data from your detail tables.
CREATE TABLE branch_sales
(
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (sale_date, item_id)
) engine=innodb;
To get the data from branch_sales to the main central table, you build a mirror of that table in each branch; but instead of being InnoDB or MyISAM, it will be FEDERATED.
CREATE TABLE central_all_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=FEDERATED
CONNECTION='mysql://user:pwd@big.central.mycompany.com/biz/all_sales';
As a final touch, you add a few triggers to your branch table, to get the data moving:
create trigger branch_sales_ai after INSERT on branch_sales
for each row
insert into central_all_sales
values ( 'FAW', new.sale_date, new.item_id, new.quantity);
create trigger branch_sales_au after UPDATE on branch_sales
for each row
update central_all_sales
set sale_date = new.sale_date,
item_id = new.item_id,
quantity = new.quantity
where
branch_id = 'FAW'
and
sale_date = old.sale_date
and
item_id = old.item_id;
create trigger branch_sales_ad after DELETE on branch_sales
for each row
delete from central_all_sales
where
branch_id = 'FAW'
and
sale_date = old.sale_date
and
item_id = old.item_id;
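At this point the FAW branch is wired up. A quick sanity check, with made-up values, shows the mechanism at work:
-- on far.away.mycompany.com (FAW): record a day's consolidated sales
INSERT INTO branch_sales VALUES ('2006-03-15 17:45:00', 1001, 25);

-- on big.central.mycompany.com: the row has arrived, tagged with the branch code
SELECT * FROM biz.all_sales WHERE item_id = 1001;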
Similar triggers (with a branch_id of 'FDI') will be created in the second branch. Now, every record in branch_sales is replicated to the central table without human intervention. For extra safety, you may add some code to check that the record does not exist yet before inserting, as sketched below.
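A minimal sketch of that check for the FAW branch (it replaces the simpler insert trigger above, after dropping it, since the lookup needs a BEGIN ... END block):
delimiter //
create trigger branch_sales_ai after INSERT on branch_sales
for each row
begin
-- send the record only if the central table does not have it yet,
-- to avoid duplicate-key errors on the remote server
if (select count(*) from central_all_sales
where branch_id = 'FAW'
and sale_date = new.sale_date
and item_id = new.item_id) = 0
then
insert into central_all_sales
values ('FAW', new.sale_date, new.item_id, new.quantity);
end if;
end //
delimiter ;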
New data transfer paradigms: BLACKHOLE tables
Have you heard about the BLACKHOLE engine? It's a simple concept: a table with a full definition that does not store anything. Everything you send to such a table is simply discarded, which is not surprising, since there is no storage device associated with BLACKHOLE tables.
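A quick test shows the behavior (the table here is just a throwaway example):
CREATE TABLE bh_test (id int not null primary key) engine=BLACKHOLE;
INSERT INTO bh_test VALUES (1), (2), (3);  -- accepted, but stored nowhere
SELECT * FROM bh_test;                     -- always returns an empty set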
What's the use, then, you may ask? The use is to take advantage of the side effects of such a table, rather than its storage capabilities. For one, records sent to a BLACKHOLE table still leave a trail in the binary log, thus allowing you to relay data through a server that does not have much storage available, when you need it just as an intermediate level to spread your data to several more slaves.
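As a sketch of that relay setup (server ID, log name, and table name are hypothetical), the intermediate slave keeps the table definitions but switches them to BLACKHOLE, and is configured to pass the replicated statements on to its own binary log:
-- on the intermediate server: keep the definition, drop the storage
ALTER TABLE somedb.big_table ENGINE=BLACKHOLE;

# my.cnf on the intermediate server
[mysqld]
server-id        = 2
log-bin          = mysql-bin
log-slave-updates            # re-log statements received from the master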
Moreover, and here we come to the point of our quest, BLACKHOLE tables can fire triggers, and we are going to use them to meet the second need we have outlined, i.e. the conditional update.
Let's say that big.central.mycompany.com wants the sales records sent to different tables depending on the time of day (it's a silly example, but bear with me for a while, just pretending it's something sensible). However, the application that deals with the data transfer is already working, and nobody dares to touch it, for fear of disrupting something. But a simple change can achieve the goal.
Let's change the structure of all_sales:
CREATE TABLE all_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=BLACKHOLE;
Just one word changed, and now the table will discard any data sent to it.
Before seeing what we can do with it, let's create three more tables.
CREATE TABLE all_morning_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=innodb;
CREATE TABLE all_afternoon_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=innodb;
CREATE TABLE all_late_sales
(
branch_id char(3) not null,
sale_date datetime not null,
item_id int not null,
quantity int not null,
primary key (branch_id, sale_date, item_id)
) engine=innodb;
Here we have everything ready. Now let's add one trigger:
delimiter //
create trigger all_sales_ai after insert on all_sales
for each row
begin
case
when time(new.sale_date) between '08:00:00' and '13:00:00'
then insert into all_morning_sales
values (new.branch_id, new.sale_date, new.item_id,
new.quantity);
when time(new.sale_date) between '13:00:01' and '18:00:00'
then insert into all_afternoon_sales
values (new.branch_id, new.sale_date, new.item_id,
new.quantity);
else
insert into all_late_sales
values (new.branch_id, new.sale_date, new.item_id,
new.quantity);
end case;
end //
delimiter ;
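A quick check with made-up values shows the dispatching at work:
-- a morning sale: the trigger routes it to all_morning_sales
INSERT INTO all_sales VALUES ('FAW', '2006-03-15 09:30:00', 1001, 25);

SELECT count(*) FROM all_morning_sales;  -- 1: the row was dispatched
SELECT count(*) FROM all_sales;          -- 0: the BLACKHOLE kept nothing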
You might be tempted to add similar triggers for updates and deletes; bear in mind, though, that a BLACKHOLE table never holds any rows, so UPDATE and DELETE statements find nothing to match, and their row-level triggers never fire. The INSERT trigger does the real work here, and of course you can get creative and update several tables at once with different parts of the incoming data.
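As a sketch of that last idea (the summary table is hypothetical): since MySQL allows only one trigger per table, action, and timing, you would extend the body of all_sales_ai, adding one more statement after the end case to maintain, say, a running per-item total:
-- hypothetical summary table on the central server
CREATE TABLE item_totals
(
item_id int not null primary key,
total_quantity int not null
) engine=innodb;

-- extra statement inside all_sales_ai, right after "end case;"
insert into item_totals (item_id, total_quantity)
values (new.item_id, new.quantity)
on duplicate key update total_quantity = total_quantity + new.quantity;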
There you have it. Now you have some food for thought to launch yourselves into a new data transfer enterprise.