Showing posts with label postgresql. Show all posts

Wednesday, March 27, 2013

Multi-master data conflicts - Part 2: dealing with conflicts

In the first part of this article we examined the types of conflicts and their causes. In this part, we will analyse some of the methods available to deal with conflicts.

Pessimistic locking (or: conflicts won't happen)

Applicability: synchronous clusters with 2PC

We've covered this topic in the previous article, but it's worth repeating. If you use a synchronous cluster, you don't have conflicts. For example, MySQL Cluster ensures consistent data with updates coming from different nodes. However, MySQL Cluster is not a replacement for a MySQL server, and it has severe limitations.


Optimistic locking

Applicability: synchronous clusters without 2PC (Galera)

Conflicting transactions proceed on different nodes with local locking. The last one to commit then rolls back when it discovers that a prior transaction got in first on the same data. For a more detailed analysis of this handling method, see this article by Jay Janssen.
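The principle can be sketched with a version column: each writer records the version it read, and a commit succeeds only if that version is still current. This is a minimal illustration in Python with SQLite; the table and column names are hypothetical, not Galera's actual mechanism, which works at the cluster certification level:

```python
import sqlite3

# A version column lets each writer detect, at commit time, whether the
# row changed since it was read (hypothetical table, for illustration).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)")
db.execute("INSERT INTO accounts VALUES (1, 100, 0)")

def read_version(conn, row_id):
    return conn.execute(
        "SELECT version FROM accounts WHERE id = ?", (row_id,)).fetchone()[0]

def try_commit(conn, row_id, new_balance, version_read):
    # The UPDATE matches only if nobody bumped the version in the meantime.
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?", (new_balance, row_id, version_read))
    return cur.rowcount == 1   # False means: conflict, roll back and retry

v_a = read_version(db, 1)   # writer A reads version 0
v_b = read_version(db, 1)   # writer B reads the same version
assert try_commit(db, 1, 150, v_a)       # A gets in first
assert not try_commit(db, 1, 80, v_b)    # B detects the conflict and must retry
```

The loser is not blocked while working: it simply finds out at commit time that it must retry, which is the trade-off of the optimistic approach.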


Conflict resolution after the fact

Applicability: EnterpriseDB (none so far for MySQL)

Conflicts are harder to handle with asynchronous replication. A conflict in this context means that the data has already been applied to the wrong node or to the wrong object, and something must be done to solve the issue.

Typical remedies offered for conflict resolution are:

  • Earliest or Latest Timestamp: This method says that the oldest or the latest record prevails when a conflict happens. This is hardly a reliable resolution, but it is the easiest method to implement, and thus it is the one most often offered. It frequently results in a hidden data inconsistency problem, where we find data that we don't expect: the current record was applied simply because it was updated later than the correct one. Also, timestamp calculation requires time synchronization across servers, and possibly across timezones, which calls for extra effort to keep the system functioning.
  • Node Priority: There is a hierarchy of nodes, with different ranks. When a conflict occurs, the node with the highest rank prevails. This method requires the data origin to be stored alongside the contents, and to be easily searchable when conflicts occur. It must also take into account offline nodes, and therefore it should keep the conflict resolution metadata until the offline nodes are back in synch.
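A latest-timestamp rule can be written in a few lines, which also exposes its weakness: the winner is chosen by clock reading, not by correctness. The field names below are hypothetical:

```python
from datetime import datetime, timezone

def resolve_latest_timestamp(local_row, remote_row):
    """Keep the row with the most recent update timestamp.

    Clock skew between nodes is the weak spot: the row that wins is
    simply the one with the later clock reading, which is not
    necessarily the correct business value.
    """
    return remote_row if remote_row["updated_at"] > local_row["updated_at"] else local_row

local = {"id": 1, "price": 100,
         "updated_at": datetime(2013, 3, 27, 10, 0, tzinfo=timezone.utc)}
# A node with a clock 5 minutes ahead wins even if its change was logically older.
remote = {"id": 1, "price": 95,
          "updated_at": datetime(2013, 3, 27, 10, 5, tzinfo=timezone.utc)}

assert resolve_latest_timestamp(local, remote)["price"] == 95
```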

Methods that could be implemented in a more advanced technology may include:

  • Origin enforcement: data coming from authorized nodes will be preserved. Data from wrong origin will be dropped, and a consolidation event will be generated and sent to the other nodes. This method would be possible in systems (like Tungsten) that keep track of the event origin.
  • Data merge: If possible and desirable, data from two different sources can be preserved, and merged in the destination table. This rule should also originate a new event to fix the data in the other nodes.
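Origin enforcement could be sketched as follows, assuming each replicated event carries its origin node in its metadata. The mapping and field names are hypothetical, for illustration only:

```python
# Which node is authorized to write each schema (hypothetical mapping).
AUTHORIZED_ORIGIN = {"personnel": "master1", "sales": "master2"}

def filter_event(event):
    """Apply an event only if it comes from the authorized origin.

    Returns the event to apply, or None to drop it; in a real system a
    dropped event would also trigger a consolidation event toward the
    other nodes, to undo the change where it was wrongly applied.
    """
    allowed = AUTHORIZED_ORIGIN.get(event["schema"])
    if allowed is None or event["origin"] == allowed:
        return event
    return None  # wrong origin: drop, and schedule a fix-up event

ok = {"schema": "sales", "origin": "master2", "sql": "UPDATE ..."}
bad = {"schema": "sales", "origin": "master1", "sql": "UPDATE ..."}
assert filter_event(ok) is ok
assert filter_event(bad) is None
```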

Schema renaming

Applicability: fan-in topologies


Image #1 - Fan-in topology with likely conflicts.

A fan-in topology is easy to implement with Tungsten Replicator, but not easy to maintain. By its nature, fan-in is a conflict waiting to happen. Assuming that all the masters have the same structure, they will replicate multiple changes into the same schema, and it is quite likely that some changes will clash. For this reason, the simple solution often adopted is renaming the schema before the data reaches the slave.


Image #2 - Fan-in topology with schema renaming.

I know of at least one user who has successfully applied this technique for a cluster made of 70 masters and one slave.
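The renaming step amounts to a simple rewrite applied before the event reaches the slave. Tungsten does this with a filter; the underlying logic can be sketched like this (the mapping and field names are hypothetical):

```python
# Each master's events land in a schema of their own on the slave,
# so changes from different masters can never clash.
RENAME_MAP = {"master1": "app_m1", "master2": "app_m2", "master3": "app_m3"}

def rename_schema(event):
    """Rewrite the target schema according to the event's origin master."""
    target = RENAME_MAP.get(event["origin"], event["schema"])
    return {**event, "schema": target}

ev = {"origin": "master2", "schema": "app", "sql": "INSERT INTO t VALUES (1)"}
assert rename_schema(ev)["schema"] == "app_m2"
```

The price of this trick is that the slave no longer holds a single consolidated copy of the data: queries that need a global view must union the per-master schemas.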

Conflict prevention: Discipline

Applicability: all topologies

A simple way of preventing conflicts, and one that would make life easier for all is discipline. The organization decides which entry points can update which data, and conflicts are not possible, because the data is inserted or modified only in the places where it is supposed to be.


Image #3 - Preventing conflicts with discipline in a star topology.

Conflict prevention: Enforced discipline

Applicability: all topologies

If you have worked in any large organization, either public or private, you know that discipline alone is the worst method you can rely on for something as delicate and valuable as your data. The reasons why this paradigm could fail are many: some people dislike discipline, someone makes a mistake, there are too many rules to remember, or an application bug lets you update what you shouldn't.

Either way, you end up with a system that has conflicts, and nobody knows what happened or how to fix them. However, there is a way of enforcing this discipline-based system.

This is the "poor-man's" conflict avoidance system. It is based on simple technology, available in most database servers. If you can install a multi-master topology, using either native MySQL (circular) replication or Tungsten Replicator topologies, you can also apply this method.

The key to the system is to grant different privileges for every master. Looking at image #3, you can enforce discipline by granting different privileges to the application user in every master.

In master #1, where we can update personnel, app_user will have SELECT privileges on all databases, and all privileges on personnel.

In master #2, where we can update sales, app_user will have all privileges on sales and read only access to the other databases, and so on.

The key to making this system work well is to assign the schema-specific privileges locally and not let those GRANT statements replicate. It should work like this:

# master 1
GRANT SELECT on *.* to app_user identified by 'my password';
# This is good for all masters. Let it replicate

# master 1
SET SQL_LOG_BIN=OFF;
GRANT ALL on personnel.* to app_user;   # This won't replicate


# master 2
SET SQL_LOG_BIN=OFF;
GRANT ALL on sales.* to app_user;

# master 3
SET SQL_LOG_BIN=OFF;
GRANT ALL on vehicles.* to app_user;

# master 4
SET SQL_LOG_BIN=OFF;
GRANT ALL on buildings.* to app_user;

This method works quite well. Since updates for a given schema can be applied only in one master, there is little chance of any mischief happening. Conflicts are not completely removed, though. There are super users and maintenance users who can, consciously or not, introduce errors. For these cases, you may want to look at the next section.

Enforced discipline with certified origin

Applicability: all Tungsten topologies

Discipline based on granted privileges is often robust enough for your needs. However, if you want to keep track of where the data comes from, you should look at a System of Records technology, where each piece of data can be traced to its origin.

Tungsten Replicator implements this technology with several topologies. The theory of this matter is beautifully explained by Robert Hodges in an article written some time ago. Here I would like to look at the practical stuff.

To implement a System of Records in Tungsten, you decide where you want to update each schema (which is defined as a shard in our lingo), assign that schema to a service, and the replicator will enforce your rules.

Once you have defined the shards, you can set the rules. When an event comes to a slave from an UNKNOWN shard, i.e. a shard that was not among the defined rules, you can:

  • Accept the event (not recommended, really);
  • Drop the event silently;
  • Drop the event with a warning in the logs;
  • Generate an error that will break replication (recommended).

You can choose among the above actions when setting a rule for events that come from UNWANTED shards, i.e. a shard that is not the one designated to update that schema.
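The decision logic behind these policies can be sketched in a few lines. This is only a simplification of what the shard filter does, with hypothetical names:

```python
# Which service is authorized to update each shard (mirrors shards.map).
SHARD_MASTER = {"employees": "alpha", "buildings": "bravo", "vehicles": "charlie"}
WHITELISTED = {"test"}

def check_shard(shard, origin_service,
                unknown_policy="error", unwanted_policy="error"):
    """Return 'accept', 'drop', 'warn', or 'error' for an incoming event."""
    if shard in WHITELISTED:
        return "accept"                      # any master may update it
    expected = SHARD_MASTER.get(shard)
    if expected is None:
        return unknown_policy                # UNKNOWN: shard not in the rules
    if origin_service != expected:
        return unwanted_policy               # UNWANTED: wrong master for this shard
    return "accept"

assert check_shard("employees", "alpha") == "accept"
assert check_shard("employees", "bravo") == "error"   # breaks replication
assert check_shard("test", "charlie") == "accept"
```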

Here's an example of a shard definition based on an all-masters topology with three nodes:


Image #4 - Sample conflict prevention in an all-masters topology

# Options to add during installation
--svc-extractor-filters=shardfilter

# policy for unknown shards
--property=replicator.filter.shardfilter.unknownShardPolicy=error

# policy for unwanted shards
--property=replicator.filter.shardfilter.unwantedShardPolicy=error

# Whether the policy for unwanted shards is activated or not
--property=replicator.filter.shardfilter.enforceHomes=false

# whether we allow whitelists to be created
--property=replicator.filter.shardfilter.allowWhitelisted=false


# Loading the rules set

$ trepctl -host host1 -service charlie shard -insert < shards.map

$ cat shards.map
shard_id          master      critical
employees         alpha       false
buildings         bravo       false
vehicles          charlie     false
test              whitelisted false

The rules are set by service, rather than host name. The schema 'employees' can be updated by the service named 'alpha', which has its master in host #1. Similarly, 'buildings' can be updated by 'bravo', with a master in host #2, and 'vehicles' is updated by the 'charlie' master service in host #3. Remember that in Tungsten each replication stream from one master to many slaves is a separate service. This way we can keep track of the event's origin. Even if the event is routed through a hub in a star topology, it retains its origin in the metadata.

The last line of the rules says that the schema 'test' is whitelisted, i.e. it can be freely updated by any master. And this means that conflicts can happen there, so be careful if you use this feature!


Image #5 - Example of a legitimate event coming through

When an expected event comes through, all is well. Each node checks that the event was originated by the authorised master, and the event is applied to the slave service.


Image #6 - Example of an event originated from an unauthorised node

When the event comes from a node that was not authorised, Tungsten looks at the rules for such case. In our setup, the rule says 'error', and therefore replication will break at the receiving end of the service 'bravo' in host #1 and host #3.

mysql #2> create table employees.nicknames( ... )


# Only server #2 creates the table
# slave service 'bravo' in host1 and host3 get an error
# No table is created in hosts #1 and #3

To detect the error, we can ask for the list of services in host #1 and host #3. What we will see is something like this.

#3 $ trepctl services | simple_services 
alpha    [slave]
seqno:          7  - latency:   0.136 - ONLINE

bravo    [slave]
seqno:         -1  - latency:  -1.000 - OFFLINE:ERROR

charlie  [master]
seqno:         66  - latency:   0.440 - ONLINE

This listing says that replication was stopped with an error in slave service 'bravo'. To determine what happened exactly, we ask for the status of that service:

#3 $  trepctl -service bravo status
NAME                     VALUE
----                     -----
appliedLastEventId     : NONE
appliedLastSeqno       : -1
appliedLatency         : -1.0
(...)
offlineRequests        : NONE
pendingError           : Stage task failed: q-to-dbms
pendingErrorCode       : NONE
pendingErrorEventId    : mysql-bin.000002:0000000000001241;0
pendingErrorSeqno      : 7
pendingExceptionMessage: Rejected event from wrong shard: 
seqno=7 shard ID=employees shard master=alpha service=bravo
(...)

This status gives us quite a lot of useful information:

  • The event with Global transaction ID (GTID) # 7 was rejected;
  • The reason for rejection was because it came from the wrong shard;
  • The expected shard master (i.e. the authorized service) was alpha;
  • The event was instead originated from service bravo.

With the above information, we can take action to fix the event. We know that GTID 7 is wrong, so we can skip it in both servers where the error occurred. To clean up the error, we can simply generate the correct event in the authorized master.

# host #1
$ trepctl -service bravo online -skip-seqno 7

mysql #1> drop table if exists employees.nicknames;
mysql #1> create table if not exists employees.nicknames ( ... ) ;

# host #3
$ trepctl -service bravo online -skip-seqno 7

Statement-based vs row-based replication

As a general note about conflict solving, I need to mention that, in most cases, using row-based rather than statement-based replication helps identify conflicts, making them easier to clean up.

Even when the conflict involves a deleted row, row-based events will contain enough information to let us identify the critical data needed to recover the information.

Be aware that, if you use binlog-row-image=minimal in MySQL 5.6, the binary log entry for a DELETE event will only include the primary key.


More about filters

We have seen in at least two examples (schema renaming and conflict prevention) that you can help avoid conflicts with filters. This is a powerful feature that should be taken into account when planning a multi-master topology.

MySQL native replication offers very little in the matter of data transformation through filtering. Tungsten Replicator, instead, allows you to define filters at several stages of the replication process: when extracting the data, after transporting it to the slaves, and before applying it. You can write your own filters in JavaScript, and do pretty much everything you want with the data. If you have creative ideas about solving conflicts by manipulating data in transit, there is a good chance that you can implement them using filters. This topic deserves more than a paragraph, and I will probably come back to it soon with a full-fledged article.


Parting thoughts

Multi-master topologies are much-coveted features. However, they often introduce the risk of conflicts.

Dealing with conflicts becomes somewhat easier if you understand how they happen and what kind of problems they generate.

There is no silver bullet solution for conflicts, but recent technology and good organization can help you ease the pain.

Friday, May 06, 2011

Open Database Camp 2011 opens today!

The Open Database Camp 2011 opens today with the Welcome Party, starting at 7pm CEST. The party (with good Italian food and drinks) is open to everyone who has registered in the Attendees list.
By car you have to reach Pula, take Via Nora (Nora Street), then Via Sant'Efisio (Sant'Efisio Street), until the end, directly to the party location.
Organisers will also make a bus available on Friday 6 May, leaving from Pula Hotels (Nora Club Hotel - Villa Madau - Baia Di Nora - Is Molas - Marin Hotel - Is Morus Hotel) around 18:30 and reaching Nora. From Nora back to Pula Hotels a bus will leave around 21:00 and 22:00.
The conference itself will start on Saturday, May 7th, at 9am. Travel arrangements to reach the venue are listed in the conference wiki (wiki: Travel).
There will be a bus collecting participants from the hotels at 8:30am and the same bus from the conference venue (Sardegna Ricerche) back to the hotel in the evening.

Sessions

Open Database Camp is an un-conference. As with the previous editions, the schedule will be decided on the spot, on Saturday. You can list your intended sessions and your wishes in the Sessions page.

Logistics

You will need an ID for your wifi access. (Sorry, it's a law requirement)(*). If you want your username and password, you should collect it at the reception as soon as possible.

Customized meetings

There will be room for 1:1 meetings between attendees, if you like to do so. You will find a board with the names of all attendees and their affiliation, and you can easily schedule a meeting with them.

(*) The law has expired but after having put the fear of the state into every internet provider, the lawmakers have not said how the new regulations should be applied. Regretfully, we have still to live with the old rules.

Monday, May 02, 2011

Introducing the Flying Clusters, and more than MySQL replication

My colleague Linas Virbalas has just crossed the boundary between real and virtual and has started a blog, titled Flying Clusters.
Linas is a gifted developer who is taking care of the special projects. One of such projects is replication between MySQL and PostgreSQL, which works quite well.
Another project, which has just started, is about providing PostgreSQL with Advanced Logical Replication using Tungsten replicator. As you probably know, recent versions of PostgreSQL can do physical replication, which has its pros and cons. With this project, PostgreSQL users can also have the choice of using logical replication. Not only that: since Tungsten Replicator already supports MySQL, cross-DBMS replication clusters are not far away.
We are also more ambitious, and we are exploring ways of replicating to NoSQL entities. We will start with MongoDB at a dedicated SQL to NoSQL Replication Hackathon, where we will attempt the creation of an applier for MongoDB during the Open Database Camp conference.

Thursday, April 28, 2011

Replication : different points of view

The following quotes are the first sentences in the replication chapter of two similar books. Both are admin cookbooks. One is for PostgreSQL, one for MySQL.

Replication isn't magic, though it can be pretty cool. It's even cooler when it works, and that's what this chapter is all about.
Replication requires understanding, effort, and patience. There are a significant number of points to get right. My emphasis here is on providing simple approaches to get you started, and some clear best practices on operational robustness.
PostgreSQL 9 Admin Cookbook

Replication is an interesting feature of MySQL that can be used for a variety of purposes. It can help to balance server load across multiple machines, ease backups, provide a workaround for the lack of fulltext search capabilities in InnoDB, and much more.
MySQL Admin Cookbook


The PostgreSQL quote warns of a dangerous, difficult, and unexplored path, while the MySQL one is the almost bored remark of someone whose biggest problem is to list how many good things you can do. I guess that being exposed to a given technology for a longer time changes one's perception.

Wednesday, April 27, 2011

Open Database camp 2011 - Travel logistics, and don't forget the party

The Open Database Camp 2011 is near. In 9 days, the welcome party starts, and then the conference itself gets going.
If you are coming earlier than Friday, May 6th, you can either use public transportation or book a private seat with a volunteer in the car pooling page. Please help the organizers: post your arrival and departure dates and times, so we may be able to help you even outside the official conference days.
About the conference itself, as everyone should know, it's an un-conference, where the talks will be decided on the spot. But you can book ideas and topics in the sessions page.
Since we will have many participants from Italy, there will be dedicated sessions in Italian in addition to the ones in English, which is the official language of the conference.

Friday, April 15, 2011

Have you missed the MySQL Conference? Come to OpenDbCamp!

The MySQL Conference is over. There have been many new developments, and those who missed it will probably want to get a summary of the excitement, possibly from the people who have contributed to shaping the news.
The Open Database Camp will give users an opportunity to catch up. Especially to open source users in Europe.
Come and share the fun. There will be talks on MySQL, PostgreSQL, several NoSQL products, and a bunch of other cool stuff.

Monday, April 04, 2011

Open Database camp 2011 - Opportunities for sponsors, culture, and more

The Open Database Camp 2011 is barely one month away.

Sponsorship

Many thanks to all the sponsors! We very much appreciate your support.
Speaking of which, here is some important information for sponsors: the venue owners, Sardegna Ricerche, have given us the availability of an ample hall for sponsors, where they can showcase their products and services.
Each sponsor will have a desk, and a double panel sized cm 195 x 75 (6.3 x 2.4 feet).

Culture, fun, and more

The Science park is something unique that geeks may want to visit. It is one of the biggest research centers in Europe, and the owners have graciously organized a guided tour before and after the conference.
Near the conference there is Nora, an archeological site that alone is worth the trip for a visit. You can see it during the welcome party on Friday (if you show up before sunset, that is), or you can visit on your own after the conference.

To give you an idea of what awaits you, here is a promotional video of Sardegna Ricerche.


More

The event should attract many local open source enthusiasts, with varying degrees of knowledge about open databases. To meet their curiosity, there will be a parallel beginners track, with introductory sessions to open databases. SQL and NoSQL fans, get ready to evangelize your beloved products. There will be many people eager to listen!

Thursday, February 03, 2011

Open Database Camp - Accommodation and Sponsoring

The Open Database Camp 2011 is shaping up nicely.
The logistics are being defined, and local and international volunteers are showing up to help. (Thanks, folks!)
If you want to start booking, there is a list of hotels in the Accommodation page.
And don't forget to sign up in the Attendees list.
Local travel information will be released as soon as we finish cranking up the plan.
Open Database camp is free, but we still have expenses to get the job done.
We need both official sponsors and personal donations. No minimum amount required. You can donate painlessly online through the nonprofit organization Technocation. (Thanks!)
Please see the Sponsors page for more info.

Thursday, January 06, 2011

Announcing the Open Database Camp - Sardinia, May 2011

I have been traveling to many conferences in the last 10 years, and many times I have been asked to organize an event in my native land, Sardinia. After delaying the inevitable for a long time, here I can announce it. The Open Database Camp 2011 will take place in Sardinia, hosted by the Sardinia Technology Park, a local scientific and business institution with international links.
Mark your calendars: the Open Database Camp will be held in Sardinia on May 6-7-8, 2011.
I have already confirmed the venue, and I will have full cooperation from Sardegna Ricerche about the conference logistics. I will meet the organizers on January 27th to get in touch with nearby hotels and restaurants and negotiate rates.
The place is a beautiful and modern compound, built in the middle of a forest, about 40 km from Cagliari and its airport. There is a public bus service to reach the venue, and there will be an additional bus during the conference.
The place is a few kilometers from the sea resort of Pula, near the archeological beauty of Nora.
If you want to start booking your flights, look for connections to Cagliari Elmas.

There are cheap direct flights from several European airports with EasyJet, Ryanair, TUIFly, Air Berlin, and probably a few more.
For example, you can fly to Cagliari from Paris, Frankfurt, Berlin, Cologne, Munich, Stuttgart, London, Edinburgh, Brussels, Madrid, Barcelona, Seville, Valencia, Venice, Rome, Milan, Turin, Basel, Geneva, Krakow, and probably more by the time you come.
If you book now, you should be able to get a good price.

The weather in Sardinia is mild. May is almost summertime. If you live in cold places like the North of the USA, Canada, Scandinavia, May in Sardinia is definitely warmer.
More logistics information will come.

Why Open Database Camp, and not Open SQL Camp like before?
The Open SQL Camp tradition has evolved since its inception in 2008. It has now become a gathering of database professionals and enthusiasts, not necessarily identifiable with the SQL constraint.
So, the conference welcomes everyone who deals with open databases, regardless of the languages used to interface them.

Stay tuned for more info. In the meantime, you can discuss this matter in the opensqlcamp Google Group.

Friday, December 03, 2010

My picks for PGDay-EU 2010

On Sunday I will be in Stuttgart with the double purpose of attending the annual European PostgreSQL conference and the technical meeting of my company, which will be held after the normal proceedings of PGDay-EU.
For the first time in several years I am attending a conference where I am not a speaker. In my previous job I did not have much opportunity to attend PostgreSQL meetings, and I welcome this opportunity. The schedule is quite interesting, and I have made my personal picks:

Thursday, September 10, 2009

Sponsoring OpenSQLCamp


OpenSQL Camp 2009 in Portland, OR

The next OpenSQLCamp will be held in Portland, Oregon, USA. It is being organized by Eric Day, well known to the open source community for his active and productive participation to several projects (especially Drizzle and Gearman).

The event is public and free. Therefore, it needs public sponsoring. I don't know yet if I can attend, but I have already donated something to the organizers, and I am officially a sponsor. You can be one too: simply go to the sponsors page and donate a minimum of $100 as an individual or $250 as an organization.
And of course, if you plan to participate, register yourself and possibly propose a session.
OpenSQLCamp is a fun, peer-level event. If you have something to say, write a proposal, and the other participants will tell you if they want to hear it or not. Either way, you will learn something.

Saturday, November 15, 2008

OpenSQLCamp - Pictures from the first day

The OpenSQLCamp started yesterday in Charlottesville, VA.
As expected, the gathering is impressive. There are many well known names from the open source database world. MySQL is probably overrepresented, but that's fair, considering its wide adoption.

Yesterday was an informal "meet anyone and let's see what we do tomorrow". The schedule, as you can see, is very dynamic.
OpenSQLCamp - flexible schedule