- Some employees came to work with the passport they had before the political change, and now that the country has split, they have different passports, depending on which side of the new border they live.
- Some employees have parents from different countries and are entitled to more than one passport. They came to work initially with one passport, and came again a few year later with a different one because of different benefits.
- Some employees started working with a given citizenship, and then they got a different one because of marriage, political asylum, or other legal means.
- Some employees became stateless. The equivalent of NULL in a table field.
Friday, September 02, 2011
Primary keys from experience
From time to time I see articles in defense of natural primary keys against surrogate keys. I don't take an immovable stand on either side, as I have seen good cases for both. In general, I like the idea of a natural primary key, when I see one that it is really natural. Quite often, though, a natural primary key has proved itself faulty. And most of the times, when such faults happened, it was because of limited understanding of the data. I like to describe this kind of misinformation as data analysis culture clash. When choosing a natural primary key, one should consider which element, or group of elements, are unique in a given set. Not only that, they must be immutable, at least within that set. For example, in a group of people, we may assume that a combination of name, surname, date and place of birth is a good natural primary key. Well, no. It isn't, for several reasons. Read on for a few real cases. If we rule out the above combination, perhaps we could use the Social Security Number as a valid key? Not really. The fact is that the above assumptions work well if we consider only people from the same country in current times. If we extend our data definition to include people from different countries, or historical records, then the assumption collapses. A practical case: The birth place. This is a fair assumption. In combination with other elements (e.g.: date of birth, name, and surname) it can provide good basis for unique and immutable records. If you consider people in the United States today, you are right. Even in the UK, or Italy, or France. But try applying this method to places with recent political changes due to war or revolutions, and suddenly the name of the town may suddenly change. What was before VictoryBurg in Oppresslandia is now known as Heroes City in Freelandia. And this happens now, in the 21st century. If your set includes historical data, these occurrences may become frightfully frequent. Speaking of historical times, if you are dealing with really old records, you may want to consider that dates are less than immutable. The way we count days in most Western countries is called the Gregorian calendar, which was adopted by a handful of countries in 1582. Dates before October 5, 1582 and after October 14, 1582 are recorded using two different calendars. It is inconvenient, but if you know that you may avoid wrong date calculations. Simple, isn't it? Not really. If your records include data from different countries, you will have to take into account when this calendar was adopted. Just to give a few examples, England adopted the Gregorian calendar in 1752, Japan in 1873, Russia in 1918, Turkey in 1926. When dealing with people from different countries, you may be tempted to use citizenship as an immutable property. That may work, if you consider citizenship at birth and are prepared to keep the names of not currently existing states. If, instead, you get the citizenship from the employee's passport, you may incur in one or more of the following cases (all happened in practice during my work with an international organization):