From time to time I see articles in defense of natural primary keys against surrogate keys.
I don't take an immovable stand on either side, as I have seen good cases for both. In general, I like the idea of a natural primary key, when I see one that it is really natural.
Quite often, though, a natural primary key has proved itself faulty. And most of the times, when such faults happened, it was because of limited understanding of the data. I like to describe this kind of misinformation as
data analysis culture clash.
When choosing a natural primary key, one should consider which element, or group of elements, are unique in a given set. Not only that, they must be immutable, at least within that set.
For example, in a group of people, we may assume that a combination of name, surname, date and place of birth is a good natural primary key. Well, no. It isn't, for several reasons. Read on for a few real cases.
If we rule out the above combination, perhaps we could use the Social Security Number as a valid key? Not really.
The fact is that the above assumptions work well if we consider only people from the same country in current times. If we extend our data definition to include people from different countries, or historical records, then the assumption collapses.
A practical case: The
birth place. This is a fair assumption. In combination with other elements (e.g.: date of birth, name, and surname) it can provide good basis for unique and immutable records. If you consider people in the United States today, you are right. Even in the UK, or Italy, or France. But try applying this method to places with recent political changes due to war or revolutions, and suddenly the name of the town may suddenly change. What was before VictoryBurg in Oppresslandia is now known as Heroes City in Freelandia. And this happens now, in the 21st century. If your set includes historical data, these occurrences may become frightfully frequent.
Speaking of historical times, if you are dealing with really old records, you may want to consider that
dates are less than immutable. The way we count days in most Western countries is called the
Gregorian calendar, which was adopted by a handful of countries in 1582. Dates before October 5, 1582 and after October 14, 1582 are recorded using two different calendars. It is inconvenient, but if you know that you may avoid wrong date calculations. Simple, isn't it? Not really. If your records include data from different countries, you will have to take into account when this calendar
was adopted. Just to give a few examples, England adopted the Gregorian calendar in 1752, Japan in 1873, Russia in 1918, Turkey in 1926.
When dealing with people from different countries, you may be tempted to use
citizenship as an immutable property. That may work, if you consider
citizenship at birth and are prepared to keep the names of not currently existing states. If, instead, you get the citizenship from the employee's passport, you may incur in one or more of the following cases (all happened in practice during my work with an international organization):
- Some employees came to work with the passport they had before the political change, and now that the country has split, they have different passports, depending on which side of the new border they live.
- Some employees have parents from different countries and are entitled to more than one passport. They came to work initially with one passport, and came again a few year later with a different one because of different benefits.
- Some employees started working with a given citizenship, and then they got a different one because of marriage, political asylum, or other legal means.
- Some employees became stateless. The equivalent of NULL in a table field.
Coming to the
Social Security Number, the assumption of uniqueness fails for the same reason that citizenship does. It's even more frequent, because of people working abroad and paying their taxes in a foreign country, so they had two or more SSN or equivalent to show. But this element failed for another reason, i.e. because it is not immutable. It is supposed to be, but in some cases it happens that, due to clerical mistakes, the SSN issued in a given country is wrong, and needs to be changed. It happens in Italy, where your SSN (called "codice fiscale", or fiscal code) must match your name, surname, place, and date of birth. If any of these elements was wrong when the code was generated, the code needs to be done again. It's a painful process that requires a court order, but the result is that the item is not immutable.
Other elements that I have seen used wrongly for primary keys, either standalone or as key components, are:
telephone numbers (they can change, and after being changed they can be assigned to other people),
email addresses (they are unique, but they can easily change, or they can be abandoned when the person changes provider, or company, or both),
gender (it
can change),
surname (it can change, legally, for male and female, depending on country and conditions).
Summing up, a sane amount of skepticism should be used when considering if an element can be used in a primary key. Depending on the environment, the element can be safe or it can become a nightmare when the database grows from a neighborhood business to an international venture.