Surrogate keys

After you have been building databases for a while, surrogate keys become second nature but many people do not know about them so I thought I should devote some time to them. In addition, a constant challenge is choosing between using an int or a uniqueidentifier (i.e. GUID) as a surrogate key and I thought this would be a great forum to get some feedback on when you should use one or the other.

Surrogate keys

One of the more useful database design strategies is to use surrogate keys. The idea is that instead of using, say, a social security number as an ID for people you use a completely artificial key (usually an int , bigint or a uniqueidentifier) called a surrogate key. In the database application, you may display and search on the natural key (social security number in this case) but all joins use the surrogate key.


There are a number of reasons for using surrogate keys but the main reason is to deal with change. Natural keys like social security numbers often change. If you do not plan carefully you may assume that, say, a person’s social security number will never change. Then reality sets in—a person’s social security number can change. Suddenly, you have to propagate that change to everywhere that you refer to a person. What happens when we run out of social security numbers and need to change the format? Suddenly, you have to convert a large amount of data and resize many rows. What happens when your company expands overseas where they have their own national ID system? Suddenly you have to rewrite a lot of code. However, if you had used a surrogate key instead then each of these scenarios would be much easier to deal with.


The other common reasons for using surrogate keys are to do with efficiency. In general, surrogate keys are significantly smaller than natural keys. Thus table rows and indexes will be smaller and comparisons quicker (although bookmarks complicate the picture). Since more data is in each disk page, table scans, index lookups and index scans will all go faster. Similarly, surrogate keys replicate faster and reduce backup times.


However, there are disadvantages to surrogate keys, which some people think outweigh their advantages. Surrogate keys can lead to more joins since a join is required back to a master table to get the natural key for a report. Surrogate keys make ad hoc queries harder to write since more joins are required and users must have a deeper understanding of the data. In addition, surrogate keys require the use of an extra uniqueness constraint instead of just the primary key constraint with an associated loss of efficiency.


I tend to favor the surrogate key approach unless there is a natural key that I think will never change and the key is a reasonable size. However, even in the space of representing detailed information about compiled programs, good natural keys are rare.

Should a surrogate key be an int or uniqueidentifier?

I still struggle with this issue. In a naïve way, an int or a bigint is the obvious choice at 4 or 8 bytes instead of 16 bytes with some nice support for table partitioning in SQL Server 2005. Also, integer based keys can be added sequentially and table appends are typically faster than table inserts. However, replication and database merging complicate the issue.


If there is no row GUID then merge replication will add one. Thus, if you are going to pay the cost anyway it is tempting to use a uniqueidentifier as the key. The disadvantage is that the row size of every table the ID occurs in will increase.


Related to the merge replication issue is the issue of generating keys in the middle or client tier. This is easier with a uniqueidentifier based key than an integer based key. There are numerous strategies for assigning integer key ranges but they complicate the logic and make the system less robust.


I tend to favor using bigint based keys unless I am sure the number of rows will be low. Although it hurts every time I do it. The databases I am working on may contain an unusually large number of rows so int based columns should be sufficient for most people.

Your thoughts please

Does all this match up with your experience? I would be interested in hearing people’s views on surrogate keys and in particular, the int or uniqueidentifier issue.


This posting is provided “AS IS” with no warranties, and confers no rights.


Comments (67)

  1. Anonymous says:

    My preference is the use of integer/bigint keys, but I’m mainly working on CMS and web systems without replication.

    Experience has shown me that clients get really pissed off (if they know what they want) when developers use guids for keys and these end up on URLs. They often wrap in emails and don’t look good on paper. When asked, the developers of these systems often think they have to use guids to be ‘enterprise’ developers, they can provide reasoning like that in your post.

    I guess you could (should) shield the user from these keys, but for Mort’s out there this is something that they are unlikely to invest time in.

  2. Anonymous says:

    I’ll preface this by saying "I ain’t no expert or nothin’," but I always use surrogate keys with one exception: a table with only two columns each of which is a foreign key to a surrogate in another table.

    For example, say the first table is A and its primary key is A_ID (identity 1,1). The second table is B and its primary key is B_ID (identity 1,1). The third table is C and its sole purpose is to associate rows in A to rows in B, each A may be associated to only one B but many A’s may be associate to a each B.

    The two columns in table C are A_ID and B_ID and the primary key I’ll use is the concatenation of the two.

    Is this correct?

    Honestly, I don’t know. It seems reasonable because I know A_ID and B_ID will never change and a unique constraint is required on the combination even if were to use a surrogate.

    What I can’t quite seem to noodle out is why this works, specifically, suppose A_ID has a value = "123" and B_ID has a value of "45," presumably the concatenated value for the primary key is "12345." But if a new row is added where A_ID is "12" and B_ID is "345" (concatenation = "12345") no primary key constraint violation is thrown.

    On the surface, those two numbers seem the same to me, but I suspect my trusty SQL friend views the value "123" for A_ID as something other than "123," like a_id.123, therefore the two concatenated primary keys are not "12345" (and therefore identical) but something akin to "a_id.123b_id.45" and "a_id.12b_id.345", or maybe it’s more like they’re delimited such as "123.45" and "12.345" or some sort of crap.

    Honestly, I don’t know, do you? If you do, I wish you’d tell me.

    All I know is it works and I can’t for the life of me figure out why it’d be better to use a surrogate on table C in the scenario I described above.

  3. Anonymous says:

    In the case described above.

    Why dont use a surrogate key (SID) in a relation table with two foreigns keys to another’s tables SID?

    You can make a good use of the uniqness of that relation elements to provide functionality over it, like update, delete. Inserts are equally implemented since SID are automatically generated.

    I support the use of SID since it gives a object ref-style to table tuples allowing automatic tools to generate functionalities that in other case with must program ad-hoc.

    I am actually working on a model based in SIDs in wich you could even perform replication schemes based on info stored on such SID.

  4. Anonymous says:

    I have been tempted to use the uniqueidentifier but shy away when I realise that queries off of ints and bigints are much more friendlier than uniqueidentifiers. with the right columns selected in queries it should be the way to go about. Its sad when you are forced to use merge replication in some cases, it becomes difficult managing the same codebase in different environments which may not have replication turned on

  5. Anonymous says:


  6. Anonymous says:


  7. Anonymous says:


  8. Anonymous says:


  9. Anonymous says:

    セレブ達は一般の人達とは接する機会もなく、その出会う唯一の場所が「逆援助倶楽部」です。 男性はお金、女性はSEXを要求する場合が多いようです。これは女性に圧倒的な財力があるから成り立つことの出来る関係ではないでしょうか?

  10. Anonymous says:


  11. Anonymous says:


  12. Anonymous says:


  13. Anonymous says:


  14. Anonymous says:


  15. Anonymous says:


  16. Anonymous says:

    何回かメールして会える人一緒に楽しいことしょ?お給料もらったばかりだからご飯くらいならごちそうしちゃうょ♪ とりあえずメールくださぃ★

  17. Anonymous says:


  18. Anonymous says:


  19. Anonymous says:


  20. Anonymous says:


  21. Anonymous says:


  22. Anonymous says:


  23. Anonymous says:


  24. Anonymous says:


  25. Anonymous says:


  26. Anonymous says:


  27. Anonymous says:


  28. Anonymous says:


  29. Anonymous says:

    熟女だって性欲がある、貴方がもし人妻とSEXしてお金を稼ぎたいのなら、一度人妻ワイフをご利用ください。当サイトには全国各地からお金持ちのセレブたちが集まっています。女性から男性への報酬は、 最低15万円からと決めております。興味のある方は一度当サイト案内をご覧ください

  30. Anonymous says:


  31. Anonymous says:


  32. Anonymous says:


  33. Anonymous says:


  34. Anonymous says:


  35. Anonymous says:


  36. Anonymous says:


  37. Anonymous says:

    最近してないし欲求不満です。一緒にいやらしいことしませんか?エッチには自信あるよ(笑) メール待ってるよ☆

  38. Anonymous says:


  39. Anonymous says:


  40. Anonymous says:


  41. Anonymous says:


  42. Anonymous says:


  43. Anonymous says:

    誰か満足させてくれる人いませんか?めんどくさいこと抜きでしよっ♪ とりあえずメールして☆

  44. Anonymous says:


  45. Anonymous says:


  46. Anonymous says:


  47. Anonymous says:


  48. Anonymous says:


  49. Anonymous says:


  50. Anonymous says:


  51. Anonymous says:


  52. Anonymous says:


  53. Anonymous says:


  54. Anonymous says:


  55. Anonymous says:


  56. Anonymous says:


  57. Anonymous says:


  58. Anonymous says:


  59. Anonymous says:


  60. Anonymous says:


  61. Anonymous says:


  62. Anonymous says:


  63. Anonymous says:


  64. Anonymous says:


  65. Anonymous says:


  66. Anonymous says:


  67. Anonymous says:

    Surrogate key advantages are over inflated. Often designers blindly go for surrogate keys when good natural key is already available.