GUID Guide, part one

What is a GUID? The acronym stands for “globally unique identifier”; GUIDs are also called UUIDs, which stands for “universally unique identifier”. (It is unclear to me why we need two nigh-identical names for the same thing, but there you have it.) A GUID is essentially a 128-bit integer, and in its human-readable form it is written in hexadecimal in the pattern {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx}.
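To make the "128-bit integer written in hexadecimal" description concrete, here is a minimal sketch using Python's standard `uuid` module. The sample value is just an arbitrary GUID picked for illustration, and the variable names are my own.

```python
import uuid

# An arbitrary GUID, used here purely as sample input.
text = "{25E609E4-B259-11CF-BFC7-444553540000}"

# uuid.UUID accepts the braced, hyphenated form and ignores letter case.
guid = uuid.UUID(text)

# The same value viewed as a plain integer; it always fits in 128 bits.
as_int = guid.int

# str() gives the canonical lowercase form without braces.
braced = "{" + str(guid) + "}"

# The five hyphen-separated groups hold 8, 4, 4, 4 and 12 hex digits (32 in total).
group_lengths = [len(group) for group in str(guid).split("-")]

# Converting the integer back produces the identical GUID.
round_trip = uuid.UUID(int=as_int)

print(braced)
print(group_lengths)
print(round_trip == guid)
```

Thirty-two hexadecimal digits at four bits each account for the full 128 bits; the braces and hyphens are only punctuation.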

The purpose of a GUID is, as the name implies, to uniquely identify something, so that we can refer to that thing by its identifier and have confidence that everyone can agree upon what thing we are referring to. Think about this problem as it applies to, say, books. It is cumbersome to refer to a book by quoting it in its entirety every time you mention it. Instead, we give every book an identifier in the form of its title. The problem with using a title as an identifier is that there may be many different books with the same title. I have three different books all entitled “The C# Programming Language” on my desk right now; if I want to refer to one of them in particular, I’d typically have to give the edition number. But there is nothing (apart from their good sense) stopping some entirely different publisher from also publishing a book called “The C# Programming Language, fourth edition” that differs from the others.

Publishers have solved this problem by creating a globally unique identifier for each book called the International Standard Book Number, or ISBN. This is the 13-decimal-digit bar coded number you see on pretty much every book(*). How do publishers manage to get a unique number for each of the millions of books published? They divide and conquer; each group of digits in an ISBN has a different meaning. Each country has been assigned a certain range of ISBN numbers that they can allocate; governments then further allocate subsets of their numbers to publishers. Publishers then decide for themselves how to assign the remaining digits to each book. The ISBNs for my three editions of the C# spec are 978-0-321-15491-0, 978-0-321-56299-9 and 978-0-321-74176-9. You’ll notice that the first seven digits are exactly the same for each; they identify that this is a publishing industry code (978), that the book was published in a primarily English-speaking region (0), by Addison-Wesley (321). The next five digits are Addison-Wesley’s choice, and the final digit is a checksum. If I wish to uniquely identify the fourth edition of the C# specification I need not state the ambiguous title at all; I can simply refer you to book number 978-0-321-74176-9, and everyone in the world can determine precisely which book I’m talking about.
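For the curious, the checksum can be verified by hand or with a few lines of code. The sketch below uses the standard ISBN-13 rule, which I have not spelled out above: weight the first twelve digits alternately by 1 and 3, sum them, and choose the final digit that brings the total up to a multiple of ten. The function names are hypothetical, and the split into groups simply relies on the hyphens as printed.

```python
def isbn13_check_digit(first_twelve: str) -> int:
    """Compute the check digit for the first twelve digits of an ISBN-13."""
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(
        int(digit) * (1 if index % 2 == 0 else 3)
        for index, digit in enumerate(first_twelve)
    )
    return (10 - total % 10) % 10


def is_valid_isbn13(isbn: str) -> bool:
    """Return True if a (possibly hyphenated) ISBN-13 has a correct check digit."""
    digits = isbn.replace("-", "")
    if len(digits) != 13 or not digits.isdigit():
        return False
    return isbn13_check_digit(digits[:12]) == int(digits[12])


fourth_edition = "978-0-321-74176-9"

# The hyphens separate: industry prefix, region, publisher, item, check digit.
prefix, region, publisher, item, check = fourth_edition.split("-")
print(prefix, region, publisher, item, check)  # prints: 978 0 321 74176 9
print(is_valid_isbn13(fourth_edition))         # prints: True
```

A checksum like this catches a single mistyped digit; it does nothing to stop someone from deliberately printing a valid number that belongs to another book.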

An important and easily overlooked characteristic of the ISBN uniqueness system is that it only works if everyone who uses it is non-hostile. If a rogue publisher decides to deliberately publish books with the ISBN numbers of existing books so as to create confusion then the usefulness of the identifier is compromised because it no longer uniquely identifies a book. ISBN numbers are not a security system, and neither are GUIDs; ISBN numbers and GUIDs exist only to prevent accidental collisions. Similarly, traffic lights only prevent accidental collisions if everyone agrees to follow the rules of traffic lights; if anyone decides to go when the light is red then collisions might no longer be avoided, and if someone is attempting to deliberately cause a collision then traffic lights cannot stop them.

The ISBN system has the nice property that you can “decode” an ISBN and learn something about the book just from its number. But it has the enormous downside that it is extraordinarily expensive to administer. There has to be international agreement on the general form of the identifier and on what the industry and language codes mean. In any given country there must be some organization (either a government body or private companies contracted by the government) to assign numbers to publishers. It can cost hundreds of dollars to obtain a unique ISBN.

GUIDs do not have this cost problem; GUIDs are free and there is no requirement that any governing body get involved to ensure their uniqueness. A GUID is a number that you can generate yourself and be guaranteed that no one else in the world will generate that same number. That seems a bit magical. How does that work? Over the next couple of episodes we’ll take a look at how that magical property is achieved.
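To see how little ceremony is involved, here is a minimal sketch that mints two GUIDs locally with Python's standard `uuid` module. I am assuming the random (version 4) flavor purely for illustration; the helper name `new_guid_text` is made up, and the sketch says nothing yet about why the results should not collide with anyone else's.

```python
import uuid


def new_guid_text() -> str:
    """Generate a GUID on this machine and return it in braced form."""
    # uuid4() builds the value from random bits; no registry, agency,
    # or network service is consulted.
    return "{" + str(uuid.uuid4()) + "}"


first = new_guid_text()
second = new_guid_text()

print(first)
print(second)
print(first != second)
```

Each call costs nothing and asks permission of no one, which is exactly the contrast with ISBNs.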

(*) The attentive reader will note that there are usually two bar codes on a book in the United States. The first one is the ISBN; the second bar code is the number 5 followed by a four digit number that is the publisher’s suggested price of the book in American pennies.

Comments (38)

  1. DaTribe says:

    An interesting choice of topic, but something well worth explaining. Great post and looking forward to the others in the series.

  2. Olivier says:

    > It is unclear to me why we need two nigh-identical names for the same thing

    This reminds me of URI and URL.

  3. Otaku says:

    Seems like a poor analogy to me. I would expect a GUID to identify a particular instance of a book, not a particular title. Perhaps an SSN would be a better analogy?

  4. Random832 says:

    "GUIDs are free and there is no requirement that any governing body get involved to ensure their uniqueness." Who's in charge of MAC addresses? (As for type 4 GUIDs – well, they're not actually unique, just statistically unlikely to collide).

  5. Superbeard says:

    Looking forward to reading this series.

  6. Falanwe says:

    >> It is unclear to me why we need two nigh-identical names for the same thing

    >This reminds me of URI and URL.

    Except URIs and URLs are not the same thing. URLs are a subset of URIs. And incidentally, ISBN numbers are URIs too ^^

  7. Anonymous Coward says:

    >Seems like a poor analogy to me. I would expect a GUID to identify a particular instance of a book, not a particular title.

    A class ID is a GUID that refers to a particular title, rather than a specific book, no? In fact most objects don't have their own GUIDs, but COM classes and interfaces do.

  8. Chris B says:

    @AC, Otaku

    The thing being identified depends on context.  COM uses GUIDs to distinguish types because that is what needs to be identified. OTOH, rows in databases often use GUIDs as primary keys, and that can translate into an instance of an object being identified by that GUID.

  9. Tergiver says:

    It is my understanding that UUID (universal) was considered presumptuous. It's possible for an alien on another planet to create a UUID that is identical to one created on Earth. So purists would use the term GUID (global) instead as the number is only guaranteed to be globally unique, not universally unique.

  10. Michael Starberg says:

    The subtle difference is that GUIDs are made by gnomes while UUIDs are made by unicorns.

  11. Jacob says:


    I don't believe that's the case at all. GUIDs are 'more unique' than UUIDs in general, because the UUID standard defines identifier types that are relatively likely to conflict when used by independent programs (the types based on hashes). See:…/UUID

    I think the 'universe' in UUID refers to the logical sense. I.e., 'this identifies something uniquely in the universe of all the things I'm considering'. GUIDs on the other hand are supposed to give you a truly unique number every time one is generated, irrespective of the content it identifies.

    So if you want an identifier that uniquely identifies an object among the set of objects your program does or might work with, that's a UUID. If you want an identifier that uniquely identifies an object among the set of absolutely all possible objects, that's a GUID. In practice it's safer (and just as simple) to just use random UUIDs over hash-based UUIDs, so most UUIDs happen to be GUIDs. But logically, that is the terminology distinction (as I see it).

  12. Tergiver says:

    @Jacob: I'm sure you're correct, but I like my answer better. It wasn't an original construct. I can no longer remember where I heard it.

  13. BW says:

    I remember an instance where there was a GUID collision with Microsoft products. One of the Windows 95 PowerToys (Shortcut Target Menu, I think) had its GUID reused in later versions of Windows by Show Desktop (or was it Send to Desktop?). I wasn't pleased; it is one of the few PowerToys that is still relevant and non-trivial to duplicate.

  14. Andrew Ducker says:

    I am slightly surprised the bookseller only gets five digits.  I guess 99,999 books is a lot, but I kinda assumed there were some publishers out there with more than that, and also more than 999 publishers in the US, over time.

    Your surprise is due to your unwarranted assumptions; I did not say that (1) the zero code is the only code for the United States, (2) that every publisher has a three-digit identifier, or that (3) every publisher is only allowed to own one publisher code. None of those assumptions are true. English-speaking regions use both zero and one as the region code. Large publishers can purchase one or more three-digit codes; smaller publishers can purchase longer codes that give them fewer digits to choose themselves. And when we run out of those, they'll start using another three-digit prefix; there are hundreds of those still unused. — Eric

  15. Harold says:

    Wait, GUIDs are free!? I may have been taken advantage of on Ebay.

  16. cheong00 says:

    @Random832: As always, answers to common factual questions can be found on wiki.…/MAC_address

    The MAC address namespace is managed by IEEE.

  17. Ken says:

    "The first one is the ISBN; the second bar code is the number 5 followed by a four digit number that is the publisher's suggested price of the book in American pennies."

    Neato!  Hmm, one of the first books I checked has "00112" as the second number.  I wonder what that means.

    That would be the suggested retail price in British pence. — Eric

  18. Stain says:

    Looking forward to "ISBN Guide, part two"

  19. ficedula says:

    @Andrew Ducker:  The publisher & item number parts of an ISBN are variable length; in English speaking countries publisher + item no. always account for 8 digits, but that can be made up of a 3-digit publisher code + 5 digit item code, 6 digit publisher code + 2 digit item code, etc. The leading digits indicate what sort of range you're dealing with.

    So you can allocate a 10,000 item number range to a reasonably sized publisher, or a 100,000 item number range to a larger one, or just 100 items (or 10!) to small publishers. I don't know for sure, but I assume that a publisher who uses up all their numbers can have a second range allocated to them.

  20. Me says:

    Microsoft runs into duplicate GUIDs because they think the rules do not apply to them and so they may make up GUIDs manually (in some kind of sequence). This makes them more recognizable of course but you get the risk of duplicates.

  21. carlos says:

    Me wrote "Microsoft […] make up GUIDs manually (in some kind of sequence)."

    They don't make them up manually.  The old command-line tool uuidgen used to produce perfectly valid and sequential UUIDs if you asked it to produce more than one.  However, UUIDs are randomized now, so it no longer behaves like this.

    Years ago I read somewhere (probably MSJ) that if you were registering lots of COM objects and interfaces and so on that Windows worked more efficiently with sequential UUIDs.  Perhaps because they were more likely to be located near each other in the registry backing files.

    The problem with this is that having generated the UUIDs, you need a process to allocate them.  I guess this broke down.

  22. Interesting start to the series…

  23. TejasJ says:

    Very interesting! Waiting for more…..

  24. Paresh says:

    I think in the next parts, I'm going to get answers to questions I have had for a long time.

  25. SLaks says:

    See also Waste-A-GUID

    (The fallacy of this site is left as an exercise to the reader)

  26. says:

    Great post, looking forward to the next one!

    I just wanted to point out that GUIDs are not exactly free. Each one comes at the price of 16 bytes, which usually is not worth worrying about, but sometimes makes a difference. Using ints for keys, for example, will be much more memory efficient (but has other big disadvantages).

    Best Regards!

  27. Random832 says:

    Of course, the real difference with GUIDs even with the MAC address requirement is, it's easy to buy a MAC address, which you can then use to generate billions of billions of GUIDs without having to further answer to anyone. In principle, something like an ISBN could be made to work the same way, and that it doesn't is an artifact of how the publishing industry works.

  28. Henry Skoglund says:

    Maybe you could also mention why still today so many of the GUIDs in the registry end with -444553540000

    (eg. {25E609E4-B259-11CF-BFC7-444553540000}).

    Have patience; this is only part one! — Eric

  29. CodeInChaos says:

    You also forgot to mention that `Guid.NewGuid` is the finest cryptographic random number generator available in .net 😛

  30. GUID wasting on a massive scale. says:

    If you want a random record from an SQL Server table, you need to use TOP 1 and ORDER BY NEWID(). (RAND returns the same value for each record in the select.) Does this mean I generate and waste thousands of GUIDs every time? There's only a finite number of them available.

  31. Phoog says:

    I note that the name of the US currency subdivision is "cent", not "penny".  The name "penny" is commonly used for the coin, but its proper name is "cent".  Interestingly, the American currency system was designed with three units, by analogy with pounds, shillings, and pence; these units are dollars, dimes, and cents.  This distinction lived on for some time, at least theoretically: I've seen an invoice form from the 19th century with columns for dollars, dimes, and cents.  This also explains why the US dime says "one dime" rather than "ten cents".

  32. Gabe says:

    Phoog: Actually, the American currency system was designed with four units: dollars, dismes, cents, and milles, according to the Coinage Act of 1792. Note that the 's' was soon dropped from 'disme'. No "disme" coin was ever minted, but there was a "half disme" for a while. The first 10-cent coins had no denomination printed on them (they were silver and silver coins were not required to show their denomination), and later ones had the denomination printed as "one dime".

    I don't believe any coin was ever minted in milles, and in fact the only place the denomination is commonly used is for taxes. There was a 5 mil coin, but its denomination was "half cent".

  33. carlos says:

    @Gabe: I was in the USA a couple of weeks ago and, unfamiliar with the coins, grumbled to my brother that the dime must be the only coin in the world that doesn't state how much it's worth (i.e. ten cents).

    I didn't realise it was a currency unit, but now I do.  Cheers!

  34. Birthday paradox says:

    I had a colleague who was concerned about GUID collisions, which prompted me to write this explanation:…/birthday-paradox-and-guid-collisions.html

    The punchline is that you can assign about a million billion GUIDs *before* you hit a one in a billion chance of a collision.

  35. Jürg Steffen says:

    What about the CDDB? I think they also thought that the identifier of a music CD should be unique. But I found collisions there too!

    But anyway, good article!

  36. Static Shock says:

    I'm very worried that at the rate we are using them we will run out of GUIDs and everything after that will have to be non-unique  😉  😀

  37. Jimi says:

    So GUIDs are created by gnomes are they?  Well that will explain all those tiny footprints around our database server.

  38. efflux says:

    (*) The attentive reader will note that there are usually two bar codes on a book in the United States. The first one is the ISBN; the second bar code is the number 5 followed by a four digit number that is the publisher's suggested price of the book in American pennies.

    In another incarnation back when I was in university, I worked at the university's bookstore and I remember there being two pricing schemes publishers used.  One was "list price" where there would be a suggested retail price for the book and a set discount passed to the store which was often upwards of 40%.  With this pricing, the second barcode would indeed reflect the suggested retail price.  The second pricing was "net" pricing, which did not carry a suggested retail price at all.  These tended to either be all zeroes or have a number beginning with "9" and seemingly random information.  Having just read this post prompted me to finally look this up and I see that prefix indicates "internal use".  Nice to have that mystery solved 🙂

    For what it's worth, the "net" priced books tended to be "textbooks" whereas list priced books typically fell under the "trade book" category, which while sometimes required reading for courses were of the type to also be in general circulation.