On the ways of creating a GUID that looks pretty


A customer had what at first appeared to be a question born of curiosity.

Is it possible that a GUID can be generated with all ASCII characters for bytes? In other words, is it possible that a GUID can be generated, and then if you interpret each of the 16 bytes as an ASCII character, the result is a printable string? Let's say for the sake of argument that the printable ASCII characters are U+0020 through U+007E.

Now, one might start studying the various GUID specifications to see whether such a GUID is legal. For example, types 1 and 2 are based on a timestamp and MAC address. An all-ASCII MAC address is legal. The locally-administered bit has value 0x02, and once you set that bit, all the other bits can be freely assigned by the network administrator. But then you might notice the Type Variant field, and the requirement that all new GUIDs must set the top bit, so that takes you out of the printable ASCII region, so bzzzzt, no all-ASCII GUID for you.
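To see the variant-field argument in action, here is a minimal sketch using Python's standard uuid module. The helper names is_all_printable_ascii and has_standard_variant are made up for this illustration. It assumes the generator follows the current specification, in which case byte 8 of every generated GUID has its top bit set and so falls outside U+0020 through U+007E.

```python
import uuid

PRINTABLE = range(0x20, 0x7F)  # U+0020 through U+007E inclusive


def is_all_printable_ascii(guid: uuid.UUID) -> bool:
    """True if every one of the 16 bytes is a printable ASCII character."""
    return all(b in PRINTABLE for b in guid.bytes)


def has_standard_variant(guid: uuid.UUID) -> bool:
    """True if the top bit of the variant byte (byte 8) is set."""
    return (guid.bytes[8] & 0x80) != 0


# Timestamp-based (type 1) and random (type 4) GUIDs both set the variant bit,
# so neither can ever be all printable ASCII.
for make in (uuid.uuid1, uuid.uuid4):
    g = make()
    print(g, has_standard_variant(g), is_all_printable_ascii(g))
```

Byte 8 sits at the same offset whether you look at the big-endian or the Windows in-memory layout, so the check does not depend on byte order.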

But we've fallen into the trap of answering the question instead of solving the problem.

What is the problem that you're trying to solve, where you are wondering about all-ASCII GUIDs?

We want to create some sentinel values in our database, and we figured we could use some all-ASCII GUIDs for convenience.

If you want a sentinel value that is guaranteed to be unique, why not create a GUID?

C:\> uuidgen
GUID_SpecialSentinel = {that GUID}

Now you are guaranteed that the value is unique and will never collide with any other valid GUID.

We could do that, but we figured it'd be handy if those sentinel values spelled out something so they'd be easier to spot in a dump file. If we know that all-ASCII GUIDs are not valid, then we can use all-ASCII GUIDs for our sentinel values.

Now, while uuidgen does produce valid GUIDs, it's also the case that those valid GUIDs aren't particularly easy to remember, nor do they exactly trip off the tongue. After all, the space of values that are easy to pronounce and remember is much, much smaller than 2¹²⁸. It's probably more on the order of 2²⁰, which is not enough bits to ensure global uniqueness. Heck, it's not even enough bits to describe all the pixels on your screen!

So w00t! Since all-ASCII GUIDs are not generatable under the current specification for GUIDs, I can go ahead and name my GUID {6d796152-6e6f-4364-6865-6e526f636b73} which shows up in a memory dump as

52 61 79 6d 6f 6e 64 43-68 65 6e 52 6f 63 6b 73  RaymondChenRocks

I am so awesome!
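If you want to check how the braces-and-dashes form lines up with the dump, here is a small sketch using Python's standard uuid module. It assumes the dump shows the Windows in-memory layout, where the first three fields are stored little-endian, which is what the bytes_le form models.

```python
import uuid

# The 16 ASCII bytes as they would appear in the memory dump.
text = b"RaymondChenRocks"

# bytes_le interprets the first three fields as little-endian,
# matching the in-memory layout assumed above.
sentinel = uuid.UUID(bytes_le=text)

print(sentinel)                     # 6d796152-6e6f-4364-6865-6e526f636b73
print(sentinel.bytes_le.hex(" "))   # the dump bytes, space-separated

# Byte 8 is 0x68 ('h'), whose top bit is clear, so the variant field
# does not match what current generators produce.
print(sentinel.variant == uuid.RESERVED_NCS)   # True
```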

But even if you convince yourself that no current GUID generation algorithm could create a GUID that collides with your special easy-to-remember and quick-to-pronounce sentinel GUIDs, there is no guarantee that you will make a particularly unique choice of sentinel value.

This is also known as What if two people did this?

There are many people named Raymond Chen in the world. Heck, there are many people named Raymond Chen at Microsoft. (We get each other's mail sometimes.) What if somebody else named Raymond Chen gets this same clever idea and creates their own sentinel value called RaymondChenRocks? Everything works great until my database starts interoperating with the other Raymond Chen's database, and now we have a GUID collision.

Now, the most common way to create a duplicate GUID is to duplicate it. But here, we created a duplicate GUID because the thing we created was not generated via a duplicate-avoidance algorithm. If the algorithm wasn't designed to avoid duplicates, then it's not too surprising that there may be duplicates. I just pulled this GUID out of my butt. (Mind you, my butt rocks.)

Okay, so let's go back to the original problem so we can solve it.

The most straightforward solution is simply to create a standard GUID each time you need a new sentinel value. "Oh, I need a GUID to represent an item which has been discontinued. Let me run uuidgen and hey look, there's a new GUID. I will call it GUID_Discontinued." This solves the uniqueness problem, and it is very simple to explain and prove correct. This is what people end up doing the vast majority of the time, and it's what I recommend.

Okay, you want to have the property that these special GUIDs can be easily spotted in crash dumps. One way to do this is to extract the MAC address from a network card, then destroy the card. You can now use the 60 bits of the timestamp fields to encode your ASCII message.
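As a rough sketch of that trick, the Python below packs a message into the timestamp bits of a type 1 GUID. The function name sentinel_from_message and the MAC address 00:11:22:33:44:55 are placeholders for this illustration; you would substitute the address of the card you actually retired. It assumes the Windows in-memory layout, in which the version nibble lands in the eighth byte, so only seven whole characters show up as text in a dump (the remaining four timestamp bits are left zero here).

```python
import uuid


def sentinel_from_message(message: str, node: int, clock_seq: int = 0) -> uuid.UUID:
    """Build a type 1 GUID whose first seven in-memory bytes spell `message`."""
    raw = message.encode("ascii")
    if len(raw) != 7:
        raise ValueError("message must be exactly 7 ASCII characters")
    if not 0 <= node < (1 << 48):
        raise ValueError("node must be a 48-bit MAC address")

    # Bytes 0-6 carry the message; byte 7 is the high byte of the third
    # field, whose top nibble must hold the version number (1).
    head = raw + bytes([0x10])

    # Byte 8 carries the variant in its top two bits (binary 10), followed
    # by the high bits of the clock sequence; byte 9 is the low byte.
    clock_seq_hi = 0x80 | ((clock_seq >> 8) & 0x3F)
    clock_seq_low = clock_seq & 0xFF

    tail = bytes([clock_seq_hi, clock_seq_low]) + node.to_bytes(6, "big")
    return uuid.UUID(bytes_le=head + tail)


# Placeholder MAC address; use the one from the card you destroyed.
guid_discontinued = sentinel_from_message("DISCONT", node=0x001122334455)
print(guid_discontinued)
print(guid_discontinued.bytes_le[:7])   # b'DISCONT'
```

The result still carries a valid version and variant, so it remains a well-formed type 1 GUID. Its uniqueness rests entirely on nobody else ever generating GUIDs with that MAC address, which is why the card has to go.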

A related problem is that you want to generate a GUID based on some other identifying information, with the properties that

  • Two items with the same identifying information should have the same GUID.

  • Two items with different identifying information should have different GUIDs.

  • None of these GUIDs should collide with GUIDs generated by any other means.

For that, you can use a name-based GUID generation algorithm.
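Here is a minimal sketch of the name-based approach using Python's standard uuid.uuid5, which hashes a namespace GUID together with a name. The namespace derived from "sentinels.example.com" and the helper guid_for are placeholders for this illustration; in practice you would generate your own namespace GUID once and keep it fixed.

```python
import uuid

# Hypothetical namespace for this example. Generate your own once
# (for instance with uuidgen) and never change it afterward.
SENTINEL_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "sentinels.example.com")


def guid_for(identifying_info: str) -> uuid.UUID:
    """Derive a GUID deterministically from the identifying information."""
    return uuid.uuid5(SENTINEL_NAMESPACE, identifying_info)


a = guid_for("item:discontinued")
b = guid_for("item:discontinued")
c = guid_for("item:backordered")

print(a == b)      # same identifying information, same GUID
print(a != c)      # different identifying information, different GUID
print(a.version)   # 5, which keeps it apart from type 1 and type 4 GUIDs
```

The third property relies on the version field: as long as everybody else follows the specification, a timestamp-based or random generator will never emit a value marked as version 5.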

Comments (39)
  1. Joshua says:

    So we faced a problem like this with the number of sentinel guids very large. Thankfully we can specify all the sources so we used another algorithm not colliding with 1 or 4.

    We've seen all-ASCII before: gpt

  2. Random User 993175 says:

    "Extract the MAC address from a network card, then destroy the card," then hope it isn't one of the (thankfully exceedingly rare) cards where the manufacturer messed up and assigned duplicate MACs.

  3. Peter says:

    I've generated "memorable" GUIDs by simply running a GUID program hundreds of thousands of times, filtering out the ones that didn't start with a pre-determined set of characters, and then eyeballing the remaining choices.

  4. 12BitSlab says:

    Hey!  That's a good looking GUID you have there.  Be a shame if something happened to it……

  5. Skyborne says:

    "not generatable under the current specification for GUIDs"

    This is just waiting to become a compatibility problem when the specs are updated.

    Reminds me of when someone rejected a proposal to add a version byte to a data format, should we need to change it again.  "We're never going to change this again."  Yes, that is apparently what everyone thought the last time it changed, and here we are again….  IIRC, they didn't even leave a "save old version" option available.

  6. Adam Rosenfield says:

    Globally unique, human-memorable, and non-centrally distributed.  Pick two.

    GUIDs are globally unique and non-centrally distributed, but not human-memorable.

    Human names are human-memorable and non-centrally distributed, but not globally unique.

    Names from a central authoritative store (like an email address or username) are globally unique and human-memorable, but not non-centrally distributed.

  7. Rich M says:

    @Peter. What a waste of several hundred thousand perfectly good GUIDs!

  8. abelenky says:

    I once worked on a project where the devs liked to hand-generate their own vanity GUIDs.

    The project was code-named BobSled, and most of the project GUIDs started with "leet-speak" for Bobsled, like:

    {B0B57ED0-aaaa-bbbb-cccc-1234567890ab}

  9. mark says:

    The designers of the various Guid formats seem to have had little understanding of the fact that a collision of two randomly generated Guids is incredibly unlikely. A collision is as likely as winning the lottery multiple times in a row. All structure in Guids results in no improvement under any circumstances. Just make it 16 random bytes.

    Also, no system that I know of makes use of the structure of Guids. It's just 16 opaque bytes to all systems that I know of.

  10. A Nony Moose says:

    I'd hate to be the intern that gets one of your emails….

  11. Myria says:

    "It's probably more on the order of 2²⁰, which is not enough bits to ensure global uniqueness. Heck, it's not even enough bits to describe all the pixels on your screen!"

    Neither is 2^128.

  12. Simon Farnsworth says:

    @Myria

    It depends how you interpret "not enough bits to describe all the pixels on your screen". I'm writing this on a low-end laptop, with a 1366×768 screen. That means I have 2^20 + 512 pixels on screen; 2^20 is thus not enough bits to uniquely name each pixel position on my screen.

    2 ^ 128, however, is enough bits to uniquely name every pixel on your screen, even if you run at 4096×2304 (digital cinema 4K resolution).

  13. 12BitSlab says:

    Thanks for the correction and the link!

  14. Henri Hein says:

    @mark:

    GUIDs are generated and compared many orders of magnitude more often than lotteries are being played.

  15. Henri Hein says:

    Also, generating enough entropy for 16 bytes of randomness strikes me as computationally a lot harder than basing the result on NIC and date/time.

  16. Jim Lyon says:

    Going back to the original problem, GUID_NULL (all zeros) makes a wonderful sentinel. It's guaranteed not to be a valid GUID.

    [ZOMG! This crash dump is filled with our sentinel value! -Raymond]
  17. ChuckOp says:

    Possibly another question, but similar topic: how did the Office team get away with all those "0FF1CE" GUIDs?  I think they are IIDs in the registry.

  18. mark says:

    "Random" has unpredictable collisions. To what is the PRNG initialized?

    The original concept was simple: you want a number that is unique in time and space. Very well: there is a time component (clock ticks since some epoch) and a space component (using a common piece of equipment, the hardware address(*) of the NIC, as a proxy for space).

    There's a couple of wrinkles in that, based on handling clocks being set backwards and how quickly you can move a NIC from one machine to another, but on the whole, it's a complete solution.  Until people start to worry that it is also a tracking identifier.

    (*) the one in the ROM, as distinct from the address the NIC is using right now, which is software-settable.

  19. Joshua says:

    [ZOMG! This crash dump is filled with our sentinel value! -Raymond]

    lol. That just made the site worth it for today.

  20. 12BitSlab says:

    2**128 is enough bits to have a unique address for every atom in the known universe — not accounting for dark matter.

  21. @128BitSlab says:

    Not quite, 2**128 is only about 3.4 × 10**38 (give or take).  The current estimate of atoms in the (observable) universe is around 10**80, so we're not safe until we have 256 bit GUIDs.

    Numbers come from http://www.universetoday.com/…/atoms-in-the-universe so may have changed in the last few years.

  22. Anonymous Coward says:

    @ChuckOp

    By choosing them manually, because nothing technically prevented them from doing so. It's not like there are GUID Police who will strike vengeance upon those who use non-randomly-generated GUIDs.

    @mark

    I think the idea is that, with one of the non-random uniqueness algorithms, you have *zero* chance of duplication on this planet, unless someone screws up the generation process (say, by setting their timestamp backwards). Not 2^-128, but 0.

    [I suspect the odds of somebody screwing up the generation process are greater than 1 in 2^128. -Raymond]
  23. Ken Hagan says:

    @12bitslab: You don't need very many bits to label every particle in the universe. About 5 would probably suffice. According to the rules of QM, all particles of the same type are intrinsically indistinguishable. Not only is it futile to try to tell them apart, it actually gives the wrong answers if you try.

  24. There are 2^128 ≈ 3.4e38 possible values for a GUID (ignoring namespace restrictions.) We can expect a duplicate on the order of every 2e19 GUIDs, give or take a factor of π.

    There are 95 characters from U+20 to U+7E inclusive, which makes for 95^16 ≈ 4.4e31 "printable GUIDs". We can expect a duplicate on the order of every 7e15 "printable GUIDs".

    So printability sacrifices *some* uniqueness, but is probably still good enough for most applications.

  25. boogaloo says:

    Get one before they all run out http://wasteaguid.info/

    "Globally unique, human-memorable, and non-centrally distributed.  Pick two."

    Most people don't need globally unique, it just needs to be unique among their database which in most cases is sitting on a single machine. All they have to worry about is that two people aren't generating sentinels. Centrally distributing sentinels within your organisation should be pretty trivial. Within a single organisation there are many problems that can be caused if you allow two people to do things without managing it. Unmanaged employees are more likely to produce "oh yeah I've been using the tax table for logging holiday requests because it was easier than creating a table, the down side is we now owe a load of tax and will get a complete audit" than a duplicate guid.

  26. cheong00 says:

    Fortunately when you add values to unique value, the result must be also unique.

    I would probably tell them to prepend, say… DEADBEEF, before the generated GUID value, so when they assess the dump and see DEADBEEF, they could check whether the latter values are valid GUID to know whether it's their sentinel values or not. I believe it'll be easier for them this way.

  27. boogaloo says:

    @cheong00 A Guid is fixed size, you can't just add something to it without removing something else. You're thinking about a variable length string which has fewer restrictions and wouldn't have required the question to be asked.

  28. Brian_EE says:

    @boogaloo: A GUID + fixed prefix is also a fixed size. Size = sizeof(GUID) + sizeof(prefix). And if you're using it as a sentinel value (in a database for instance) you just make the field the larger size. Nothing says the table column has to be only 16 bytes.

  29. morlamweb says:

    @boogaloo: more people need "globally-unique" identifiers than they realize.  I do tech support for an RDBMS-based document-management system.  The internal identifiers for the document are sequential numbers based on the date at the time of the document creation; for instance, 2014123100000571.  Those numbers are OK for uniquely identifying a document within one database, but what happens when you want to merge the contents of two databases?  Such as when two companies, both of which happen to use my application, merge and want to use a combined system?  Now you have all sorts of internal (not shown to the users) identifiers in the database tables that cannot be merged due to uniqueness conflicts.  If they had used GUIDs from the start for the internal IDs (which they did for other identifiers in the system…) then database merging would have been a simple process.  As it stands we have to make do with a number of workarounds, none of which are entirely satisfactory for my customers: build a new database for all users and mark the old databases as read-only/legacy; export/import database records from one system to another using their buggy database tool.  That import/export tool, in fact, was modified to work around the identifier limitation specifically; I can see notes about it in the change log : )

  30. boogaloo says:

    @brian_ee: You've just changed the field type from a Guid to a string, which is like going into a shop to buy a wrench for your car and being sold a new car instead.

    "A GUID + fixed prefix is also a fixed size. Size = sizeof(GUID) + sizeof(prefix). And if you're using it as a sentinel value (in a database for instance) you just make the field the larger size. Nothing says the table column has to be only 16 bytes"

    @morlamweb: I would imagine as the developer of the system then you assigned the sentinels and they are both the same in both databases because they mean the same thing. If the two companies assigned them then they are likely to be different. If they picked the same one and want them to be different then you remap them when importing and charge them for your time. If you're talking about randomly assigned Guid's for things like Invoices etc, then you might want to sanity check them when importing that they don't exist already (i.e. don't do an "UPSERT").

    For your example where you're generating numbers that are human readable and have a high chance of collision then you probably want to have a "company" field that is also part of your key so that old records can share the same number. You can't rekey it as your customers customers will have problems when referring to invoices etc (or whatever your documents are). But they'll know which company they dealt with, so you can deal with it that way.

    "Such as when two companies, both of which happen to use my application, merge and want to use a combined system?"

  31. Wear says:

    @Rich M Whenever I'm testing and making a lot of GUIDs there's always a small voice in my head saying "Oh no, you're wasting GUIDs". Then I feel bad for all those GUIDs that will never get used for anything important. Just sitting there representing an employee named "123 123".

  32. cheong00 says:

    @boogaloo: It's perfectly okay to store it as the original GUID in the database, and then modify it with a prefix/suffix when using it as a sentinel value.

    I can't believe I had to write this (TM). (/joke)

  33. Tyler says:

    Noticed a few dead links to RFC 4122 on the linked post "How can I generate a consistent but unique value that can coexist with GUIDs?" – It seems a stable URL is http://www.ietf.org/…/rfc4122.txt

  34. Cube 8 says:

    uuidgen .com : Account Suspended

  35. boogaloo says:

    @cheong00 The original question was: "We want to create some sentinel values in our database" and your answer doesn't allow the sentinel to be stored in the database, therefore I don't believe it is perfectly okay.

    "It's perfectly okay to store it as the original GUID in the database, and then modify it with a prefix/suffix when using it as a sentinel value.

    I can't believe I had to write this (TM). (/joke)"

  36. morlamweb says:

    @boogaloo: I think you completely misunderstood my comment.  I wasn't writing about sentinel values in a database; I was responding to your comment and the potential short-sightedness of using non-globally-unique values in a database.  To quote you: "Most people don't need globally unique, it just needs to be unique among their database …".  I would argue that most people DO need globally-unique identifiers in a database, even if it's just a "test" system or somesuch.  I know of about 10 customer production systems of my application that scaled up from "pilot" projects.  They were initially scoped for ~15 users for ~6 months, and then they end up running 3 years later with over 50 users.  I use this to illustrate my point: you just never know how something will end up being used, even if you have a say in the design of the system.  To be clear, I'm not a developer of this system; I thought it was clear when I stated "I do tech support" as my primary role.

    In short, if you have the choice to go with globally-unique or locally-unique identifiers in your system (assuming that it's in the design stage, or at a point where it can be changed), then do those of us in support a favor and go with globally-unique IDs.  People who come after you who need to push the system into areas that the original design did not call for will thank you.

  37. cheong00 says:

    "The original question was: "We want to create some sentinel values in our database" and your answer doesn't allow the sentinel to be stored in the database, therefore I don't believe it is perfectly okay."

    Try creating an MSSQL table with 5 fields, and then a char(8) field + uniqueidentifier field. Add 2 records with "DEADBEEF" in the char(8) field and a GUID in the uniqueidentifier field. Now shut down the database and try to open the MDF file with a binary editor, and observe.

    Of course by database it could mean some other database, but I think most databases that don't do data compression should behave similarly. (And for those which do data compression, you can't see the sentinel value conveniently anyway.)

  38. cheong00 says:

    Oh, I have forgotten what I thought yesterday, so a re-try…

    By "create some sentinel values in our database", I believe they're implementing their own database. Since modern database systems rearrange field storage order to make more efficient use of space, adding sentinel values as fields will almost always prove to be useless later. (Your table cannot contain any variable-length fields, and you have to carefully plan the table structure so the fields agree with memory alignment.)

    Because of this, adding a prefix to the GUID as a sentinel value shouldn't matter.

