Help me optimize this code which enumerates all possible GUIDs


Hi, I'm looking for help optimizing this code. It creates a file containing every possible GUID.

class Program {
 public static void Main(string[] args)
 {
  using (var w = new System.IO.StreamWriter("all_guids.txt")) {
   for (long l1 = 0; l1 <= 0xFFFFFFFFL; l1++)
   for (long l2 = 0; l2 <= 0xFFFFFFFFL; l2++)
   for (long l3 = 0; l3 <= 0xFFFFFFFFL; l3++)
   for (long l4 = 0; l4 <= 0xFFFFFFFFL; l4++)
    System.Console.WriteLine(
     new System.Guid(string.Format("{0:X8}{1:X8}{2:X8}{3:X8}",
                                   l1, l2, l3, l4)));
  }
 }
}

I know this code will take a really long time to run, so any performance improvements would be appreciated.

Okay, wait a second. You do realize that even if you had a 100GHz machine that could generate a new GUID every cycle, it will still take 2¹²⁸ ÷ 10¹¹ = 3 × 10²⁷ seconds or 10¹⁹ years to complete.

Right, that's why I was looking for help making the code run faster.

Look at this way: Suppose you could somehow get this algorithm to run a quintillion times faster, so it finishes in under a year. Your output file is going to be 2¹²⁸ × 16 = 2¹³² bytes in size. That's around 10²⁷ terabytes. One terabyte of SSD storage weighs around 100 grams. The mass of the earth is 10²⁴ kilograms. Therefore, before you run this program, you will need to acquire 100 earth-sized planets and convert them all to SSDs.

What are you trying to do with all of these GUIDs?

There's a Web site that uses GUIDs to identify resources. I want to enumerate all the GUIDs so I can request each one from the Web site to see if it contains anything interesting.

Okay, so it doesn't matter how fast your GUID enumeration algorithm is, because that's not the rate-limiting factor in this entire exercise. The limiting factor is the speed of the Web server you are attacking. (And at this point, it's fair to say that what you're doing is attacking a Web server.) Even if the Web server can handle a billion hits per second, it will take 10²² years for it to respond to all your requests.

Looking at it another way: Suppose you could enlist the resources of every computer on the Internet to send a million requests per second. It would still take over 10¹³ centuries for them to try every GUID.

And the server administrators might suspect something is up after, oh, about seven centuries.

This problem requires a sociological, not engineering, solution. You need to contact the site administrators, describe what information you would like, and see if you can come to some sort of data-sharing agreement.

I've contacted the site owners, but they are not interested in helping me, so I have no choice but to try every GUID.

Your computer is already plenty powerful enough to generate GUIDs as fast as the Web site can handle inbound requests. Maybe you can upgrade your cable modem to a higher level of service.

Actually, what you need to do is upgrade the other site's hardware and Internet connection to a higher level of service. Because they are the bottleneck, not you. "Here's a half million dollars to get faster servers. Just a gift from me. No really, no strings attached. Have fun. Just get those faster servers online right away, okay?"

I wish you good luck.

Comments (64)
  1. CrazyDave says:

    OK, so wouldn't it be faster and more economical to just use the $5 wrench method: http://xkcd.com/538/

  2. xor88 says:

    This reminds me of people sternly pointing out that two randomly generated Guid might still collide, suggesting that you still had to check for that case.

  3. Adam Rosenfield says:

    If they claim to support a resource with the GUID generated from the MAC address of a destroyed network card, you know they're lying.  Then you can ignore everything else about what they say (blogs.msdn.com/…/71307.aspx).

  4. Silly says:

    For four easy payments of $19.95 I will send him my Big Book o' GUIDs. Also GUID_NULL could probably be ignored as an optimisation.

  5. Ben says:

    That's the sort of question that gets on SO now. All the real questions seem to have been answered.

    That sort of thing, and "how to make teh join"

  6. John Bandela says:

    StackOverflow discussion of the same. stackoverflow.com/…/5466362

  7. danb says:

    Sometimes you wonder how some people managed to survive long enough to ask those kind of questions. Even with all the warning labels on.

  8. Karl Droog says:

    1. That code needs to be rewritten in ASM. After all, "everyone" knows that Assembly is Faster(tm).

    2. Those loops can be unrolled for greater efficiency. Testing may be required in order to find the optimum amount.

    3. Appeal to $DEITY for a temporary repeal of the laws of physics. Please consult with your nearest authorized religious support personnel and follow the guidelines and instructions in your reference documentation and help manuals.

  9. Brian says:

    It would just be faster to: get an inside job at the company you want to attack and/or find someway to steal the data onsite.

  10. Joshua Ganes says:

    Extremely large numbers are very surprising to the untrained. I can still remember a time before I performed some simple calculations when I thought that the number of legal chess games didn't seem too bad. There have been a few times where I started down the road to a brute force solution to various problems before performing some back of the envelope calculations showing that my solution would take years to complete.

    If this person is really determined to attack this service (this is not advice), then perhaps they could purchase a simple subscription to the service and try to analyze the data that is going back and forth. There may be patterns to exploit; at least to narrow the search space for their attack.

  11. Jason Warren says:

    My kingdom for an attacker who is willing to upgrade my infrastructure so they can attack me faster!

  12. Brian_EE says:

    A first optimization would be to restrict the algrotithm to the known space of valid MAC addresses.

    Kind of reminds me of when I bought my first 386 computer in 1991 and I wrote a QuickBasic program to try to brute force calculate prime numbers with nested for loops.

  13. Sockatume says:

    Maybe Raymond's correspondent isn't naive. Maybe he just operates on different scales that we do. Is it possible that The Inhibitors just really, really need to know what's in this online catalogue?

  14. Mrs. Whatsit says:

    Use emacs.

    C-m m-C (inverse-tesser (icbm-address-to-tesseract-address (47.640044 -122.125771 (+ 13 earth-sealevel)) gps-coordinates) .749)

  15. Brian says:

    The hubris is astounding.  Not only do they feel they have a divine right to scrape the site data, they contact Microsoft for help with it!

  16. Tito says:

    To be completely fair, while GUIDs are 16 bytes and randomly generated, their generation algorithms are designed to avoid _accidental_ collisions.  They are _not_ designed to be random enough for security purposes, i.e. they do not have 128 bits of entropy.  By taking known guids from the victim site, it should be possible to extrapolate clusters of guids and reduce the search space.  As Raymond said though, this is clearly an attack on the site.

    This is why web application platforms (ASP.NET, WebLogic, Rails, etc) don't use guids as session identifiers or CSRF tokens (any more).

  17. Medinoc says:

    I think the first thing to do would be to check WHICH type of GUID is used: If they are MAC address-based GUIDS, and there's only one server assigning them, the time could be divided by 2^48 and bounds could be found for the timestamp.

    If they're random, all that can be done is removing the version bits.

  18. Joshua says:

    @Medinoc: That also assumes they only use one GUID generator. I saw a version 6 GUID once.

  19. Mr Cranky says:

    On average, most numbers are very very large.

    "…it will still take 2¹²⁸ ÷ 10¹¹ = 3 × 10²⁷ seconds or 10¹⁹ years to complete. "

    "Right, that's why I was looking for help making the code run faster."

    "It would still take over 10¹³ centuries for them to try every GUID. "

    "Right, that's why I was looking for help making the code run faster. Sheesh, if you don't know how, please forward to someone who does!"

    I must admit, this is the funniest ONT article ever.

  20. mjb says:

    wouldn't it be faster to deposit $10 in a bank account and wait N years to be able to buy the company ;)

  21. mikeb says:

    For more "every possible GUID" fun, see: stackoverflow.com/…/how-to-generate-absolutely-unique-guids

    The line that made me laugh out loud:

    >> EDIT: Uh, its getting complicated. Thread safety kills the speed.

    At first I thought it was an April Fool's thread, but April 1 isn't in November.

  22. Nick V says:

    So, you think they would have more luck enumerating all the possible ways of asking the site operators for their database of GUIDs?

  23. Brad Westness says:

    I found the error: they're writing out to the console instead of the stream writer! :)

  24. Mott555 says:

    Great post Raymond. It got quite a chuckle out of me. Same with the comments and StackOverflow links.

  25. smf says:

    Just tell him that smashing network cards that have never been used will cut down on his search space. Point him at blogs.msdn.com/…/71307.aspx as "proof"

    Then sit back and see what he does….

  26. Mark Y says:

    I just have to ask: is this a true story?  I'm sure some minor details are altered/simplified/whatever, but is this in essence true?  I can't fathom such madness.

  27. asdbsd says:

    My guess: Raymond helps his kid niece/whoever.

    Also, @xor88: randomly generated GUIDs are only as good as your randomness is, and it never hurts to check. Just to get that off your mind.

  28. Evan says:

    Man this is the best TheDailyWTF I've seen in a long time! :-)

  29. ogo says:

    well I think the theoretical range of guids is too big.

    I would try it with a more practical relevant scope.

    Like bank account numbers. Here we haven't all possible ones in use, too.

    The question is how a guid will be created at runtime on a windows machine for example.

    I've heard that microsoft uses the mac, a timestamp, the id of the cpu and so on.

  30. Todd says:

    All you need is Google's admin password and a copy of hadoop.

  31. Don Reba says:

    > The hubris is astounding.  Not only do they feel they have a divine right to scrape the site data, they contact Microsoft for help with it!

    I bet he already contacted Google and Apple, but they were not interested in helping.

  32. j b says:

    As a parallell to this: A while ago, I came across a web page that revealed every secret PIN code (4 digit) in the world.

    Unfortunately I didn't make a note of the URL. Maybe I should have.

  33. Brad Westness says:

    @Nick

    >It might be something of a gray area, but I always figured that if someone makes something publicly available, there's nothing wrong with scraping it as long as you're considerate about it (and making one billion requests per second is not very considerate).

    If you're scraping public content by following visible links like a search engine, sure. If you're peppering the server with thousands/millions/billions of bogus requests on the off chance of stumbling on a good ID value, you're basically doing a denial of service attack.

  34. GrumpyYoungMan says:

    >10**19 years to complete

    Ah, but the questioner desires a result immediately. Clearly what he needs to do is to go back in time to the beginning of the universe (estimated to be 1.3×10**10 years ago) with a billion (10**9) computers, then he'd have an answer right about now.  

    The details are, of course, are left as an exercise for the person asking.

  35. Nick says:

    This guy clearly needs to optimize his attack.  If you try and store all the GUIDs first and then use them against the website you end up iterating over the list *twice*!  Much more efficient to issue a request for each GUID as you construct it as you only loop 2^128 times instead of 2^129. Much better.

    > Not only do they feel they have a divine right to scrape the site data

    It might be something of a gray area, but I always figured that if someone makes something publicly available, there's nothing wrong with scraping it as long as you're considerate about it (and making one billion requests per second is not very considerate).

  36. Doug says:

    After waiting 10^19 years, he may be a little disappointed when he finds that all_guids.txt is empty! That's what he gets for writing to the console rather than the file.

  37. Matt says:

    Lots of people seem to be commenting on the technical reasons why this is a bad idea, but few have commented on the LEGAL reason why it's a bad idea:

    > I've contacted the site owners, but they are not interested in helping me

    If the site owners are not interested in helping you, then brute forcing all of the GUIDs on the website to get at information you're not supposed to be able to see is against the law as per the Computer Fraud and Abuse Act (USA) or equivalent in whichever country the requester is in.

  38. Jerome says:

    It sound like this guy is guilty of not thinking outside the box…

    In this case, you need to break into their (physical) site, and steal their servers; then take your time to peruse their database at home.

  39. Rubble says:

    He's a couple I computed for you save some time :

    7911EE8D-2551-4755-9C18-83342C944DE1

    3C0B6782-BDC8-4D34-A5B0-6A10269021BF

  40. Ken says:

    Did the original email come from admin@prism.gov.us ?

  41. Adrian says:

    I'm sure I've seen this exact question before, answered this exact way. I thought it was on StackOverflow, but I can't find it now. Maybe I dreamt it.

    Or maybe I have some kind of split personality disorder, and that it's me (the other me) that asked this question… It's the kind of thing I could imagine an alternative, slightly evil (yet also slightly crazy) version of me might try and do :(

  42. Neil says:

    He needs to get in touch with the author of that Windows 95 video driver that you mentioned back on 11 Feb 2004.

  43. @Brad Westness

    > If you're scraping public content by following visible links like a search engine

    I would add "and you're following the site-owner-specified scraping rules laid out in /robots.txt".

  44. I have to admit, the thing that bothers me the most about this code is the variable names "l1, l2, l3, l4". There's a strong possibility that someone will visually confuse them with "11, 12, 13 14".

  45. Danny Moules says:

    "This reminds me of people sternly pointing out that two randomly generated Guid might still collide, suggesting that you still had to check for that case."

    Two, perhaps not. But if you keep generating a list and checking each against each other you risk running in to the Birthday Paradox. After about a trillion or so, things start to get mathematically significant and perhaps worth spending a second to protect against. It really then depends on whether you're talking about a NASA spacecraft or a personal blog at that point. Collisions can occur, it just depends on the context.

    Not to mention some hardware devices that don't properly initialise their RNG seed and can generate the same GUIDs by fault. You want to contract against that case regardless. You might trust the GUID specification, but that doesn't mean the implementation is sound.

    "To be completely fair, while GUIDs are 16 bytes and randomly generated, their generation algorithms are designed to avoid _accidental_ collisions.  They are _not_ designed to be random enough for security purposes, i.e. they do not have 128 bits of entropy."

    Depends on the nature of the GUID. Type 4 GUIDs with a solid cryptographic generator can, in practice, be used for security purposes and are a preferred default GUID type for that reason. i.e. They stop idiots who don't read the contracts of the types they use from introducing inadvertent security holes.

  46. DS says:

    @Matt

    Legally, the question is actually quite tricky (at least in the U.S.). Courts are divided on the issue of whether subjective intent of the site owner (ie., telling the user that 'we don't want you to do this') suffices to make your attempts to gain access against their will a criminal offense, or whether a technical barrier (like asking for password) is necessary (and actually, the latter interpretation of the law seems much more reasonable).

    Now, requiring the user to pass GUID of the resource could count as a technical barrier, but then you've got a slippery slope problem: suppose they required only 16-bit identifier, not a GUID, and that they have thousands of resources. If you have a substantial chance of stumbling upon a resource by entering a random URL, would it still count as circumventing a barrier to access? If not, where would you draw a line? At 1% chance? At 0.1% chance? How is the user supposed to know how many resources are there, if that's what potentially determines whether he is criminally liable? If yes, how about the case when a sequential resource id is used? Or a date? Should we assume that unless the site owner fails to provide a link, you can't access the page by guessing URLs? Do you really think it would be a good rule of criminal law (or even of tort law)?

    See http://www.volokh.com/…/aarons-law-drafting-the-best-limits-of-the-cfaa-and-a-reader-poll-on-a-few-examples-part-i for a more detailed discussion of the example I've been referring to in this post.

  47. voo says:

    Well to be fair, who would've expected an O(1) algorithm to take that much time?

  48. Random User 0582745 says:

    @Matt

    There is also the question of what, specifically, the exchange between the confused questioner and the site owners was. All we have is, "they are not interested in helping me." It is possible that while they are not interested in _helping_, the data is otherwise open and they don't (currently) care about _hindering_. Of course that may change part way through this brute-force/"accidental"-DoS.

  49. someone_random says:

    Because your GUID space is so pretty big it doesn't really makes sense listing all of them. You can easily pick a GUID at random, and try to fetch resources.

    Let's say:

    p(n) – pbb that we have a duplicate GUID

    p(n) = 0.0001

    m – size of GUID space (cardinality)

    m = 2^132

    then a chance, that you draw the same GUID with probability 0.01 (0.0001%) start when you test:

    n = sqrt(2^133*0.0001) ~= 10^17  (check Birthday Paradox problem)

    So you can easily pick a random GUID's for a quite a long time ;-)

  50. Rob L. says:

    What if I were to acquire Jupiter-sized planets and convert THEM to SSDs? Better, right?

  51. Matt says:

    @DS:

    Don't try and pretend that strawmen arguments about technical limitations lets you do what you want. Your argument degenerates to a "where do we draw the line on the brand of lock you must have on your front door before the thief isn't liable?". Theft is illegal even if you leave your wallet on the sidewalk, and computer hacking is illegal even if the site is peppered with vulnerabilities.

    The duty to protect information is not a defense against INTENTIONAL and MALICIOUS intrusion. It is a remedy against the site owners by the information owners.

    Unless you're a lawyer, don't try and pretend to yourself that what you're doing is legal if it's not. It'll only end in tears.

    As is always the case in these matters, your best bet is to assume that if what you are doing is clearly not the intent of the site owner, then refrain from doing it until you KNOW that it is legal. That way you won't get challenged and suffer the costs of getting legal advice or representation, never mind the costs of LOSING such lawsuits.

  52. 640k says:

    Making 2¹²⁸ typos in the browser's address field isn't hacking.

  53. Nick says:

    @Matt

    You're making an awful lot of assumptions here, primarily that this person is malicious and is "hacking" via some vulnerability in a site's security.  For all we know the site in question is a gallery of cats.  Each image is stored at a URL like http://www.catcats.com/?cat={GUID}.  This person emailed the site owners asking for a way to download all their cat pictures and the owners said "we don't have a way to do that, sorry".

    Would you really suggest that trying a bunch of URLs to get publicly available pictures of cats violates even the spirit of the law?

    Also, at some point site operators have to take some responsibility, especially when there's a lack of malicious intent.  DS and Random User made some great points.

    > Theft is illegal even if you leave your wallet on the sidewalk

    This isn't a valid comparison at all because the contents of a wallet are implicitly private, even if the wallet itself is in a public place (so this would be like a publicly accessible server with private content protected by some mechanism).  Remove the wallet and you have a better example, because there is no longer the implied privacy.  But even then, this is just a classic "copying == stealing" argument, which is specious at best.

  54. Gabe says:

    I remember when it was unreasonable to even think about pinging every IP address on the Internet. Nowadays with a 10Gbps link you could do so in a few minutes.

    It's even reasonable to try sending a packet to every MAC address (though it might take a few days)!

  55. Raphael says:

    “I remember when it was unreasonable to even think about pinging every IP address on the Internet.”

    These days, it's not so much pinging every IP address on the Internet, it's about finding out which have been assigned.

    What do you mean, there's at most about 4 billion? We have IPv6 now!

  56. voo says:

    > Theft is illegal even if you leave your wallet on the sidewalk

    Yes but looking (or taking a picture) of your wallet that you left in public space certainly is *not* illegal and that's the much better comparison here.

  57. dddd says:

    I'm reminded of the list of IPv4 addresses:

    http://imgur.com/btTm7vq

  58. smf says:

    >I remember when it was unreasonable to even think about pinging every IP address on the Internet.

    > Nowadays with a 10Gbps link you could do so in a few minutes.

    You might be able to send that many IDCMP packets to your ISP, but they'll get dropped pretty quickly as the rest of the Internet doesn't want to join in your game.

    > It's even reasonable to try sending a packet to every MAC address (though it might take a few days)!

    Only on the Lan you are connected to as MAC address isn't routed through gateways.

  59. rw says:

    Is this based on a real customer question or is this one of your weird dreams?

  60. Evan says:

    @640K: "Making 2¹²⁸ typos in the browser's address field isn't hacking."

    First, nice superscripts.

    Second, then what does count as hacking? Does making a typo in the username field — "oops, I sneezed and accidentally typed '; drop table USERS;" — count as hacking?

  61. smf says:

    > Yes but looking (or taking a picture) of your wallet that you left in public space certainly

    > is *not* illegal and that's the much better comparison here.

    That comparison doesn't fit either. With a wallet the value is in having procession of the wallet. While with data the value is having access to it. If the data can only be retrieved by giving the correct Guid then it's pretty easy to argue that you haven't been given access to it and are trying to get around the access controls. In the same way that trying every combination of door access code/alarm code/atm pin.

  62. Mark - The other Mark says:

    Making typo's in a browsers field isn't hacking. Sending a request for each and every GUID in existence as fast as possible definitely is, if only from a DOS point of view.  

    Generally speaking, if you're doing this to get data that the owner doesn't want you to have, it's probably hacking. If the owner of the data is ok with you having the data, and you're doing this in a manner which doesn't involve a Denial of Service, you're probably not.

    Protecting your data by hiding it behind a GUID is a silly way to secure it. Getting that data by asking for all possible GUIDs is even sillier.

  63. ender says:

    > Making typo's in a browsers field isn't hacking.

    British courts disagree (there was a case a few years ago when somebody removed the last path component, and was then convicted of hacking – even though some browsers have built-in functionality for this).

  64. DBuches says:

    Obligatory: http://wasteaguid.info/

    Click it a few times to help this guy out.

Comments are closed.

Skip to main content