Why does RegOpenKey sometimes (but not always) fail if I use two backslashes instead of one?


A customer reported that on Windows XP, their program would very rarely get the error ERROR_INVALID_ARGUMENT when it passed two backslashes instead of one to the RegOpenKeyEx function:

RegOpenKeyEx(hk, L"Blah\\\\Oops", ...);

After removing C++ escapes, the resulting string passed to RegOpenKeyEx is

Blah\\Oops

The failure was very sporadic and not reproducible under controlled conditions.

Well, first of all, doubled backslashes are not legal in registry key paths, so the first recommendation is to stop doubling the backslashes. Once you fix that, the problem will go away.
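One common way to end up with a doubled backslash is gluing together two path fragments that each bring their own separator. Whether that is what happened in the customer's program is not known; the following is only a minimal sketch of how a join could guarantee exactly one backslash. JoinKeyPath is a hypothetical helper invented for this illustration, not a Windows API.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the Windows API): joins two registry
// path fragments so that exactly one backslash separates them.
std::wstring JoinKeyPath(std::wstring parent, const std::wstring& child)
{
    // Drop any trailing backslashes from the parent fragment.
    while (!parent.empty() && parent.back() == L'\\') {
        parent.pop_back();
    }

    // Skip any leading backslashes on the child fragment.
    std::size_t first = child.find_first_not_of(L'\\');
    if (first == std::wstring::npos) {
        return parent;               // the child contained no name at all
    }
    if (parent.empty()) {
        return child.substr(first);  // nothing to join onto
    }
    return parent + L'\\' + child.substr(first);
}
```

With this sketch, JoinKeyPath(L"Blah\\", L"\\Oops") and JoinKeyPath(L"Blah", L"Oops") both produce Blah\Oops (after removing C++ escapes), so the string that reaches RegOpenKeyEx never contains consecutive backslashes at the join point.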

But the next question is why the error was detected sometimes but not always.

When an application tries to open a registry key, the registry code first consults a cache of recently-opened keys, since registry accesses exhibit very high locality of reference. If a match is found in the cache, then the cached result is used. Otherwise, it's a cache miss, and the registry tree is searched in the old-fashioned way. The registry tree search rejects the double-backslash since it interprets the path Blah\\Oops as "Look for a subkey called "Blah", then a subkey called "", then a subkey called "Oops"." The "subkey called """ step fails because a key cannot have an empty string as its name.

On the other hand, the code that checks the cache has a different search algorithm which happens to have the effect of collapsing consecutive backslashes, so the path Blah\\Oops is interpreted as "Look for a subkey called "Blah", then a subkey called "Oops"." (Note: "has the effect of". There is no explicit "collapse backslashes" step; it just turns out that the way the path is parsed, consecutive backslashes end up being treated as if they were single backslashes.)
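To make the discrepancy concrete, here is a small sketch of two path parsers that agree on every legal path but disagree on Blah\\Oops. This is not the actual registry code, which is not shown here; SplitStrict and SplitLenient are hypothetical names, and the lenient version is written as a strtok-style scan purely as an assumption about how such collapsing can fall out of a parser without an explicit "collapse backslashes" step.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical "old-fashioned" parser: every backslash ends a component,
// so two backslashes in a row produce an empty component, which is rejected.
// Returns false if the path is invalid.
bool SplitStrict(const std::wstring& path, std::vector<std::wstring>& parts)
{
    parts.clear();
    std::size_t start = 0;
    for (;;) {
        std::size_t end = path.find(L'\\', start);
        std::size_t len =
            (end == std::wstring::npos) ? std::wstring::npos : end - start;
        std::wstring part = path.substr(start, len);
        if (part.empty()) {
            return false;            // a key cannot have an empty name
        }
        parts.push_back(part);
        if (end == std::wstring::npos) {
            return true;             // consumed the last component
        }
        start = end + 1;             // step over exactly one backslash
    }
}

// Hypothetical "cache lookup" parser: scans for the start of the next name,
// then for its end. Runs of backslashes are simply stepped over, so they
// behave like a single separator even though nothing collapses them on purpose.
std::vector<std::wstring> SplitLenient(const std::wstring& path)
{
    std::vector<std::wstring> parts;
    std::size_t pos = path.find_first_not_of(L'\\');
    while (pos != std::wstring::npos) {
        std::size_t end = path.find(L'\\', pos);
        std::size_t len =
            (end == std::wstring::npos) ? std::wstring::npos : end - pos;
        parts.push_back(path.substr(pos, len));
        if (end == std::wstring::npos) {
            break;                   // no more separators, so no more names
        }
        pos = path.find_first_not_of(L'\\', end);
    }
    return parts;
}
```

Given the path Blah\\Oops, SplitStrict reports failure at the empty component between the two backslashes, whereas SplitLenient returns the two components Blah and Oops, exactly what it returns for Blah\Oops. For any path without consecutive, leading, or trailing backslashes, the two functions produce the same components, which is why the difference stays hidden until somebody passes an illegal path.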

In the customer's case, therefore, the key in question is in the cache most of the time, which is why the doubled backslash is silently corrected to a single backslash. But every so often, the key is not in the cache, and the old-fashioned search is performed. And the old-fashioned search rejects the double-backslash as an invalid path.

The discrepancy in the two parsing algorithms was resolved in Windows Vista, so you'll see this issue only on Windows XP and earlier.

But this historical tidbit does highlight one of the hidden gotchas of optimization: If your optimized version differs from the unoptimized version in cases that are theoretically illegal anyway, you may find yourself chasing elusive bugs when somebody accidentally stumbles into those cases and manages to get away with it... until now.

Comments (11)
  1. To be fair, the MSDN documentation for RegOpenKeyEx doesn't specify the naming rules for registry key paths. Not even in the links in the section describing the lpSubKey parameter. msdn.microsoft.com/…/ms724897(v=vs.85).aspx

    Even the MSDN doc describing the structure of the registry is not clear (it only says the key name cannot have \ in it, nothing about path delimiting): msdn.microsoft.com/…/ms724946%28VS.85%29.aspx

  2. kog999 says:

    I really thought you were going to write "doubled backslashes are not legal in registry key paths so the behavior is undefined and anything could happen" and that would be the end of the blog entry.

  3. MItaly says:

    On the other hand, AFAIK doubling the path separator in paths seems to have no effect (both on Windows and Linux); is this even guaranteed? On the usual MSDN page (msdn.microsoft.com/…/aa365247%28v=vs.85%29.aspx) I found nothing about this; is this behavior guaranteed in some way or is just an implementation-specific detail (that will never go away since almost everybody relies on it for poor man's path concatenation)?

  4. David Walker says:

    That's really interesting!  It also points out that when you are writing code, another benefit of reusing the same code block (made into a function call, etc.) is that the *exact* same algorithm is used in all of the places that are trying to do the same thing.  I have made this subtle mistake myself, where one piece of code tries to do the same thing as another piece of code, and the first set of code almost matches the second set in functionality…

  5. A. Skrobov says:

    I can't help but wonder why at all the cache should parse the key name, rather than treat it as a single unit. A cache must be fast and simple, after all.

  6. Ry Jones says:

    There was a developer for distributed.net that found certain iterations of the algorithm for GPUs were much faster than others. He optimized dnet to get the client to use this fast setup as much as possible. Years later, it was found this fast mode was in fact failing silently.

    Much hair was pulled in frustration when the bug was fixed, causing key rates to fall greatly.

  7. GregM says:

    "I can't help but wonder why at all the cache should parse the key name, rather than treat it as a single unit. A cache must be fast and simple, after all."

    Because the currently open key may not be a root key, so the currently open key would need to be combined with the key being opened.  It's likely also much more efficient to store the cache in tree form rather than flat.

  8. A. Skrobov says:

    "so the currently open key would need to be combined with the key being opened"

    Combined? OK. Why do you need to parse the key name for that?

    "It's likely also much more efficient to store the cache in tree form rather than flat."

    How can it be more efficient? RegOpenKeyEx receives strings anyway, so you can't avoid doing string comparison. But why do anything besides it?

  9. ASkrobov: Security? If a particular key has an ACL which denies you access, having the cache give you access to some of the contents (or even just disclose information about the presence or absence of subkeys) would be a hole: 'bypass traverse checking' is granted by default, but that's not enough to let you hard-wire it into the caching algorithm for all time! So, if someone accesses 'Software\Example\Blah', you still potentially need to check the ACL on 'Software' and 'Software\Example' first.

    The other issue of course is that you could open a key to 'Software\Example', then ask for 'Blah' within that key-handle – so a simpler cache which just stored 'Software\Example\Blah' would then need at least some parsing or similar logic anyway. If your cache is tree-based, though, the tree-walk could BE your string comparison step as well, keeping things neat and efficient. (Or just memory-map the hive, walk the on-disk format in every case and let the virtual memory subsystem deal with caching data for you.)

  10. Ltw says:

    I know how to fix the bug! Just add a timer that checks the value every second so it's always in the cache! Problem solved :P

  11. Stefan Kanthak says:

    What happened to the 20+ posts that happened to be here before yesterday?

    [When a thread turns to insults, I delete it. -Raymond]
