What is the format of a double-null-terminated string with no strings?


One of the data formats peculiar to Windows is the double-null-terminated string. If you have a bunch of strings and you want to build one of these elusive double-null-terminated strings out of them, it’s no big deal.

H e l l o \0 w o r l d \0 \0
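In C you get the final null for free, because every string literal already ends in one. Writing the separators explicitly therefore produces the layout above. This is a minimal sketch. It assumes plain `char` instead of `TCHAR` so that it compiles anywhere, and the names `hello_world` and `dump_multisz` are hypothetical:

```c
#include <stdio.h>

/* "Hello\0world\0" plus the compiler's implicit terminator:
   H e l l o \0 w o r l d \0 \0  (13 bytes in total) */
static const char hello_world[] = "Hello\0world\0";

/* Print every byte, showing nulls as \0, to make the layout visible. */
void dump_multisz(const char *p, size_t size)
{
    for (size_t i = 0; i < size; i++) {
        if (p[i] == '\0')
            fputs("\\0 ", stdout);
        else
            printf("%c ", p[i]);
    }
    putchar('\n');
}
```

Calling `dump_multisz(hello_world, sizeof hello_world)` prints the same byte sequence shown above.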

But what about the edge cases? What if you want to build a double-null-terminated string with no strings?

Let’s step back and look at the double-null-terminated string with two strings in it. But I’m going to insert line breaks to highlight the structure.

H e l l o \0
w o r l d \0
\0

Now I’m going to squeeze out the spaces.

Hello\0
world\0
\0

This alternate way of writing the double-null-terminated string is the secret. Instead of viewing the string as something terminated by two consecutive null terminators, let’s view it as a list of null-terminated strings, with a zero-length string at the end. Alternatively, think of it as a packed array of null-terminated strings, with a zero-length string as the terminator.

This type of reinterpretation happens a lot in advanced mathematics. You have some classical definition of an object, and then you invent a new interpretation which agrees with the classical definition, but which gives you a different perspective on the system and even generalizes to cases the classical definition didn’t handle.

For example, this “modern reinterpretation” of double-null-terminated strings provides another answer to a standard question:

How do I build a double-null-terminated string with an empty string as one of the strings in the list?

You can’t, because the empty string is treated as the end of the list. It’s the same reason why you can’t put a null character inside a null-terminated string: The null character is treated as the terminator. And in a double-null-terminated string, an empty string is treated as the terminator.

One\0
\0
Three\0
\0

If you try to put a zero-length string in your list, you end up accidentally terminating it prematurely. Under the classical view, you can see the two consecutive null terminators: They come immediately after the "One". Under the reinterpretation I propose, it’s more obvious, because the zero-length string is itself the terminator.
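Here is a concrete demonstration of the premature termination. This minimal sketch uses plain `char` and the hypothetical helpers `multisz_count` and `pack_naively`. It packs "One", "", "Three" without any checks and then counts the strings a reader would see:

```c
#include <string.h>

/* Count the strings in a double-null-terminated list: keep walking
   until we reach a zero-length string. */
size_t multisz_count(const char *p)
{
    size_t n = 0;
    for (; *p; p += strlen(p) + 1)
        n++;
    return n;
}

/* Pack strings back to back with no validation, then append the
   final empty-string terminator. The caller must supply a buffer
   that is big enough. */
void pack_naively(const char *const *strs, size_t n, char *buf)
{
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the null */
        memcpy(buf, strs[i], len);
        buf += len;
    }
    *buf = '\0';                            /* list terminator */
}
```

The count comes back as 1. The bytes for "Three" are still in the buffer, but no reader will ever reach them.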

If you’re writing a helper class to manage double-null-terminated strings, make sure you watch out for these empty strings.
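One way such a helper might refuse empty entries is sketched below. This is a minimal sketch, not a standard API: `multisz_build` is a hypothetical name, and it uses plain `char` instead of `TCHAR`.

```c
#include <stddef.h>
#include <string.h>

/* Build a double-null-terminated list into buf (capacity cap bytes).
   Returns the number of bytes written, including the final
   terminator, or 0 if an entry is empty (it would end the list
   early) or if the buffer is too small. */
size_t multisz_build(const char *const *strs, size_t n, char *buf, size_t cap)
{
    size_t need = 1;                        /* the final empty string */
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]);
        if (len == 0)
            return 0;                       /* reject empty entries */
        need += len + 1;
    }
    if (need > cap)
        return 0;

    char *out = buf;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the null */
        memcpy(out, strs[i], len);
        out += len;
    }
    *out = '\0';                            /* zero-length terminator */
    return need;
}
```

The sketch validates and measures everything before it writes anything, so a rejected call leaves `buf` untouched.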

This reinterpretation of a double-null-terminated string as really a list of strings with an empty string as the terminator makes writing code to walk through a double-null-terminated string quite straightforward:

for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {
    /* ... do something with pszz ... */
}
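The same loop also works outside Windows. This portable sketch swaps `lstrlen` for `strlen` and `LPTSTR` for `const char *`. The names `multisz_walk` and `print_line` are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Walk a double-null-terminated list as a sequence of null-terminated
   strings, stopping at the first zero-length one. Calls fn (if not
   NULL) once per string and returns how many strings were visited. */
size_t multisz_walk(const char *pszz, void (*fn)(const char *psz))
{
    size_t count = 0;
    for (const char *psz = pszz; *psz; psz += strlen(psz) + 1) {
        if (fn)
            fn(psz);
        count++;
    }
    return count;
}

/* Example callback: print each string on its own line. */
void print_line(const char *psz)
{
    puts(psz);
}
```

For example, `multisz_walk("Text files\0*.txt\0", print_line)` prints `Text files` and `*.txt` on separate lines and returns 2. The loop never looks for a "double null". It only asks whether the current string is empty.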

Don’t think about looking for the double null terminator. Instead, just view it as a list of strings, and you stop when you find a string of length zero.

This reinterpretation also makes it clear how you express a list with no strings in it at all: All you have is the zero-length string terminator.

\0
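Measuring a list makes the empty case concrete. The size of any list is the bytes of its strings plus one for the zero-length terminator, so the empty list is exactly one byte. This is a minimal sketch with plain `char`, and `multisz_size` and `empty_list` are hypothetical names:

```c
#include <string.h>

/* Size in bytes of a double-null-terminated list, including the
   terminating zero-length string. */
size_t multisz_size(const char *p)
{
    const char *start = p;
    while (*p)                              /* skip each string */
        p += strlen(p) + 1;
    return (size_t)(p - start) + 1;         /* plus the terminator */
}

/* The empty list: nothing but the zero-length terminator (1 byte). */
static const char empty_list[] = "";
```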

Why do we even have double-null-terminated strings at all? Why not just pass an array of pointers to strings?

That would have worked, too, but it makes allocating and freeing the array more complicated, because the memory for the array and for the component strings is now scattered about. (Compare absolute and self-relative security descriptors.) A double-null-terminated string occupies a single block of memory that can be allocated and freed at one time, which is very convenient when you have to serialize and deserialize. It also avoids questions like “Is it legal for two entries in the array to point to the same string?”
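Because the whole list is one contiguous block, copying it takes three steps: measure it, allocate once, and copy the bytes. One `free` releases the copy. An array of pointers to n strings would need n + 1 allocations. This is a minimal sketch using standard C `malloc` in place of any Windows allocator, and `multisz_dup` is a hypothetical name:

```c
#include <stdlib.h>
#include <string.h>

/* Duplicate a double-null-terminated list with a single allocation.
   Returns NULL if the allocation fails; release with free(). */
char *multisz_dup(const char *src)
{
    const char *p = src;
    while (*p)                              /* skip each string... */
        p += strlen(p) + 1;
    size_t size = (size_t)(p - src) + 1;    /* ...plus the terminator */

    char *copy = malloc(size);
    if (copy)
        memcpy(copy, src, size);            /* one block, one copy */
    return copy;
}
```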

Keeping it in a single block of memory reduces the number of selectors necessary to represent the data in 16-bit Windows. (And this data representation was developed long before the 80386 processor even existed.) An array of pointers to 16 strings would require 17 selectors, if you used GlobalAlloc to allocate the memory: one for the array itself, and one for each string. Selectors were a scarce resource in 16-bit Windows; there were only 8192 of them available in the entire system. You don’t want to use 1% of your system’s entire allocation just to represent an array of 100 strings.

One convenience of double-null-terminated strings is that you can load one directly out of your resources with a single call to LoadString:

STRINGTABLE
BEGIN
IDS_FILE_FILTER "Text files\0*.txt\0All files\0*.*\0"
END

TCHAR szFilter[80];
LoadString(hinst, IDS_FILE_FILTER, szFilter, 80);

This is very handy because it allows new filters to be added by simply changing a resource. If the filter were passed as an array of pointers to strings, you would probably put each string in a separate resource, and then the number of strings becomes more difficult to update.

But there is a gotcha in the above code, which we will look at next time.

Bonus Gotcha: Even though you may know how double-null-terminated strings work, that doesn’t guarantee that the code you’re interfacing with understands them as well as you do. Consequently, if you are generating a double-null-terminated string, you’d be best off putting the extra null terminator at the end (so that even an empty list contains two consecutive nulls), just in case the code you are calling expects it, even though it technically isn’t necessary. Example: The ANSI version of CreateProcess locates the end of the environment block by looking for two consecutive null bytes instead of looking for the empty-string terminator.
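The sketch below contrasts the two sides. `size_by_double_null` is a hypothetical consumer modeled on the behavior described above. It is not CreateProcess itself. Given the minimal one-byte empty list, it would read past the end. `multisz_build_compat` is a hypothetical producer that writes an empty list as two nulls. A non-empty list already ends in two consecutive nulls, so only the empty case changes. Plain `char` is assumed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical "classical" consumer: finds the end of the block by
   looking for two consecutive null bytes. Returns the block size,
   including both nulls. Reads out of bounds on a 1-byte empty list. */
size_t size_by_double_null(const char *p)
{
    size_t i = 0;
    while (p[i] != '\0' || p[i + 1] != '\0')
        i++;
    return i + 2;
}

/* Hypothetical defensive producer: the usual layout, except that an
   empty list is written as two nulls so double-null scanners stay in
   bounds. Returns bytes written, or 0 on an empty entry or a buffer
   that is too small. */
size_t multisz_build_compat(const char *const *strs, size_t n,
                            char *buf, size_t cap)
{
    size_t need = (n == 0) ? 2 : 1;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]);
        if (len == 0)
            return 0;                       /* would end the list early */
        need += len + 1;
    }
    if (need > cap)
        return 0;

    char *out = buf;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(strs[i]) + 1;   /* include the null */
        memcpy(out, strs[i], len);
        out += len;
    }
    *out = '\0';                            /* zero-length terminator */
    if (n == 0)
        out[1] = '\0';                      /* the extra, defensive null */
    return need;
}
```

On input, the empty-string walk shown earlier still accepts both forms. On output, the padded form keeps stricter consumers in bounds.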

Comments (31)
  1. someone else says:

    C strings are an ugly hack. First thing I do after I invent the time machine, I’m gonna whack whoever invented them. With a nailed club.

  2. nathan_works says:

    And how many crashes, leaks, overflows, et al. could be traced to this "nifty" trick of cramming nul-terminated strings together and parsing them out? (I’ve heard of it, and dealt with it to get registry values, but I still look at the code and shake my head.)

  3. Mark says:

    The Bonus Gotcha is actually vitally important: apart from CreateProcess, I’ve seen various programs that search for the "double null".  They usually crash when you pass "something". It’s called "double-null terminated", so sadly that’s what it must be.

  4. Maurits says:

    The ANSI version of CreateProcess locates the end of the environment block by looking for two consecutive NULL bytes

    Sounds like a bug.  It’s perfectly legal for an environment not to have any variables set, no?

  5. porter says:

    MSVC 4.0 says that for CreateProcess…

    Note that an ANSI environment block is terminated by two zero bytes: one for the last string, one more to terminate the block. A Unicode environment block is terminated by four zero bytes: two for the last string, two more to terminate the block.

    So it’s not a bug, it’s part of the specification of CreateProcess().

  6. John says:

    Not to nitpick, but you can generate an array of pointers to strings using one allocation.  I can’t remember off the top of my head, but I’m pretty sure there are a few places in the Windows API that do this.  Of course, this is much easier to get wrong than a double-null-terminated string.

  7. Anonymous says:

    This stuff is pretty darned easy to parse, and consistent with the concept of C strings.  I actually built lists of strings this way long before I had any Windows experience.  Is it really the fault of the data representation if somebody couldn’t be bothered to figure this out?  It’s not so much a string list terminated by double-nul, but a list of C strings that ends when you see an empty string.

    But, I guess some people can’t be bothered with knowing how to count their buffers, as evidenced by some of the comments here.

  8. 223 says:

    To push the reinterpretation one step further: C automatically adds the null terminator to string literals "xxxxxx". But it also automatically adds the empty string terminator to a string list.

    Thanks, Mark, for your comment, I hadn’t understood the gotcha.

  9. avek says:

    Null-terminating strings, a mess. Double null-terminated strings, double mess. No surprise.

    What’s interesting: RegQueryValueExW returns a single L'\0' when the registry value is a REG_MULTI_SZ with no strings. But RegQueryValueExA returns two '\0's. The non-Unicode version says there’s only 1 byte returned but actually stores 2 bytes in the output buffer.

    So everything out there has actually its own idea of how to represent an empty DNT multistring. The gotcha is not only in consuming APIs like CreateProcess, it’s in producing APIs too. And we should write code as it’s always done with ambiguous formats: on output, produce the most safe variant (double null), on input, expect the least possible (single null).

    "One convenience of double-null-terminated strings is that you can load one directly out of your resources with a single call to LoadString"

    Great. So every c00LhAcKeR child out there can tweak the program, "allowing" to open unsupported file formats easily. Doesn’t look very good to me.

    As a result of all the gotchas and side effects of this format, some programs out there don’t use REG_MULTI_SZ in the registry at all. They use REG_SZ with conventional line separators instead, \r\n or the like. This makes their registry entries uneditable in Regedit (the single-line edit control there eats \r\n while editing and doesn’t let you enter them).

  10. nathan_works says:

    Well, anonymous, think of all the brilliant programmers at MSFT (and this is not said sarcastically) who still get buffers wrong and don’t parse what you seem to think of as trivial input..

  11. ace says:

    To avek, re "So every * out there can tweak the program": you’re wrong. By having your program easier to edit later, just like Raymond presented, you win, except if you’re only writing "write once then throw away" programs.

    I like MULTI_SZ’s. Efficient and the absolute minimum needed to have an array of sz strings. They are also a nice example where not every string is aligned to 4 or even 16 bytes. Young weaklings got spoiled enough expecting to do a "new" (really malloc) for every damn thing.

    Who needs pointers (a lot of random accesses) can initialize them in the separate array of pointers himself. Who doesn’t has the minimal representation. A good thing.

  12. Grant Husbands says:

    It seems odd to create an alternative formalism for double-null-termination that disagrees with the name and typical description for such, to point out a case where that formalism doesn’t have double-null as termination, and then imply that code that depends on there actually being double-null termination is somehow wrong.

    Maybe I’m just a nitpicker.

  13. Leo Davidson says:

    Grant: Having "double-null" in the name is unfortunate as there isn’t a double-null in the empty case (but then there aren’t any strings, either). The way Raymond describes it seems completely valid to me, though.

    i.e. Raymond’s description is valid and correct. The name of the datatype is what’s wrong (or at least poorly chosen).

    The important thing to note is that you can’t have an empty string in a multi-sz list. Once you realise that, you realise that \0 is an empty list.

    Raymond’s alternative way of looking at the data doesn’t actually change what the data is. However you look at it, it’s a bunch of null-terminated strings followed by one extra null. It should be obvious that if you take away all of the strings you’re only left with the null.

  14. 640k says:

    Note that an ANSI environment block is terminated by two zero bytes: one for the last string, one more to terminate the block.

    So it’s not a bug, it’s part of the specification of CreateProcess().

    And when the "last string" is non-existing, there would only be 1 null according to this written specification.

  15. Leo Davidson says:

    avek: If you want to stop people messing with your string resources then use code-signing or something similar.

    Storing resources in an obscure, hard-to-maintain format doesn’t seem the answer (unless you’re actually trying to obfuscate them as well as stop people editing them), and using code-signing (or similar) protects a lot more than just your string resources. (It’ll protect the code itself, plus string literals, etc.)

  16. 640k says:

    Not to nitpick, but you can generate an array of pointers to strings using one allocation.  I can’t remember off the top of my head, but I’m pretty sure there are a few places in the Windows API that do this.  Of course, this is much easier to get wrong than a double-null-terminated string.

    EnumServicesStatus and EnumDependentServices do this. One parameter to these functions is an array of structs that include pointers to strings. The same buffer that holds the array of structs is used to store the strings on output: the strings are allocated starting from the end of the last struct in the array.

  17. 640k says:

    16-bit windows on 8086 didn’t have selectors, that can NOT be the reason for not allocating individual strings.

    16-bit windows and w9x doesn’t have any apis for updating resources at all. Visual Studio cannot edit resources on those OSes, usually the program is recompiled anyway.

    [Fine. For “selector,” read “global handle.” Since real-mode Windows locked memory for the shortest time possible, you really would have had to pass an array of global handles, which is even more cumbersome. -Raymond]
  18. Medinoc says:

    I have a memory of some functions that needed a TRIPLE-null terminated string, like "keyvalue"(+automatic).

    These functions expected couples of strings, and the lists were only terminated by couples of empty strings.

    Didn’t GetOpenFileName() work like this for its filters?

  19. Anonymous Coward says:

    According to the SDK, ‘Each process has an environment block associated with it. The environment block consists of a null-terminated block of null-terminated strings (meaning there are two null bytes at the end of the block)’ (and essentially the same: ‘Note that an ANSI environment block is terminated by two zero bytes: one for the last string, one more to terminate the block.’), a non-sequitur that adequately documents the CreateProcess implementer’s lack of understanding.

    Considering all the C-string related bugs I’ve seen, and the incredibly inconsistent behaviour of the Windows API regarding double-null-terminated string lists, I have to fully agree with the time machine comment.

    @Global handle: thanks, now I’m going to have nightmares again. Still, I’ve used string lists in DOS long ago, and I just used: count, length, chardata, length, chardata… It didn’t even occur to me to use null terminators. Of course these had their own problems, like no Unicode because that didn’t exist yet, maximum length 64 kB. Those were different days.

  20. pat says:

    Putting file type filters into the string table resource is extremely useful to support internationalization of your application.

    The ‘gotcha’ is that the string table doesn’t support embedded nul characters so you need to use some other separator instead (like a pipe character) and do some substitution before feeding the string to the file selection dialog.

    [The string table does support embedded null characters. Perhaps your problem lies elsewhere. -Raymond]
  21. Wizou says:

    [The string table does support embedded null characters. Perhaps your problem lies elsewhere. -Raymond]

    Wow… an HTML link into the future ^^

  22. Random832 says:

    You could still use a single global handle and use an array of ints where the int is an offset relative to the base of the allocated memory.

    But like you were saying, that’s even more cumbersome.

  23. Markus says:

    C strings might have been an ugly hack, but at the time, the only well-known alternative was Pascal strings, which were limited to 255 bytes.

  24. Mang says:

    I can understand describing double-null-terminated strings by saying that you end the block with an empty string.  This type of logic makes sense if you’re parsing the data in a loop.

    Example: read until the next null.  If length > 0, then it’s a string.  If length == 0, then it’s the end of the block.

    I still think it’s better to say that it’s a null-terminated group of null-terminated strings.  We already know that we use a null to terminate strings.  The point that I like to stress is that the last null is not an empty string, but a null signifying the end of the block.

    It’s an issue of type (null vs. empty string), rather than data, since they’re effectively the same.

    Thanks for writing about it.  This sort of stuff should be discussed more often.

  25. Ben Voigt [C++ MVP] says:

    "no strings" is the corner-case which shows the error (if the official description is double-NUL terminated) of the "list which is terminated by an empty string" treatment.

  26. I would change the naming convention in the for loop:

    for (LPTSTR pszz = pszzStart; *pszz; pszz += lstrlen(pszz) + 1) {

    … do something with pszz …

    }

    pszz should be psz:

    for (LPTSTR psz = pszzStart; *psz; psz += lstrlen(psz) + 1) {

    … do something with psz …

    }

    Why? The code inside the loop body deals with a single zero-terminated string. That part of the code shouldn’t care that this string happens to be a piece of a pszz-formatted string.

    psz is also a better match for the recommended way of thinking about the code:

    “Don’t think about looking for the double null terminator. Instead, just view it as a list of strings, and you stop when you find a string of length zero.”

  27. Earl says:

    More correctly, we should speak of the "Double-Nul Terminated String":

    http://en.wikipedia.org/wiki/File:ASCII_Code_Chart-Quick_ref_card.jpg

  28. Maurits says:

    Or better yet, double-oh-terminated-string.

    Captcha: 007 (if only…)

  29. Neil says:

    REG_MULTI_SZ has a length, so it can of course contain empty embedded strings. This makes editing PendingFileRenameOperations trickier!

    P.S. This is the first captcha for which my browser has tried to autocomplete a previous value, but then maybe I haven’t filled in enough captchas to notice the problem on other sites.

  30. Miral says:

    Possibly just ninja’d by Neil, but:

    For bonus points, see HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\PendingFileRenameOperations, which is a REG_MULTI_SZ that almost always does contain empty strings.

  31. chrdavis says:

    Paths that are not double-null terminated are the most common issue when dealing with SHFileOperation.

    http://msdn.microsoft.com/en-us/library/bb762164(VS.85).aspx

    I even put it at the top of the list of the top coding mistakes when using this API.

    http://web.archive.org/web/20071015210149/shellrevealed.com/blogs/shellblog/archive/2006/09/28/Common-Questions-Concerning-the-SHFileOperation-API_3A00_-Part-2.aspx
