Why double-null-terminated strings instead of an array of pointers to strings?


I mentioned this in passing in my description of the format of double-null-terminated strings, but I think it deserves calling out.

Double-null-terminated strings may be difficult to create and modify, but they are very easy to serialize: You just write out the bytes as a blob. This property is very convenient when you have to copy around the list of strings: Transferring the strings is a simple matter of transferring the memory block as-is. No conversion is necessary. This makes it easy to do things like wrap the memory inside another container that supports only flat blobs of memory.
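To make the "just write out the bytes" point concrete, here is a minimal sketch in portable C. The helper names `multi_sz_size` and `multi_sz_copy` are made up for this illustration and are not Win32 functions. The only work is finding where the list ends; after that, the whole list moves with a single memcpy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: total size in bytes of a double-null-terminated
   block, including the empty string that terminates the list. */
size_t multi_sz_size(const char *block)
{
    const char *p = block;
    while (*p != '\0') {
        p += strlen(p) + 1;          /* skip one string and its terminator */
    }
    return (size_t)(p - block) + 1;  /* + 1 for the terminating empty string */
}

/* Hypothetical helper: copy the entire list as one blob.
   Returns the number of bytes copied, or 0 if dest is too small. */
size_t multi_sz_copy(char *dest, size_t capacity, const char *block)
{
    size_t size = multi_sz_size(block);
    if (size > capacity) {
        return 0;
    }
    memcpy(dest, block, size);       /* no per-string conversion needed */
    return size;
}
```

The same byte count is what you would hand to a file write or any container that accepts a flat blob.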

As it turns out, a flat blob of memory is convenient in many ways. You can copy it around with memcpy. (This is important when capturing values across security boundaries.) You can save it to a file or into the registry as-is. It marshals very easily. It becomes possible to store it in an IDataObject. It can be freed with a single call. And in the cases where you can't allocate any memory at all (e.g., you're filling a buffer provided by the caller), it's one of the few options available. This is also why self-relative security descriptors are so popular in Windows: Unlike absolute security descriptors, self-relative security descriptors can be passed around as binary blobs, which makes them easy to marshal, especially if you need to pass one from kernel mode to user mode.
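The caller-provided-buffer case might look something like the following sketch. The function `pack_string_list` and its "return the required size" convention are assumptions chosen for this example, not an existing API. Note that it never allocates: it either fills the caller's buffer or reports how big the buffer needs to be.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: pack `count` strings into a caller-provided buffer
   as a double-null-terminated list.
   Returns the number of bytes required. The buffer is written only when
   that number fits in `capacity`. Returns 0 if any item is empty, because
   an empty string in the middle would be read as the end of the list. */
size_t pack_string_list(char *buffer, size_t capacity,
                        const char *const *items, size_t count)
{
    size_t needed = 1;                    /* the terminating empty string */
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]);
        if (len == 0) {
            return 0;                     /* not representable in this format */
        }
        needed += len + 1;
    }
    if (needed > capacity) {
        return needed;                    /* caller can retry with a bigger buffer */
    }

    char *p = buffer;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]) + 1;
        memcpy(p, items[i], len);         /* string plus its null terminator */
        p += len;
    }
    *p = '\0';                            /* empty string ends the list */
    return needed;
}
```

Calling it once with a zero capacity to learn the size, then again with a real buffer, is one common pattern for this kind of interface.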

A single memory block with an array of integers containing offsets would also work, but as the commenter noted, it's even more cumbersome than double-null-terminated strings.
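For comparison, here is a rough sketch of what the offset-table alternative could look like. The layout (a 32-bit count, then 32-bit offsets measured from the start of the block, then the string bytes) and the function names are assumptions invented for this illustration. It is still a flat blob, but there is noticeably more bookkeeping.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed layout: uint32 count | uint32 offsets[count] | string bytes.
   Offsets are relative to the start of the block, so the block can be
   copied anywhere. Returns bytes used, or 0 if it does not fit. */
size_t offset_block_build(unsigned char *block, size_t capacity,
                          const char *const *items, uint32_t count)
{
    size_t header = sizeof(uint32_t) * ((size_t)count + 1);
    size_t needed = header;
    for (uint32_t i = 0; i < count; i++) {
        needed += strlen(items[i]) + 1;
    }
    if (needed > capacity || needed > UINT32_MAX) {
        return 0;
    }

    memcpy(block, &count, sizeof(count));
    uint32_t offset = (uint32_t)header;
    for (uint32_t i = 0; i < count; i++) {
        size_t len = strlen(items[i]) + 1;
        /* memcpy avoids alignment assumptions about the block */
        memcpy(block + sizeof(uint32_t) * ((size_t)i + 1), &offset, sizeof(offset));
        memcpy(block + offset, items[i], len);
        offset += (uint32_t)len;
    }
    return needed;
}

/* Returns the string at `index`, or NULL if out of range.
   A reader that does not trust the block would also validate each offset. */
const char *offset_block_get(const unsigned char *block, uint32_t index)
{
    uint32_t count, offset;
    memcpy(&count, block, sizeof(count));
    if (index >= count) {
        return NULL;
    }
    memcpy(&offset, block + sizeof(uint32_t) * ((size_t)index + 1), sizeof(offset));
    return (const char *)(block + offset);
}
```

In exchange for the extra machinery, this layout gives random access and can hold empty strings, which the double-null-terminated format cannot.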

Mind you, if you don't need to marshal the list of strings (because it never crosses a security boundary and never needs to be serialized), then an array of string pointers works just fine. If you look around Win32, you'll find that most uses of double-null-terminated strings are either inherited from 16-bit Windows or are cases where marshalling is necessary.
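If you receive a double-null-terminated block and only need it inside your own process, you can always build the array of pointers on top of it. The sketch below uses a made-up helper name, `multi_sz_index`. The resulting pointers are meaningful only in the current address space, which is exactly why the pointer array is not the thing you marshal.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store a pointer to each string of a
   double-null-terminated block in `out` (at most `max` entries).
   Returns the total number of strings in the block, which may exceed `max`. */
size_t multi_sz_index(const char *block, const char **out, size_t max)
{
    size_t count = 0;
    for (const char *p = block; *p != '\0'; p += strlen(p) + 1) {
        if (count < max) {
            out[count] = p;   /* points into the block; no copy is made */
        }
        count++;
    }
    return count;
}
```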

Comments (16)
  1. CarlD says:

    Gotta love those double-null terminated strings!  Personally, they never caused me any trouble and always made perfect sense.

  2. Adam Rosenfield says:

    This falls under the category of marshaling, but I think it's worth calling out that they're also useful if you have the data in shared memory.  You can just share the double-null-terminated strings straight up between processes with no gotchas.  If you used an array of pointers instead, that wouldn't work because the processes have different address spaces.  And like you said, you could use an array of integer offsets instead of pointers, but that's even more cumbersome.

  3. dave says:

    As mentioned in the other discussion, it's a perfectly simple paradigm as long as you approach it as "list of strings terminated by an empty string" and don't subscribe to the double-null voodoo misdescription.

  4. John says:

    Sometimes I triple null terminate my strings just to be safe.

  5. Mason Wheeler says:

    Wow. Just when I thought C strings couldn't get any worse, I run across this article, and the earlier one behind the first link!

    How many of the problems with double-null-terminated strings (including the problem of including a blank string in the middle of the array) would vanish instantly if the string-array type was based on Pascal strings instead? (Or some variant on the basic concept that can handle strings longer than 255 bytes?)

    [Great, change it to "array of counted strings, terminated by a zero-length string." You still have the problem of including a blank string in the middle of the array. -Raymond]
  6. Cesar says:

    > [Great, change it to "array of counted strings, terminated by a zero-length string." You still have the problem of including a blank string in the middle of the array. -Raymond]

    He did not go far enough with his change of the concept. How about a counted array of counted strings? (That is, the array begins with a size_t giving the number of strings, and each string is prefixed with a size_t giving the string's size.)

    [Sure, you could do that too. But a counted array of null-terminated strings works just as well; the pascal-ness was a red herring. -Raymond]
  7. nil says:

    > [Sure, you could do that too. But a counted array of null-terminated strings works just as well; the pascal-ness was a red herring. -Raymond]

    No, C strings don't work just as well; Pascal strings can contain null chars.

    [But that has nothing to do with the representation of a collection of strings. (I'm not arguing for/against C-style vs. pascal-style strings. I'm just pointing out that "switch to pascal style" doesn't address the original problem.) -Raymond]
  8. EvanED says:

    nil: That's still a red herring in the sense that moving from a single string to a list of strings (be it counted or terminated by an empty string) doesn't introduce any issues that weren't already around if you have just a single null-terminated string.

  9. Alex Cohn says:

    It would've been much healthier if the total size were stored at the beginning of this blob, like OLE strings or Pascal strings. As it stands, there is no simple (i.e. without a loop) way to find that third parameter for memcpy. No analog of strlen(). For this very reason %PATH% uses ; and $path uses :

  10. davep says:

    Mason Wheeler: "Wow. Just when I thought C strings couldn't get any worse, I run across this article, and the earlier one behind the first link."

    The double-null-terminated strings are not very common. Don't panic. It's unlikely that they will cause your children to drop out of school and become hippies.

  11. Joshua says:

    I actually *like* the double null terminated strings.

  12. Cheong says:

    Just want to ask: Since there are fixed char size Unicode strings (such as UCS-2/UTF-32) that can contain null bytes, is there any established method for handling nulls for them in null terminated strings (or double null terminated ones)? Or should we just seek other possible methods like passing offsets when we need to pass the strings through the boundaries?

  13. Medinoc says:

    I hardly had any problem with double-null terminated strings.

    It gets tricky when you start having TRIPLE-null terminated strings, such as OPENFILENAME's filter list (actually a list of pairs of strings, terminated by a pair of empty strings).

  14. LR says:

    @Cheong:

    This is not a problem because UCS-2 "double-null-terminated strings" are terminated by two null UCS-2 *characters* (not bytes).

    Like Raymond said, the easier and more consistent way of describing this concept is: The list of null-terminated strings is terminated by the first empty string. (Therefore, in case of UCS-2, you have four null bytes at the end. But if the entire list is empty, only two null bytes are there.)

  15. Mason Wheeler says:

    Yes, I meant a counted array of counted strings, like Cesar said.  I suppose I didn't explain it well enough.  That would allow for empty strings in the middle of the list.  And making the individual strings Pascal (counted) strings instead of C strings has a definite advantage if this is used for marshalling: if you know the length up front, you can copy the bytes around much faster because you don't have to do it one byte at a time testing each one for null.

  16. @Medinoc: "It gets tricky when you start having TRIPLE-null terminated strings, such as OPENFILENAME's filter list"

    I don't think that requires a triple-null terminated string.  The docs only say 'The last string in the buffer must be terminated by two NULL characters'.

Comments are closed.