If I'm going to store a SID in a file, should I store the string form or the binary form?


A customer needed to save some user SIDs into a file and wanted guidance on whether to use string format or binary format.

Here are some pros and cons.

String format: To save the SID, use the Convert­Sid­To­String­Sid function, then save the string to the file. To load the SID, read the string from the file, then call Convert­String­Sid­To­Sid. The conversion may fail if the string is corrupted (either accidentally or maliciously).

  • Pro: Hard to get wrong. The heavy lifting is done by the two helper functions. In particular a corrupted string SID will be detected by Convert­String­Sid­To­Sid.

  • Pro: Strings are well-known quantities. You probably already have code to load and save strings. They are also easy to see in memory dumps or in file viewers.

  • Con: Takes up more space. The string format is larger than the binary format.

  • Con: The conversion cost can add up if you have to do it a lot.
  • Worst-case size is large: S-255-281474976710655-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295-4294967295 = 188 characters, if you also include the null terminator.

Binary format: To save the SID, use the Get­Length­Sid function, then save the raw bytes to the file. To load the SID, read the bytes from the file, then validate the SID to ensure it was not corrupted (either accidentally or maliciously).

  • Cons: Validating a SID is tricky. (Details below.) If you mess up, you may have a security vulnerability.

  • Cons: Binary format is harder to spot in a memory dump or in a file viewer.

  • Pro: Takes up less space.
  • Worst-case size is SECURITY_MAX_SID_SIZE = 68 bytes, so you might even just allocate a fixed buffer for the SID and avoid the variable-length problem.

The tricky part is validating that a chunk of memory is a valid SID.

You might think that the Is­Valid­Sid function would do that for you, but it can't because the function doesn't have a cbSize parameter, so it cannot validate that the purported SID fits inside the buffer. The Is­Valid­Sid function does logical validation, not physical validation. (It assumes that the memory is formatted like a SID, and it's checking whether the formatting is legal.)

Therefore, you have to do the length validation yourself, and then let Is­Valid­Sid do the semantic validation only after you have verified that the length is correct.

BOOL IsValidUntrustedSid(PSID psid, size_t cbSize)
{
    // First make sure the SID is at least the minimum size.
    // This ensures that we can read the revision and subauthority
    // count.
    if (cbSize < SECURITY_SID_SIZE(0)) return FALSE;

    // Now that we know the header is readable, we can calculate
    // the length the SID claims to be and make sure it is actually
    // that length.
    if (cbSize != GetLengthSid(psid)) return FALSE;

    // Now that we know the entire memory block is the right size,
    // we can use IsValidSid.
    return IsValidSid(psid);
}

Using strings is more convenient, and as long as the conversion isn't a bottleneck, and the disk space is not an issue, it would probably be a more convenient choice for a persistence format.

Note that the Convert­String­Sid­To­Sid function parses abbreviations for well-known SIDs. For example, you can pass BA and out will come the Builtin Administrators group. If you want to block that, you can first check that the string being converted begins with S-.

On the other hand, the security people tell me that defending against shorthand SIDs like BA isn't all that interesting. Since the attacker controls the string, they could just use the raw format S-1-5-32-544 instead. Some shorthand SIDs expand to include the domain SID. For example EA expands to S-1-5-21-X-519, where X is the domain RID. Even if you blocked the shorthand SID, the attacker could still pass the full string S-1-5-21-X-519. (From a security-theoretical point of view, the SID for the domain is not considered sensitive data. You should assume that attackers already know your domain SID.)

But wait, we got all distracted with answering the question and forgot to solve the problem.

In general, it is rare to save just the SID all by itself. Usually a SID is part of a security descriptor, so you should be saving the entire security descriptor. (We saw this some time ago when we discussed how the SID history is used when a user's SID changes.)

Comments (11)
  1. skSdnW says:

    Probably not that relevant anymore but Convert­Sid­To­String­Sid does not exist on NT4 and IIRC ConvertStringSidToSid does not support abbreviations on 2000.

  2. Joshua says:

    I don't think this code works. Consider what happens if it's called with S-1-5-? where ? is the start of an unallocated memory block. That is, we arrange so that the SID is truncated, and the truncation is at the end of the file, and aligned to a 4k boundary.

    1. In that case, cbSize would be less than GetLengthSid(), so we return FALSE at the second test.

      1. McBucket says:

        A bit of a Catch-22 here, isn't there? From the GetLengthSid function docs:

        Return value

        If the SID structure is valid, the return value is the length, in bytes, of the SID structure.
        If the SID structure is not valid, the return value is undefined. Before calling GetLengthSid, pass the SID to the IsValidSid function to verify that the SID is valid.

        OK, so it's unlikely that an undefined value will match cbLen, but still.

        Frankly, though -- if the SID crosses an allocation boundary, the problem is outside of the scope of IsValidUntrustedSid() function anyhow, is my feeling.

        1. I would hazard a guess that in practice GetLengthSid() returns the size the SID would be if the subauthority count was correct, which should also be the maximum number of bytes that IsValidSid() might read. If so, the code is OK - in practice.

  3. Seems to me that the pros of the string type and the cons of the binary type are really negligible.
    1. "Strings are well-known quantities" but the binary form is dominant data I read and write. (Well, not myself; my apps.) Since I am very Unicode-conscious, I often read the strings as raw binary before calling the proper conversion routines.
    2. One has to deal with detecting binary SIDs anyway, so long as they are not consumed in the string form. Apps tend to crash after the conversion, when they are doing something meaningful.
    3. What you call tricky is less than the average amount of day-to-day trickiness. I often add a checksum to files that have such important pieces of information.

  4. Roman says:

    The documentation for GetLengthSid says "Before calling GetLengthSid, pass the SID to the IsValidSid function to verify that the SID is valid.". So doesn't IsValidUntrustedSid technically invoke undefined behavior?

    1. It's a catch-22. IsValidSid assumes that the memory is valid, but you don't know how much memory needs to be valid until you call GetLengthSid. Turns out that GetLengthSid requires only that the header be present. I'll see if I can patch up the documentation.

      1. Joshua says:

        Ah there we go.

        On a related note, this is a good example of how easy it is to end up relying on undocumented things.

  5. cheong00 says:

    I've seen people directly validate what would be the cbSize here against "sizeof(SID)", that reading the macro defination of SECURITY_SID_SIZE(), would be equivalent of assuming there will be at least 1 sub authority specified there.

  6. Killer{R} says:

    IMHO if you seriously worried about possibility of SID corruption then using stringsid conversion to guard against that is not best idea. Likely its a worse idea. Hashing should be used against such issues. And hashing able (and should) protect not just single entity (SID) but whole storage where it saved probably with some other important information.

Comments are closed.

Skip to main content