What happens if you pass a source length greater than the actual string length?


Many functions accept a source string that consists of both a pointer and a length. And if you pass a length that is greater than the length of the string, the result depends on the function itself.

Some of those functions, when given a string and a length, will stop either when the length is exhausted or a null terminator is reached whichever comes first. For example, if you pass a cchSrc greater than the length of the string to the StringCchCopyN function, it will stop at the null terminator.

On the other hand, many other functions (particularly those in the NLS family) will cheerfully operate past a null character if you ask them to. The idea here is that since you passed an explicit size, you're consciously operating on a buffer which might contain embedded null characters. Because, after all, if you passed an explicit source size, you really meant it, right? (Maybe you're operating on a BSTR, which supports embedded nulls; to get the size of a BSTR you must use a function like SysStringLen.) For example, if you call CharUpperBuff(psz, 20), then the function really will convert to uppercase 20 TCHARs starting at psz. If there happens to be a null character at psz[10], the function will convert the null to uppercase and continue converting the next ten TCHARs as well.

I've seen programs crash because they thought that functions like CharUpperBuff and MultiByteToWideChar stopped when they encountered a null. For example, somebody might write

// buggy code - see discussion
void someFunction(char *pszFile)
{
 CharUpperBuff(pszFile, MAX_PATH);
 ... do something with pszFile ...
}

void Caller()
{
 char buffer[80];
 sprintf(buffer, "file%d", get_fileNumber());
 someFunction(buffer);
}

The intent here was for someFunction to convert the string to uppercase before operating on it, up to MAX_PATH characters' worth, but instead what happens is that the MAX_PATH characters starting at pszFile are converted, even though the actual buffer is shorter! As a result, MAX_PATH − 80 = 220 characters beyond the end of buffer are also converted. And since that's a stack buffer, those bytes are likely to include the return address. Result: Crash-o-rama. Things get even more interesting if the short buffer had been allocated on the heap instead. Then instead of corrupting your return address (which you would probably notice as soon as the function returned), you corrupt the heap, which typically doesn't manifest itself in a crash until long after the offending function has left the scene of the crime.

Critique, then, this replacement function:

//  buggy code - do not use
int invariant_strnicmp(char *s1, char *s2, size_t n)
{
 // [Update: 9:30am - typo fixed]
 return CompareStringA(LOCALE_INVARIANT, NORM_IGNORECASE,
                       s1, n, s2, n) - CSTR_EQUAL;
}

(Michael Kaplan has one answer different from the one I was looking for.)

Comments (26)
  1. Stuart says:

    The problem is that the length of one or both of the strings may not actually be ‘n’.

  2. Psa says:

    If the strings are equal but shorter than n, the result will depend on the junk past the end of the strings.

  3. Interestingly, the documentation for CompareString does not say explicitly what will happen for nulls in the strings if you pass a non-negative length value. I assume it will treat the string as a buffer that can contain nulls. Then, the code should be:

    return CompareStringA(LOCALE_INVARIANT, NORM_IGNORECASE,

    s1,

    min(n, strlen(s1)),

    s2,

    min(n, strlen(s2))) – CSTR_EQUAL;

    with

    int min(int a, int b)

    {

     return (a < b) ? a : b;

    }

  4. Tim Smith says:

    Don’t go blindly looking for the NUL character, that will cause crashes too.   (i.e. the strlen)

    Often, when you use methods that have a length parameter, you are dealing with strings that don’t have NUL termination (instead of using the length for substring extraction).  For example, if you have ever used EXPAT, many of the strings coming from the notification routines are pointing directly to the parse buffer and don’t contain the terminating NUL, thus the supplied length parameter.  I dealt with many a crashing program that assumed the strings were NUL terminated.

  5. Ray Trent says:

    Just curious, what does an uppercase NUL look like? :-)

  6. J. Edward Sanchez says:

    In ASCII, an uppercase NUL is represented as (NUL & ~0x20).

  7. Wang-Lo says:

    @Ray

    No that is the upper case.

    Lower case looks like this: nul

    Upper case looks like this: NUL

    Don’t blame yourself for being confused.  Failure to distinguish NUL from nul is the third most common programming fault after the fencepost error.

    -Wang-Lo;)

  8. michkap says:

    Is it cheating if I answer? I did resist for just about the whole morning. :-)

    One could always shove an Æ or an æ in one of the strings and watch the fireworks….

  9. Sohail says:

    [quote]

    // buggy code – see discussion

    void someFunction(char *pszFile)

    {

    CharUpperBuff(pszFile, MAX_PATH);

    … do something with pszFile …

    }

    [/quote]

    You’re kidding. No one who writes C code would ever write that.

    [Depends on your definition of “that”. Perhaps not literally “that” but I’ve seen this class of bug in commercial software. -Raymond]
  10. dave says:

    "You’re kidding. No one who writes C code would ever write that."

    Unfortunately, the world is oversupplied with programmers who regularly write code that no intelligent person in their right mind would write, and C has more than its share.

  11. Jonathan says:

    Well, with the content of the rest of the article seems to be a hint. And a (very brief) look at an MSDN article on CompareString* turned up the following interesting phrase.

    "Normally, for case-insensitive comparison, this function maps the lowercase "i" to the uppercase "I", "

    So my first guess would be that CompareStringA, when handed the NORM_IGNORECASE flag calls CharUpperBuff or similar function.**

    So the broken example function, if 1) asked to compare strings of non-equal length, or 2) given a length in excess of the actual string size, would probably happily perform the data corruption described in this article.

    *I keep looking to find the one on CompareStringA; so this might all be wrong

    **And there is probably someplace I should know about that explicitly states exactly how this flag is handled.

  12. steveg says:

    "You’re kidding. No one who writes C code would ever write that."

    If it can compile then it’s been done.

  13. Tim Smith says:

    NUL is an old shorthand for the ASCII character ‘’.  Many people use it instead of 0 or NULL to distinguish between the number zero, a "NULL pointer" and the ASCII character ‘’.

  14. Igor says:

    Actually the replacement function will work fine if n = -1, right?

  15. Igor says:

    And the strings are NUL terminated.

  16. @Tim Smith: "Don’t go blindly looking for the NUL character, that will cause crashes too.   (i.e. the strlen)"

    I think for a function called invariant_strnicmp with char* arguments, you can assume that these are valid null-terminated strings…

  17. mccoyn says:

    No one who writes C code would ever write that on purpose.

    The problem is that if your team cranks out 100,000 lines of code for a product can you be sure that none of your developers made that mistake just once, even though there are 1000 places where it could have been made?  Are your developers mistake rates less than 0.1% ?

    You might be able to double check everything, but it will take a lot of work.

  18. Adam says:

    If you want it to work with strings that may or may not be null terminated, I think you need something like this:

    int invariant_strnicmp(char *s1, char *s2, size_t n)

    {

    size_t n1 = strnlen(s1, n);

    size_t n2 = strnlen(s2, n);

    return CompareStringA(LOCALE_INVARIANT, NORM_IGNORECASE,

                          s1, n1, s2, n2) – CSTR_EQUAL;

    }

  19. Tim Smith says:

    Frederik: I think for a function called invariant_strnicmp with char* arguments, you can assume that these are valid null-terminated strings…

    Absolutely not.  The history of the strnicmp routines date back to the old unix file systems where you would pack those file named in a structure that has enough space to hold a string up to the max length NOT including the NUL.  

    For many, strncmp has become a method of doing substring matching, but it was originally more about space savings and buffer padding (see strncpy.)

  20. @Tim Smith: You’re right, I forgot about that. :-)

  21. Gazpacho says:

    "You’re kidding. No one who writes C code would ever write that."

    I think it is common knowledge among Windows programmers that the use of MAX_PATH anywhere in string-related code makes it crash-proof.

  22. IgorD says:

    This post brought tears to my eyes.

    Few weeks ago I found a bug that "sometimes" crashed my application.

    The code (simplified ) at first went something like this:

    void DrawingFunction (…, char *someText)

    {

      int len = strlen (someText);

      DrawText (hdc, someText, len, txtRect, DT_LEFT | …);

    }

    and everything was fine, but at some point I changed it to this:

    void DrawingFunction (…, char *someText)

    {

      int  len = strlen (someText);  // HERE’S THE DEVIL!

     char tmpBuff[256];

      if (somethin)  {

         len = GetAltText (tmpBuff, someText);

      else

         sprintf (tmpBuff, "%.*s", 255, someText);  // NORMAL CASE

      DrawText (hdc, tmpBuff, len, txtRect, DT_LEFT | …);

    }

    and it would crash every time ‘someText’ was longer than 255+ characters (and ‘somethin’ was false).

    I was chasing that bug for two years (because ‘someText’ was usualy very short). I don’t know, I must have done it in a hurry or asleap or …

    Horror! Horror!

  23. Dagwood says:

    This entire scenario and discusion supports my contention that when C was foisted on us by Scott McNeely, et al in the 80’s it was a very tragic day for IT.

  24. Raymond makes a good point in What happens if you pass a source length greater than the actual string

  25. John Hensley says:

    C wasn’t "foisted" on anyone, and certainly not by Sun. Programmers took it up willingly.

  26. AndyB says:

    Dagwood, C was foisted upon us by the folks at MS, not Scott McNealy. Scott foisted Java upon us, though I concur that Java and C are practically the same thing.

    I think I may have misplaced a # character in there somewhere though :-)

Comments are closed.

Skip to main content