Fancy use of exception handling in FormatMessage leads to repeated "discovery" of security flaw


Every so often, somebody "discovers" an alleged security vulnerability in the Format­Message function. You can try it yourself:

#include <windows.h>
#include <stdio.h>

char buf[2048];
char extralong[128*1024];

int __cdecl main(int argc, char **argv)
{
 memset(extralong, 'x', 128 * 1024 - 1);
 DWORD_PTR args[] = { (DWORD_PTR)extralong };
 FormatMessage(FORMAT_MESSAGE_FROM_STRING |
               FORMAT_MESSAGE_ARGUMENT_ARRAY, "%1", 0, 0,
               buf, 2048, (va_list*)args);
 return 0;
} 

If you run this program under the debugger and you tell it to break on all exceptions, then you will find that it breaks on an access violation trying to write to an invalid address.

eax=00060078 ebx=fffe0001 ecx=0006fa34 edx=00781000 esi=0006fa08 edi=01004330
eip=77f5b279 esp=0006f5ac ebp=0006fa1c iopl=0         nv up ei pl nz na pe cy
cs=001b  ss=0023  ds=0023  es=0023  fs=0038  gs=0000             efl=00010203
ntdll!fputwc+0x14:
77f5b279 668902           mov     [edx],ax              ds:0023:00781000=????

Did you just find a buffer overflow security vulnerability?

The FormatMessage function was part of the original Win32 interface, back in the days when you had lots of address space (two whole gigabytes) but not a lot of RAM (12 megabytes, or 16 if you were running Server). The implementation of FormatMessage reflects this historical reality by working hard to conserve RAM but not worrying too much about conserving address space. And it takes advantage of this fancy new structured exception handling feature.

The FormatMessage uses the reserve a bunch of address space but commit pages only as they are necessary pattern, illustrated in MSDN under the topic Reserving and Committing Memory. Except that the sample code on that page contains serious errors. For example, if the sample code encounters an exception other than STATUS_ACCESS_VIOLATION, it still "handles" it by doing nothing and returning EXCEPTION_EXECUTE_HANDLER. It fails to handle random access to the buffer or access violations caused by DEP. Though in the very specific sample, it mostly works since the protected region does only one thing, so there aren't many opportunities for the other types of exceptions to occur. (Though if you're really unlucky, you might get an STATUS_IN_PAGE_ERROR.) But enough complaining about that sample.

The FormatMessage function reserves 64KB of address space, commits the first page, and then calls an internal helper function whose job it is to generate the output, passing the start of the 64KB block of address space as the starting address and telling it to give up when it reaches 64KB. Something like this:

struct DEMANDBUFFER
{
  void *Base;
  SIZE_T Length;
};

int
PageFaultExceptionFilter(DEMANDBUFFER *Buffer,
                         EXCEPTION_RECORD ExceptionRecord)
{
  int Result;

  DWORD dwLastError = GetLastError();

  // The only exception we handle is a continuable read/write
  // access violation inside our demand-commit buffer.
  if (ExceptionRecord->ExceptionFlags & EXCEPTION_NONCONTINUABLE)
    Result = EXCEPTION_CONTINUE_SEARCH;

  else if (ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION)
    Result = EXCEPTION_CONTINUE_SEARCH;

  else if (ExceptionRecord->NumberParameters < 2)
    Result = EXCEPTION_CONTINUE_SEARCH;

  else if (ExceptionRecord->ExceptionInformation[0] &
      ~(EXCEPTION_READ_FAULT | EXCEPTION_WRITE_FAULT))
    Result = EXCEPTION_CONTINUE_SEARCH;

  else if (ExceptionRecord->ExceptionInformation[1] -
      (ULONG_PTR)Buffer->Base >= Buffer->Length)
    Result = EXCEPTION_CONTINUE_SEARCH;

  else {
    // If the memory is already committed, then committing memory won't help!
    // (The problem is something like writing to a read-only page.)
    void *ExceptionAddress = (void*)ExceptionInformation[1];
    MEMORY_BASIC_INFORMATION Information;
    if (VirtualQuery(ExceptionAddress, &Information,
                     sizeof(Information)) != sizeof(Information))
      Result = EXCEPTION_CONTINUE_SEARCH;

    else if (Information.State != MEM_RESERVE)
      Result = EXCEPTION_CONTINUE_SEARCH;

    // Okay, handle the exception by committing the page.
    // Exercise: What happens if the faulting memory access
    // spans two pages?
    else if (!VirtualAlloc(ExceptionAddress, 1, MEM_COMMIT, PAGE_READWRITE))
      Result = EXCEPTION_CONTINUE_SEARCH;

    // We successfully committed the memory - retry the operation
    else Result = EXCEPTION_CONTINUE_EXECUTION;
  }

  RestoreLastError(dwLastError);
  return Result;
}

DWORD FormatMessage(...)
{
  DWORD Result = 0;
  DWORD Error;
  DEMANDBUFFER Buffer;
  Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
  if (Error == ERROR_SUCCESS) {
    __try {
     Error = FormatMessageIntoBuffer(&Result,
                                     Buffer.Base, Buffer.Length, ...);
    } __except (PageFaultExceptionFilter(&Buffer,
                   GetExceptionInformation()->ExceptionRecord)) {
     // never reached - we never handle the exception
    }
  }
  if (Error == ERROR_SUCCESS) {
   Error = CopyResultsOutOfBuffer(...);
  }
  DeleteDemandBuffer(&Buffer);
  if (Result == 0) {
    SetLastError(Error);
  }
  return Result;
}

The FormatMessageIntoBuffer function takes an output buffer and a buffer size, and it writes the result to the output buffer, stopping when the buffer is full. The DEMANDBUFFER structure and the PageFaultExceptionHandler work together to create the output buffer on demand as the FormatMessageIntoBuffer function does its work.

To make discussion easier, let's say that the FormatMessage function merely took printf-style arguments and supported only FORMAT_MESSAGE_FROM_STRING | FORMAT_MESSAGE_ALLOCATE_BUFFER.

DWORD FormatMessageFromStringPrintfAllocateBuffer(
    PWSTR *ResultBuffer,
    PCWSTR FormatString,
    ...)
{
  DWORD Result = 0;
  DWORD ResultString = NULL;
  DWORD Error;
  DEMANDBUFFER Buffer;
  va_list ap;
  va_start(ap, FormatString);
  Error = InitializeDemandBuffer(&Buffer, FORMATMESSAGE_MAXIMUM_OUTPUT);
  if (Error == ERROR_SUCCESS) {
    __try {
     SIZE_T MaxChars = Buffer.Length / sizeof(WCHAR);
     int i = _vsnwprintf((WCHAR*)Buffer.Base, MaxChars,
                         FormatString, ap);
     if (i < 0 || i >= MaxChars) Error = ERROR_MORE_DATA;
     else Result = i;
    } __except (PageFaultExceptionFilter(&Buffer,
                   GetExceptionInformation()->ExceptionRecord)) {
     // never reached - we never handle the exception
    }
  }
  if (Error == ERROR_SUCCESS) {
   // Exercise: Why don't we need to worry about integer overflow?
   DWORD BytesNeeded = sizeof(WCHAR) * (Result + 1);
   ResultString = (PWSTR)LocalAlloc(LMEM_FIXED, BytesNeeded);
   if (ResultBuffer) {
    // Exercise: Why CopyMemory and not StringCchCopy?
    CopyMemory(ResultString, Buffer.Base, BytesNeeded);
   } else Error = ERROR_NOT_ENOUGH_MEMORY;
  }
  DeleteDemandBuffer(&Buffer);
  if (Result == 0) {
    SetLastError(Error);
  }
  *ResultBuffer = ResultString;
  va_end(ap);
  return Result;
}

Let's run this function in our head to see what happens if somebody triggers the alleged buffer overflow by calling

PWSTR ResultString;
DWORD Result = FormatMessageFromStringPrintfAllocateBuffer(
                   &ResultString, L"%s", VeryLongString);

After setting up the demand buffer, we call _vsnwprintf to format the output into the demand buffer, but telling it not to go past the buffer's total length. The _vsnwprintf function parses the format string and sees that it needs to copy VeryLongString to the output buffer. Let's say that the DEMANDBUFFER was allocated at address 0x00780000 on a system with 4KB pages. At the start of the copy, the address space looks like this:

64KB
0078
0000

C
0078
1000

R
0078
2000

R
0078
3000

R
0078
4000

R
0078
5000

R
0078
6000

R
0078
7000

R
0078
8000

R
0078
9000

R
0078
A000

R
0078
B000

R
0078
C000

R
0078
D000

R
0078
E000

R
0078
F000

R
0079
0000

X
^ output pointer

"C" stands for a committed page, "R" stands for a reserved page, and "X" stands for a page that, if accessed, would be a buffer overflow. We start copying VeryLongString into the output buffer. After copying 2048 characters, we fill the first committed page; copying character 2049 raises a page fault exception.

64KB
0078
0000

C
0078
1000

R
0078
2000

R
0078
3000

R
0078
4000

R
0078
5000

R
0078
6000

R
0078
7000

R
0078
8000

R
0078
9000

R
0078
A000

R
0078
B000

R
0078
C000

R
0078
D000

R
0078
E000

R
0078
F000

R
0079
0000

X
^ output pointer

This is the point at which over-eager people observe the first-chance exception, capture the register dump above, and begin writing up their security vulnerability report, cackling with glee. (Observe that in the register dump, the address we are writing to is of the form 0x####1000.)

As with all first-chance exceptions, it goes down the exception chain. Our custom PageFaultExceptionFilter recognizes this as an access violation in a page that it is responsible for, and the page hasn't yet been committed, so it commits the page as read/write and resumes execution.

64KB
0078
0000

C
0078
1000

C
0078
2000

R
0078
3000

R
0078
4000

R
0078
5000

R
0078
6000

R
0078
7000

R
0078
8000

R
0078
9000

R
0078
A000

R
0078
B000

R
0078
C000

R
0078
D000

R
0078
E000

R
0078
F000

R
0079
0000

X
^ output pointer

Copying character 2049 now succeeds, as does the copying of characters 2050 through 4096. When we hit character 4097, the cycle repeats:

64KB
0078
0000

C
0078
1000

C
0078
2000

R
0078
3000

R
0078
4000

R
0078
5000

R
0078
6000

R
0078
7000

R
0078
8000

R
0078
9000

R
0078
A000

R
0078
B000

R
0078
C000

R
0078
D000

R
0078
E000

R
0078
F000

R
0079
0000

X
^ output pointer

Again, the first-chance exception is sent down the chain, our PageFaultExceptionFilter recognizes this as a page it is responsible for, and it commits the page and resumes execution.

64KB
0078
0000

C
0078
1000

C
0078
2000

C
0078
3000

R
0078
4000

R
0078
5000

R
0078
6000

R
0078
7000

R
0078
8000

R
0078
9000

R
0078
A000

R
0078
B000

R
0078
C000

R
0078
D000

R
0078
E000

R
0078
F000

R
0079
0000

X
^ output pointer

If you think about it, this is exactly what the memory manager does with memory that has been allocated but not yet accessed: The memory is not present, and the moment an application tries to access it, the not-present page fault is raised, the memory manager commits the page, and then execution resumes normally. It's memory-on-demand, which is one of the essential elements of virtual memory. What's going on with the DEMANDBUFFER is that we are simulating in user mode what the memory manager does in kernel mode. (The difference is that while the memory manager takes committed memory and makes it present on demand, the DEMANDBUFFER takes reserved address space and commits it on demand.)

The cycle repeats 13 more times, and then we reach another interesting part of the scenario:

64KB
0078
0000

C
0078
1000

C
0078
2000

C
0078
3000

C
0078
4000

C
0078
5000

C
0078
6000

C
0078
7000

C
0078
8000

C
0078
9000

C
0078
A000

C
0078
B000

C
0078
C000

C
0078
D000

C
0078
E000

C
0078
F000

C
0079
0000

X
output pointer ^

We are about to write 32768th character into the DEMANDBUFFER. Once that's done, the buffer will be completely full. One more byte and we will overflow the buffer. (Not even a wafer-thin byte will fit.)

Let's write that last character and cover our ears in anticipation.

64KB
0078
0000

C
0078
1000

C
0078
2000

C
0078
3000

C
0078
4000

C
0078
5000

C
0078
6000

C
0078
7000

C
0078
8000

C
0078
9000

C
0078
A000

C
0078
B000

C
0078
C000

C
0078
D000

C
0078
E000

C
0078
F000

C
0079
0000

X
output pointer ^

Oh noes! Completely full! Run for cover!

But wait. We passed a buffer size to the _vsnwprintf function, remember? We already told it never to write more than 32768 characters. As it's about to write character 32769, it realizes, "Wait a second, this would overflow the buffer I was given. I'll return a failure code instead."

The feared write of the 32769th character never takes place. We never write to the "X" page. Instead, the _vnswprintf call returns that the buffer was not large enough, which is converted into ERROR_MORE_DATA and returned to the caller.

If you follow through the entire story, you see that everything worked as it was supposed to and no overflow took place. The _vnswprintf function ran up to the brink of disaster but stopped before taking that last step. This is hardly anything surprising; it happens whenever the _vnswprintf function encounters a buffer too small to hold the output. The only difference is that along the way, we saw a few first-chance exceptions, exceptions that had nothing to do with avoiding the buffer overflow in the first place. They were just part of FormatMessage's fancy buffer management.

It so happens that in Windows Vista, the fancy buffer management technique was abandoned, and the code just allocates 64KB of memory up front and doesn't try any fancy commit-on-demand games. Computer memory has become plentiful enough that a momentary allocation of 64KB has less of an impact than it did twenty years ago, and performance measurements showed that the new "Stop trying to be so clever" technique was now about 80 times faster than the "gotta scrimp and save every last byte of memory" technique.

The change had more than just a performance effect. It also removed the first-chance exception from FormatMessage, which means that it no longer does that thing which everybody mistakes for a security vulnerability. The good news is that nobody reports this as a vulnerability in Windows Vista any more. The bad news is that people still report it as a vulnerability in Windows XP, and each time this issue comes up, somebody (possibly me) has to sit down and reverify that the previous analysis is still correct, in the specific scenario being reported, because who knows, maybe this time they really did find a problem.

Comments (24)
  1. Anonymous says:

    Wow, thanks for the writeup and the peering into the internals.  A factor of 80 improvement?  Is that averaged over some large corpus of error strings and use cases, both long and short?  In any case, that's impressive — was the previous version of FormatMessage really spending 98.75% of its time just dispatching access violations and VirtualQuery/VirtualAlloc?

    The documentation on FormatMessage is a little confusing regarding the arguments array.  In re the FORMAT_MESSAGE_ARGUMENT_ARRAY flag, it says that it can't be used with 64-bit integer values, yet the description of the Arguments parameter says that you need to pass a pointer to an array of DWORD_PTR values when you use that flag.  On 64-bit Windows, a DWORD_PTR is a 64-bit integer, so these statements seem contradictory.

    [No contradiction. You can't use %1!I64d! in a format. The DWORD_PTR is there so you can pass a string pointer to %1. (Think about the problems that would arise if 64-bit and 32-bit Windows packed 64-bit integers differently.) -Raymond]
  2. Anonymous says:

    Adam: Raymond likes to throw out "example numbers" which aren't based on actual data. (E.g. he usually refers to a program in question as "Program Q" or similar). That said, the page buffer reallocate method is extremely slow because of the context switch into the Kernel to satisfy the memory commit request, so I'd not be surprised if between that and other machinery to do this that it'd be 80 times faster in some cases.

    [In this case, the number 80 came from actual measurements. I did not conduct the measurements so I don't know what the methodology was. -Raymond]
  3. Joshua Ganes says:

    I understand that somebody has to look at incoming vulnerability reports, even if they are absurd. Once you recognize that the report is describing the same issue that was reported dozens of times before, how much time and effort do you actually spend on the analysis?

    [You have to verify that the previous analysis is still correct. Who knows, maybe a subsequent change to the function introduced a security issue. -Raymond]
  4. Anonymous says:

    I find a 64KB allocation surprising.  How many formatted messages fill more than even a single 4KB page?  I guess (1) 64KB is the undocumented minimum virtual address space reservation, and (2) using the process heap is out of the question?  I might have considered committing just the first 4KB page of the reservation, and then committing the rest in one fell swoop if the first-chance exception is hit.  Then again, it sounds like avoiding the first-chance exception is a feature too, and RAM is cheap as free these days anyhow.

  5. jader3rd says:

    It's going to pain some people to read that in some ways Vista is faster than it's predecessors.

  6. Anonymous says:

    @Jeremy Stanley, regarding "I might have considered committing just the first 4KB page of the reservation, and then committing the rest in one fell swoop if the first-chance exception is hit," that doesn't go well with the stated goal of using as little memory as possible.  Allocating 60K when you are likely to only need 4 or 8 is wasteful (considering the timeframe.)

  7. Anonymous says:

    @Rick C, you misunderstand; I was suggesting this as an alternative to the newer Vista behavior, which was commit 64KB for every call.  The idea was to keep (presumably) common < 4KB cases efficient, but then handle big ones with just one "page fault" instead of fifteen.

  8. Anonymous says:

    Here's a (possibly naive) follow-up question.  Why allocate a fixed buffer at all?  Why not run through the formatting once to count the number of characters, allocate the exact amount of space needed, and then run through a second time to actually fill the buffer?

    [See: TOCTTOU. -Raymond]
  9. Anonymous says:

    "[See: TOCTTOU. -Raymond]"

    And?  Somebody's going to change the parameters and cause the program to crash?  Oh noes!  Also, the entire registry API (blah blah time machine blah blah).

    But yeah, 64KB is a reasonable size to allocate up front; simpler than the alternatives.

    [You're lucky if it crashes. If you're unlucky, you have a remotely exploitable buffer overflow. -Raymond]
  10. Anonymous says:

    You still know the size of the buffer ahead of time so you shouldn't be in danger of an overflow.

  11. Anonymous says:

    Why not update FormatMessage in XP to work the same as Vista to make those reports go away?  Seems like a no brainer, especially if the machine has a GB or more.  Keep the old method around for the smaller machines.

  12. Anonymous says:

    Ian D.: I'm pretty sure the cost of regression-testing and distributing a new Kernel32 binary vastly outweighs the lost engineering time from the occasional "security vulnerability."

  13. Oh, but sir… it's only wafer-thin…

  14. Anonymous says:

    @Ian:

    Well, the reasoning for the cludge did not vanish, so how could you justify removing the cludge?

    You cannot suddenly say 'Oh, and by the way, you must dump all your obsolete business-critical xp machines or live without security fixes.'

    There's going to be someone suing you for sure, besides the nice advertisement.

  15. Anonymous says:

    The "commit on demand" technique doesn't actually save any physical memory, because physical pages are allocated on demand anyway (at the time of first access). The only thing it saves in the FormatMessage example is a transient, extremely short-lived 60K spike in commit, which is for all practical purposes free, even on a 12 MB machine from 15 years ago.

  16. Anonymous says:

    "[See: TOCTTOU. -Raymond]"

    I don't follow.  The concern is that the parameter values might change between your allocation and writing the string? That seems like an unusual edge case to me.  Wouldn't be ok to just fail in that case? The output is unlikely to be consistent anyway.

    [News flash: Unusual edge cases are where the security vulnerabilities tend to hang out. And sure you could decide to fail that case – but you now have to write code to detect that you are in that case. It's not a slam-dunk "Just format it twice." -Raymond]
  17. Anonymous says:

    What is this kind of stupid architectural limitations? On a 32-bit system, strings should be able to be 2^32-1 bytes long. On a 64-bit system, strings should be able to be 2^64-1 bytes long. And NO slack memory should be allocated, not a single byte. It should be 100% secure too. Please go back to the drawing board and learn how to program.

  18. Anonymous says:

    @640K: I had a 32-bit system with a string of 2^32-1 bytes once. I would have made a truly marvellous screenshot as proof, but unfortunately, memory was too marginal to contain it.

  19. Anonymous says:

    I was wondering what these first-chance exceptions in format message were that was filling up my debugger logs. They didn't seem to be doing any damage to my app, so I ignored them, but I always wondered. Thanks for explaining.

  20. Anonymous says:

    It would be interesting to here some rebuttal/comment on Pavel's claim that this was a symptom of over-optimization, where the underlying systems are not understood. Kind of like that story about optimizing based on profiling and end up making the idle loop x % faster…

  21. Anonymous says:

    Anyway, the "measure, allocate, fill" pattern is common, but it only works when you can guarantee that nothing's going to be changed in another thread in the middle of the process. To be robust in that situation (which you have to be, if you're packaging up the whole thing in a library function), you'd need to do it in a loop, or add an extra check. Some unix libc's strdup functions get this wrong (they use memcpy, which means they can silently truncate and not null-terminate a string in this situation – MSVCRT _strdup uses strcpy_s, which at least does a check and crashes.)

  22. @Pierre: Not impossible – for that matter, of course, there's nothing to say a 32 bit system should be constrained to 2^32 bytes: you can do 64 or even 128 bit arithmetic. The same techniques were much more useful back in 16 bit days, of course, where limiting resources to 64 kilobytes would be a real problem: not much market for a word processor which dies on a 64k file, or an image editor which can't load a 640×480 bitmap – but Windows added the Address Windowing Extensions, so 32 bit applications could access more RAM than would fit in a 32 bit address space.

    The 80-fold speedup from removing this "efficiency" technique reminded me of working on an old VB application years ago. It performed a fairly simple calculation on a batch of numbers, displaying a progress indicator which slowly crept across the screen. Tasked with speeding it up, I profiled the code … 99% of the CPU time was going on drawing the progress bar itself! Remove that, the remaining code was fast enough there was no need for any progress indicator anyway…

  23. Anonymous says:

    @jas88: you forgot the simplest way of all to cheat — if the string is a C string, all we require is that the final byte in memory is a NUL. If that's the case, then all of memory can be interpreted as a string. I had such a string once and making a screenshot was fairly easy because it contained bytes that decoded to instructions for doing so…

  24. dbacher says:

    Looks like a pretty smart way of doing it really (don't worry, I'll go back to being negative soon enough).

    I'd imagine if you put a metric on the number of calls that return without the first chance exception being raised — especially when the decision was made — it would probably be above 90%.  In C/C++, generally programmers favor the built-ins, and the built-ins (printf family, etc.) don't generally route through FormatMessage because it would be a pain to translate the one set of semantics into the other.  So for the 90% case where FormatMessage is being used to recover a message from the OS for an error code, and then invoked a second time to format the message for MessageBox, the 4k buffer would be totally adequate.

    Also, if you were smart, you keep the buffer around because if the thread needed 20k on one call, chances are it might make that call again.  I'd think that would be a case you'd probably optimize for (for the case of FormatMessage going to a log file — which would be the most common case where I'd expect to see > 4k messages, potentially in an inner loop).

    In terms of counting — that seems like it would be really hard to do without two separate implementations of FormatMessage.  Traditionally, one version of FormatMessageA/FormatMessageW called through to the other, didn't it?  So lets say that I'm on NT so FormatMessageA calls FormatMessageW, then does a Unicode to MBCS conversion.  How do I count that for shift-JIS (and if you want to bring modern Windows into the mix, assume that there are surrogates in the unicode string).

    Also, for each character position you have a switch and either an accumulate or a produce (at least, that's how I'd expect it works inside).  If you assume that, then you've got this huge ass switch statement and accumulate/reduce logic that you've got to repeat.  And so you're really — during the counting pass — going to probably tokenize into a series of instructions, if you were going to do it that way.  At that point, you're back to needing a variable sized buffer — but now you need two of them.

Comments are closed.