What's wrong with this code, part 11 - the answers

Well, clearly the shorter the example, the quicker people pick up the problem.

The problem was:

    if (!CreateProcessW(NULL, PROCESS_NAME, NULL, NULL, FALSE, 0, NULL, NULL, &startupInfo, &processInformation))

The issue here is that you can't pass a constant string to CreateProcessW.  The documentation for CreateProcessW states:

The Unicode version of this function, CreateProcessW, will fail if this parameter is a const string.

But it doesn't explain why it fails.  Skywing pointed out what's going on - the developer who wrote CreateProcessW temporarily modified the input string to separate the executable name from the arguments.  There are two issues here.

The first is the obvious one: You can't use constant strings.  And there's a second one: You can't call CreateProcess with the same buffer on multiple threads.  While the call to CreateProcess restores the contents of the lpCommandLine string before it returns, there is still a period of time when the contents of lpCommandLine are not what was passed in originally.

So if you have two threads calling CreateFile with the same buffer, some of the calls to CreateProcess will have no command line arguments, and some will have the expected command line arguments (Hmmm.  Maybe I should have used this as the example instead of the one I did - to show a concurrency issue in the base API set.  Oh well...).

As I said, I got burned by this one, the reason for this is that the current compiler puts constant strings in a read-only data section.  And thus the PROCESS_NAME variable is read-only, and it causes the crash.

During the comments, I asked "But why did I say that it was possible that you won't see the bug?".  Well, it's because the bug depends on the compiler switches.  The default switches exhibit the bug.  But if you were using an older compiler (circa vc6, if I recall correctly) or if your compiler switches were set differently, the behavior would be different.

The relevent switch is the /Gf switch.  The /GF switch enables a compiler feature called string pooling - essentially constant strings are collapsed together if they're the same string.  If your code was compiled with the /Gf switch (enable string pooling as read/write strings), then the constant strings used for PROCESS_NAME would not be kept in a read-only section.

The reason that you see this the bug with the default compilation options is that the /ZI command line switch (which enables symbolic debugging in a PDB file) forces the /GF switch to be enabled.

The history of string pooling in NT is long and varied.  When NT 3.1 was under development, the person responsible for the builds changed the state of the /Gf flag several times during the lifetime of the project - there was code in the system that had depended on constant strings being read/write, and other code that depended on the lack of string pooling (you'd have two strings with the same text and modify one of the two strings).  I believe that NT 3.1 eventually shipped without the string pooling, and that it was added after we shipped (probably in the NT 3.51 timeframe, when we were concentrating on performance).

 

Kudos:

ParadoxLost nailed the base problem in the very first comment.  Mike Dunn nailed the constant string issue, and Purplet nailed the /Gf /GS switch issue.

Mea culpas:

Joel Spolsky and eruprahgp pointed out the possibility of a handle leak with the handles returned in the processInformation structure.  That's a fair criticism, but since the process exits immediately afterwards, the handles will be closed.  But in production systems this is a real issue.

gregg, Bradly Grainger, and Universalis pointed out that status is only assigned if CreateProcess fails -it's uninitialized otherwise.

Setfan Kuhr totally nailed a problem I'd not seen: The startupinfo doesn't have the wShowWindow and dwFlags fields - so notepad might not be visible when it's launched (I don't know if this is a problem or not - the process I was launching was a console process so it was irrelevant).

Major kudos to brantgurga for thinking to run this silly little program under appverifier.  That pointed out a couple of other issues: The code shouldn't have lpApplicationName as null and lpCommandLine as non null because of the spaces in file names issues.

And Adrian totally blew me away when he pointed out that on versions of NT before NT4 SP5, the GetFileAttributes API did not have a stable return value.  Wow, that's cool, I'd never known that.