If you are trying to understand an error, you may want to look up the error code to see what it means instead of just shrugging


A customer had a debug trace log and needed some help interpreting it. The trace log was generated by an operating system component, but the details aren’t important to the story.

I’ve attached the log file. I think the following may be part of the problem.

[07/17/2005:18:31:19] Creating process D:\Foo\bar\blaz.exe
[07/17/2005:18:31:19] CreateProcess failed with error 2

Any ideas?

Thanks,
Bob Smith
Senior Test Engineer
Tailspin Toys

What struck me is that Bob is proud of the fact that he’s a Senior Test Engineer, perhaps because it makes him think that we will take him more seriously because he has some awesome title.

But apparently a Senior Test Engineer doesn’t know what error 2 is. There are some error codes that you end up committing to memory because you run into them over and over. Error 32 is ERROR_SHARING_VIOLATION, error 3 is ERROR_PATH_NOT_FOUND, and in this case, error 2 is ERROR_FILE_NOT_FOUND.

And even if Bob didn’t have error 2 memorized, he should have known to look it up.
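Looking a code up does not take much. As a minimal sketch, the table below maps the codes mentioned above (plus error 5, ERROR_ACCESS_DENIED) to their winerror.h names. The helper name lookup_error_name and the table itself are made up for illustration and are not part of any Windows API; only the numeric values and symbolic names come from winerror.h.

```c
#include <stddef.h>

/* A handful of Win32 error codes and their winerror.h names.
   The table is deliberately tiny; it is an illustration, not a
   replacement for the real header. */
struct error_name {
    unsigned long code;
    const char *name;
};

static const struct error_name known_errors[] = {
    { 2,  "ERROR_FILE_NOT_FOUND" },
    { 3,  "ERROR_PATH_NOT_FOUND" },
    { 5,  "ERROR_ACCESS_DENIED" },
    { 32, "ERROR_SHARING_VIOLATION" },
};

/* Hypothetical helper: returns the symbolic name for a code,
   or NULL if the code is not in the table. */
const char *lookup_error_name(unsigned long code)
{
    size_t i;
    for (i = 0; i < sizeof known_errors / sizeof known_errors[0]; i++) {
        if (known_errors[i].code == code)
            return known_errors[i].name;
    }
    return NULL;
}
```

Calling lookup_error_name(2) on Bob's log line yields ERROR_FILE_NOT_FOUND, which is the whole of the first reply below.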

Error 2 is ERROR_FILE_NOT_FOUND. Does the file D:\Foo\bar\blaz.exe exist?

No, it doesn’t.

-Bob

Bob seems to have shut off his brain and decided to treat troubleshooting not as a collaborative effort but rather as a game of Twenty Questions in which the person with the problem volunteers as little information as possible in order to make things more challenging. I had to give Bob a nudge.

Can you think of a reason why the system would be looking at D:\Foo\bar\blaz.exe? Where did you expect it to be looking for blaz.exe?

This managed to wake Bob out of his stupor, and the investigation continued. (And no, I don’t remember what the final resolution was. I didn’t realize I would have to remember the fine details of this support incident three years later.)

Comments (64)
  1. Paul says:

    Surely you need a compatibility shim to make d:\foo\bar\blaz.exe appear to exist?

  2. Not Raymond says:

    Remember when we were talking about how people don’t look up error codes? That was cool.

  3. Michael says:

    “I didn’t realize I would have to remember the fine details of this support incident three years later.”

    My goodness. Relax, Raymond, you’re among (mostly) friends.

    [I know. But it’s the 10% non-friends that cause the trouble. -Raymond]
  4. Joseph Koss says:

    Maybe it’s time that all log viewers put a link next to every error code that will open a browser and perform a Google/Bing/MSDN search for said error code.

    Just saying…

  5. Billy O'Neal says:

    @ Joseph Koss

    Maybe you’re assuming the log isn’t dumped to a plain text file….

  6. kog999 says:

    @Joseph Koss

    Or the error log could have just said Error 2:FILE_NOT_FOUND

  7. fahadsadah says:

    The error recorded is the return value.

    Would you rather have a function that returns 2, or one that returns "Error 2:FILE_NOT_FOUND"?

  8. Eric says:

    So what’s the story behind the malware-serving Tailspin Toys site (as referenced in Bob Smith’s signature)?

    [Look at the big red text at the top. -Raymond]
  9. JS says:

    tip:  you can translate error message on the command line with net helpmsg, e.g.

    C:\>net helpmsg 2

    The system cannot find the file specified.

  10. Peter says:

    Don’t you think you’re being a little unfair, Raymond?  Where is a non-programmer supposed to look up these error codes anyway?  Where I work, our log files are intended for programmer consumption, so we don’t get bent out of shape when QA or customers ask us to interpret them.

    [The person asking the question was a technical person. I expect technical people to know their error codes (or at least know where to look them up). And even if he didn’t recognize the error code, once its meaning was explained, he should have been able to say something like “Oh, yeah, I uninstalled Foo last week, that may be the problem” instead of an unhelpful “No the file isn’t there.” -Raymond]
  11. Jim says:

    The story is not new; everybody has a mindset of being taken care of or babysat by someone else. The attitude is annoying, but this is the trend in people’s psyche: the higher the title the worse the symptom.

  12. Kujo says:

    I’ve encountered this behavior from testers before, and some of these are smart people.  I wonder what it is about tester environments that tends to foster such helplessness and lack of initiative.

  13. Someone You Know says:

    @Jim

    This does seem to be very common, especially in the United States.

    But Raymond’s always talking about how in the past, it was assumed that programmers knew what the hell they’re doing. Presumably, enough of them actually did that such an assumption was useful. Now it seems to have gone the other way, at least on the programmers’ side. When did this transition happen? And why?

  14. Marquess says:

    At least .Net learned to throw exceptions, which are much better at being converted to English. Although the explanation for FileNotFoundException (“The PATH environment variable has a string containing quotes”) is … creative. That’s totally gotta help.

  15. Peter says:

    Well, our QA team consists of "technical" people, but I still wouldn’t expect them to know what Win32 error codes mean.  (Or even that CreateProcess is a Win32 API call for that matter.)  I would take Bob’s first answer to mean "Blaz.exe is not there.  I have no idea what it is.  Do you expect it to be there?"  Anyway, I have no comment on Bob’s competence (or lack thereof).  I just think a little sympathy is in order. ;)

  16. Brian Morton says:

    Those are some very nice images!  Where did they come from?  They don’t seem to contain any EXIF data.

  17. Igor says:

    I assume tailspin tracks number of downloads to see if people read the message. Posting a link here will completely screw up their numbers :) All the geeks will want to know what’s in the download.

  18. Bryan says:

    @Peter: It might depend some on your tech support level, but I would be frustrated by your QA/Support team. While they don’t need to know the first thing about what a "Win32 API" really is, I would definitely expect them to know how to look up generic error codes. Even if they don’t know about net helpmsg x just having the error lookup tool would suffice.

  19. pbrown says:

    | C:\>net helpmsg 2

    | The system cannot find the file specified.

    Wow!  "net helpmsg" is broken, too!  Can’t Windows do anything right?  :-)

  20. Anonymous says:

    Of course, even if it had been a friendly error string, I’ve frequently had the experience that people can’t be bothered to read those.

    For example, in this case, let’s say the message had said, "The file was not found.  Please verify that D:\foo\bar\blaz.exe exists, or suitably adjust your input."  With a lot of people, chances are they’ll still expect to discover what they did wrong and introduce a code change to accommodate their circumstance.

  21. John says:

    fprintf(log, "error %d: %s\n", dwResult, GetFormattedErrorMessage(dwResult));

    [Ooh, a memory leak (nobody freed the formatted error message), or maybe a thread-safety bug (static buffer), or additional TLS data management (the formatted error is a per-thread variable) and then a memory leak when the DLL is dynamically unloaded. -Raymond]
  22. Orisit says:

    "tip:  you can translate error message on the command line with net helpmsg, e.g.

    C:\>net helpmsg 2

    The system cannot find the file specified."

    Actually, that is an error message from the NET program, because it can not find the file "helpmsg".

    Or, is it???

  23. Mark T. Tomczak says:

    While I agree that people introspecting logs are technical people and can deal with an error code, my irritation at the answer "People should know how to look up error codes" comes from two places:

    1) Technical people are busy people, and converting an integer code to a string is a task the computer can easily do for the developer. Yes, it’s more code to do it properly; that’s why I usually recommend to people who are starting on big projects that they start with the error management system.

    2) Is the code a POSIX code? Win32 code? A custom number coming from some third-party module? Error code numbers, if not cautiously handled, can easily cross namespaces and become useless or, worse, misleading. It’s harder for a descriptive string to suffer such semantic mismatches.

    The second issue doesn’t apply in Bob’s case (though the first issue does… Even knowing CreateProcess is a win32 function, I had to page through four or so bits of MSDN’s documentation to find the "System Error Codes" page). But in general, a lot of headaches can be saved by having the error logging system provide a user-readable string. Execution logs are usually for humans to consume, and humans (even software engineers) don’t think in number codes.

    That having been said, the overarching point of "Don’t shut off your brain" is a good point. But after spending months wrestling with a program designed with the philosophy that such terse error reporting is acceptable, a person can get tired and snippy when they ask for help.

  24. patros says:

    "Actually, that is an error message from the NET program, because it can not find the file "helpmsg".

    Or, is it???"

    Try "net helpmsg 7". Not the same message, mystery solved.

  25. ivanjh says:

    I think it’s from an IE8 demo of the "SmartScreen Filter"

    http://www.ie8demos.com/smartscreen/tailspin/

  26. John says:

    GetFormattedErrorMessage() returns a CString.

    [Try explaining to the architecture team why your system service has a dependency on MFC. (Oh, and now you have to convert it from C to C++.) -Raymond]
  27. Mr Willows says:

    The system won’t find my program (myprogram.exe).  I’ve not yet compiled (or actually written) the program yet, but can you tell me why Windows is not working correctly.

  28. John says:

    Well, gosh.  I guess if it can’t be done in 5 lines of code or less then it isn’t worth doing.  The point is that it’s not so much extra work that it’s not worth doing, though I suspect most of us don’t bother.

    char msg[512];

    fprintf(log, "error %d: %s\n", dwResult, GetFormattedErrorMessage(dwResult, msg, 512));

    [Most people wouldn’t bother. It’s just a log file. The information is still there; it’s just terse. -Raymond]
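The caller-supplied buffer in this version is what sidesteps the leak and static-buffer concerns raised earlier in the thread. A minimal portable sketch of that shape follows. The names format_error_message and message_for are hypothetical, and message_for is only a stand-in for a real message source such as FormatMessage; its three strings are the ones net helpmsg prints for errors 2, 3, and 5.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a real message source such as FormatMessage.
   Returns NULL when no text is known for the code. */
static const char *message_for(unsigned long code)
{
    switch (code) {
    case 2:  return "The system cannot find the file specified.";
    case 3:  return "The system cannot find the path specified.";
    case 5:  return "Access is denied.";
    default: return NULL;
    }
}

/* Writes the text for `code` into the caller's buffer and returns that
   buffer, so the call can be used directly as a printf argument.
   Nothing is allocated and no static storage is shared between threads. */
const char *format_error_message(unsigned long code, char *buf, size_t cap)
{
    const char *text = message_for(code);

    assert(buf != NULL && cap > 0);
    if (text != NULL)
        snprintf(buf, cap, "%s", text);            /* truncates safely */
    else
        snprintf(buf, cap, "(no message text for error %lu)", code);
    return buf;
}
```

Because the buffer belongs to the caller, there is nothing to free afterwards; the cost is that a too-small buffer silently truncates the message.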
  29. porter says:

    Problem: Tester raises bug, "Log file too terse"

    Solution: Developer removes log file capability because it was never an agreed requirement.

    See, everybody happy.

  30. Olivier says:

    Why do we get a cryptic error like: “[07/17/2005:18:31:19] CreateProcess failed with error 2” instead of something clear like: “there was an error with CreateProcess, stupid you, please check if the file D:\Foo\bar\blaz.exe exists.”

    @Eric : in the malware files, there are just random pr0n pics (hehe, now many people will download it ;) ).

    @Jim : “the higher the title the worse the symptom”, yes, because stupid people can’t do real work, so they get high titles to feel important. The problems start when those people want to do something…

    [This is a log file, not an error message to be displayed to the end user. -Raymond]
  31. Sean says:

    And that is why we use strerror or equivalent

    [This is a log file, not a message in a dialog box shown to the user. Log files assume that the reader is a technical person. Given the choice between writing fprintf(log, "error %d\n", GetLastError()) /* look it up in winerror.h if you don’t recognize it */ and DWORD dwError = GetLastError(); LPTSTR pszError; DWORD dwResult = FormatMessage(FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_ALLOCATE_BUFFER, NULL, dwError, 0, (LPTSTR)&pszError, 0, NULL); fprintf(log, "error %d: %s\n", dwError, dwResult ? pszError : "(error retrieving error text)"); if (dwResult) LocalFree(pszError); which one would you write in a quick logging function? -Raymond]
  32. Cheong says:

    While being completely brainless when asking for help is annoying, I’ve found that too proactively suggesting a "maybe" cause on your side when asking for help is yet another annoying habit.

  33. Cheong says:

    [This is a log file, not an error message to be displayed to the end user. -Raymond]

    Apparently there are some regular users on forums and newsgroups who, after being asked for logs a few times when they ask about their problem, now supply the logs by default. Eventually (or will they?) they’ll find out that, most of the time, the most relevant information about their problem is in the last few lines of the log.

    You can’t just assume they know something when they refer to something in the log file.

  34. Ciggy says:

    So that’s what happened to Microsoft Bob… He went on to write code. :)

  35. porter says:

    While writing code I got in the habit of setting HRESULT to E_UNEXPECTED until I truly had some status to return.

    MSVC 4 says it’s "Unexpected failure", alas some wag has now changed it in XP to "Catastrophic failure", which made for some interesting responses from testers!

  36. Gabe says:

    I don’t understand why the Senior Test Engineer asked you [MS] instead of some developer who wrote the code that generated the error

    [“The trace log was generated by an operating system component…” The developer who wrote the code that generated the error worked at MS. -Raymond]
  37. dave says:

    >The PATH environment variable has a string containing quotes.

    By crikey, that *is* my problem!

    C:\> path

    PATH=C:\Windows\system32;http://www.bartleby.com/100

  38. CGomez says:

    The crap I deal with is:

    "We got this error in our logs… it’s (rattles off some HRESULT).  We looked it up and it says, ‘blah’, but you know MSFT error codes aren’t really that helpful.  Half the time that’s not the real problem."

    1) Why is an HRESULT a "MSFT error"?  Are they the only ones who write and install COM components?

    2) Not the real problem?

    I deal with this all the time… debugging by voodoo. It’s out there and it’s common.

  39. Gaspar says:

    @pbrown

    That was my gut reaction, followed a second later by, "oh duh" *facepalm*  =P

    ——

    I am pretty amazed by the responses in the comments on this one.  This sounds like a classic case of an incompetent person out of his league or maybe just someone having a REALLY bad day (we’ve all had those).

    First he seems to have failed in looking up an error code.  This should be step number one.  Next when told the message is file-not-found, he scratches his head.  I can’t think of an error that is more self explanatory.

    Maybe out-of-batter…. *blank screen*

  40. Pax says:

    Can I suggest one thing? When you write messages to your log files, don’t use the word "error". We had one customer ringing up complaining that they were getting a lot of error #0 messages in the log files and wanting to know what the errors were. They wouldn’t listen when we said the log file was for us, not them and that, as long as the software was working, they shouldn’t even look in there. Solution: "s/error/status/g" in the next patch and they stopped complaining.

  41. Drak says:

    I love all these people here who say ‘The tester doesn’t know it’s win32 API’ and ‘The tester doesn’t need to know error codes’.

    I hope your testers are smart enough to use google? Google for ‘createprocess error 2’ and you’ll find all kinds of nice hits. Et voila, in the first one someone is asking ‘What is error 2’ and another poster explains ‘It’s FILE_NOT_FOUND’. Is that too much to expect from a SENIOR tester? I hope not.

    I admit the internet is probably fuller of forums today than it was a couple of years ago, but even back then ‘googling it’ was always step 1.

  42. ChrisR says:

    As someone else mentions, logging and such are an integral part of what we do when we start a new project, if it has to be started from scratch.  Otherwise it’s already there in the infrastructure.

    Then it’s almost hard to *not* bother with writing out the error string.

    LogWin32Error( "Something bad happened", GetLastError() );

    void LogWin32Error( const char * message, DWORD code )

    {

     // GetWin32ErrorMessage can return CString or std::string or fill in a char*, as needed by project.  Make sure to fix the printf if GetWin32ErrorMessage returns an object though, since it won’t call any conversion operators (or c_str) for you.

     printf( "%s: %lu (%s)", message, code,  GetWin32ErrorMessage( code ) );

    }

    But Raymond, I do agree that most people won’t bother and in a lot of other projects I don’t bother either.

  43. Ooh says:

    Goran, MFC CString was explicitly designed to support exactly that scenario – CString should be a convenient wrapper that you can still pass in to all those other C-style functions without having to worry about it. So yes, you can use the explicit cast, but there’s absolutely no need to.

  44. Imaperson says:

    IME there’s an issue with providing verbose error strings to a user (granted in this case we’re not talking about a user). More often than not they will provide their "interpretation" of the message rather than the message verbatim, e.g. "It said something about XYZ", whereas if you give them a number they will provide *the number* which can then be (assuming they’re sanely assigned) quickly looked up in code/documentation.

  45. anon says:

    Who the hell "memorizes" these error codes? You look them up when you need to, and only because the goddamn programmer decided to show the "error code" instead of text of the error message. Programmers suck.

  46. Neil says:

    @JS

    net helpmsg doesn’t work for messages with insertions, instead it just spits out "%d is not a valid Windows network message number."

    I’m not sure how to tell whether that message has a message number, since it has an insertion…

    @Marquess

    PATH=Proverbs12:24

    Oh wait, not that sort of quote…

  47. Wound says:

    @dave Nicely played sir!

  48. Z says:

    i am willing to bet that even if the log file had said "file not found" in plain text, Bob would have bugged you anyway!

  49. Teo says:

    Hello, the log file is the ui for its end-user, which is the QA guy/gal :-)

    CString has been part of ATL since, idk, maybe 2003? Surely ATL is in almost 95% of Win32 code, usually because it’s a very convenient COM wrapper, and basically every Win32 program larger than 10 lines of code needs COM.

    Raymond, as with most MS technologies, FormatMessage looks like it’s convenient and easy to use, while in practice it fails miserably.

    So your example, which implicitly says “gee, see how easy it is to use FormatMessage”, is flawed.

    [I thought my example was saying “Look how cumbersome it is to use FormatMessage.” Oh, and according to Windows Internals, “The vast majority of Windows is written in C.” You’d be surprised how much of Windows doesn’t use COM. (For example, um, everything written at a lower level than COM.) -Raymond]
  50. mmyers says:

    @dave: Please don’t make me laugh out loud like that in the middle of the office!

  51. Goran says:

    @John (mildly useful nitpick):

    CString GetFormattedErrorMessage(DWORD dwResult);

    fprintf(log, "error %d: %s\n", dwResult, GetFormattedErrorMessage(dwResult));

    A smart guy pointed out to me that this works by accident. It probably always will, because MFC team can’t now change implicit assumption that makes this possible, but still.

    You see, %s in fprintf thinks it has a (probably const) char* as a second parameter (that’s actually TCHAR* in MSVCRT, which is I think not standards-compliant, but it’s better that it isn’t).

    But you are passing CString. You may think that you are passing LPCTSTR, probably because CString has (effectively) operator LPCTSTR(). But in fact, you are just passing a "this" of a CString (look at disassembly for proof ;-)). Because, you see, the compiler doesn’t know that the second parameter should be TCHAR* and that it should look for a conversion. So it passes what it has, a CString.

    And the accident is that "this" in CString points to the first character of the underlying string (but actual CString data is CStringData*).

    I guess that was a neat optimization that makes CString::operator PCXSTR (used to be operator LPCTSTR, and effectively is still that) as fast as possible.

    So… Your code works, but it does not hurt to be wary of the above and always write e.g.

    fprintf(log, "error %d: %s\n", dwResult, LPCTSTR(GetFormattedErrorMessage(dwResult)));

    (note the explicit cast to LPCTSTR).

  52. njkayaker says:

    [Most people wouldn’t bother. It’s just a log file. The information is still there; it’s just terse. -Raymond]

    It’s easy to write a function to report the GetFormattedErrorMessage() result. And it’s (relatively) easy to use the function. That is, it isn’t much of a bother. While Bob shouldn’t be helpless, the programmer should not be so lazy.

    Being less terse in things like logs increases (ever so slightly!) the likelihood that the problem will be understood earlier.

    [Perhaps so, but it didn’t help in Bob’s case… -Raymond]
  53. Random User 43791 says:

    Pros and cons of logging/showing error codes aside – if you are going to use error codes, be careful how you show them. Make sure the context strongly implies or explicitly states it is an error code, lest users interpret the code "creatively".

    This is particularly true if you are in the US, and use error codes that end up being represented by 10 decimal digits. On more than one occasion, this has resulted in the user calling support to report that, "I called the phone number in the error, but it was [disconnected/unhelpful/embarrassing/etc.]."

    Note that this effect may apply to any sequence of digits that a user might interpret as a phone number, even if it means ignoring a leading hyphen or using dial-by-letter. And by any means, avoid preceding the code with a word along the lines of "call", such as in "…result of procedure call: -2146827859 [ActiveX component can’t create object]".

  54. Boris says:

    MSFT error messages are often wrong especially when the error is a result of a bug. I’ve seen "the handle is invalid" reported many times when the real error has nothing to do with handles (and there are no file operations going on) and "this stored procedure expects a parameter that was not supplied" when this is clearly false. So, if a return code might be misinterpreted, it might be a good idea not to convert it to a string.

    And I never noticed before that errors might be misinterpreted as phone numbers. I guess I underestimate the idiocy of some users. Use hex. 800A01AD is much harder to misinterpret as a phone number (The 800 might be suggestive, but 5 digits following it are not)
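As a minimal sketch of the hex suggestion: the decimal value -2146827859 quoted earlier in the thread and 800A01AD are the same 32-bit pattern, and the only difference is how it is printed. The helper name format_error_hex is hypothetical.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: renders a 32-bit error code as eight uppercase
   hex digits. Returns whatever snprintf returns. */
int format_error_hex(int32_t code, char *buf, size_t cap)
{
    /* Convert to unsigned first so the bit pattern is printed as-is
       rather than as a negative number. */
    return snprintf(buf, cap, "%08" PRIX32, (uint32_t)code);
}
```

Passing -2146827859 produces the string 800A01AD; small codes are zero-padded, so error 2 comes out as 00000002.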

  55. Random User 43791 says:

    The key take-away I intended was the first paragraph. All it takes is to make sure they understand it is an error code.

    Once users’ brains enter the creativity arena, it matters little what base it’s in, or anything else. They will find a way to make it meaningful to them, even if it seems insane to you.

    Another example (contrived but plausible this time): So you decide error messages will have a text description, followed by the error code in parentheses. A user gets the error message, "Error processing date or time record. (31011974)". They then call you up, asking why the error message has their birthday in it (January 31, 1974), when it should be processing yesterday’s stock quotes.

  56. Neil (SM) says:

    LOL!  It’s like playing "Who’s on First" with the command window:

    C:\Users\me>net helpmsg 2

    The system cannot find the file specified.

    C:\Users\me>net helpmsg 3

    The system cannot find the path specified.

    C:\Users\me>net helpmsg 4

    The system cannot open the file.

    C:\Users\me>net helpmsg 5

    Access is denied.

  57. Olivier says:

    "[This is a log file, not an error message to be displayed to the end user. -Raymond]"

    Sure, but it looks like some "technical" persons (Senior Test Engineer in your post) have to be babysat like an end user to understand what the problem is.

  58. Alexandre Grigoriev says:

    @Goran

    But in fact, you are just passing a "this" of a CString (look at disassembly for proof ;-)).

    No. It’s passing a plain (not "deep", as with a copy constructor) copy of the CString object, which happens to be pointer-sized and contains a pointer to its string data. Thus it becomes equivalent to the LPCTSTR operator.

    Note that C++ standard doesn’t allow passing non-POD types to variadic functions; and MSVC documentation now explicitly says to use an explicit LPCTSTR operator in this case.

    For better results, use CStringA with (f/s)printf.

  59. njkayaker says:

    [Perhaps so, but it didn’t help in Bob’s case… -Raymond]

    Agreed!

    As it turns out, I did a lot of work to improve the error reporting of programs at my company. Even with the better messages, people don’t actually read them!

    At this point, all I ask people to do is forward the actual message to me (which also appears to be beyond people’s capabilities).

    Anyway, the minor extra effort it takes to produce better log output, in my experience, is well worth it. Especially, if one has to support many programs in different environments at any time of day.

  60. John B says:

    /*

    tip:  you can translate error message on the command line with net helpmsg, e.g.

    C:\>net helpmsg 2

    The system cannot find the file specified.

    */

    That gives me an error (The system cannot find the file specified.) :D

  61. Goran says:

    @Alexandre Grigoriev:

    Ah, yes. My bad.

    @Ooh:

    CString is indeed designed to support a LPCTSTR scenario, e.g.

    void f(LPCTSTR);

    CString s;

    f(s); // CString::operator LPCTSTR() const invoked here.

    But! In fprintf("%s", s), compiler has no idea that it should call said operator.

    The thing works by accident. And Raymond’s complaints on this code are all quite valid – to an unsuspecting eye, that just shouldn’t work the way it’s written.

  62. njkayaker says:

    @Goran  "But! In fprintf("%s", s), compiler has no idea that it should call said operator"

    Functions like fprintf have that sort of problem with many other types of arguments.

    If you really wanted to make fprintf "robust", you’d cast the arguments to the types indicated in the string. That way, code changes would not break it.

    fprintf(stdout, "%d %ld %f", (int)a, (long)b, (double)c);

    (Newer compilers are smart enough to do some checking of the argument types against the format string, but this only works for known/standard functions.)
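One place where such casts arguably earn their keep is an argument whose type is a typedef that may change width, since the format string will not follow the typedef. A minimal sketch, assuming a made-up error_code_t typedef and a hypothetical format_failure helper:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical typedef: unsigned int today, but it could become
   unsigned long (or something else) in a later revision. */
typedef unsigned int error_code_t;

/* Hypothetical helper. Casting to the type named in the format string
   keeps the two in agreement even if error_code_t or size_t has a
   different width on another platform. */
int format_failure(char *buf, size_t cap, error_code_t code, size_t attempts)
{
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "error %lu after %lu attempts",
                    (unsigned long)code, (unsigned long)attempts);
}
```

For a plain int passed to %d the cast adds nothing; the sketch is only about arguments whose underlying type is not fixed by the code at the call site.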

  63. Jonathan says:

    Pros of numeric error codes:

    1. Less code to produce them
    2. Less likely to be misreported by users

    3. For user-facing error messages, survive translation. For example, a screenshot of a messagebox with "המערכת לא מצאה את הקובץ" is quite hard to understand (unless you speak the target language). And you can’t put it into machine translation, since it’s a screenshot, and you don’t know how to type the weird letters.

    4. Easier to document in troubleshooting guides (if you get error 619, do X).

    5. Easier to google for.

    That said, IMO the best would be to provide both. A sufficiently-nice logging system would do that for you. For example, if you use wpp trace (google it), you get to write:

    • TrTRACE("X failed, error code=%!winerr!", GetLastError());

    Which, after formatting, would produce:

    • X failed, error code=2(ERROR_FILE_NOT_FOUND)

    It also does IP addresses, GUIDs, and other stuff.

  64. Goran says:

    @njkayaker (Functions like fprintf have that sort of problem with many other types of arguments… If you really wanted to make fprintf "robust", you’d cast…)

    Hmmm… "Improving" [X]printf was not my goal. It is what it is. I was merely commenting on its proper use with %s and CString in particular, where one should use an explicit cast.

    There is no reason to e.g. cast an int when it is used with %d. As A. Grigoriev said, the compiler does a "blind copy" of an int there and that’s it. Incessant casting is not how [X]printf was meant to be used. One should just match format string and argument types; for the most part they do match, and casting is superfluous.

    But with CString the problem is more insidious: it all works, but the programmer thinks he is passing an LPCTSTR. He isn’t; he’s effectively doing a

    memcpy(

     STACK,

     &[CSTRING INSTANCE],

     sizeof(CSTRING INSTANCE))

    and [X]printf thinks it receives an LPCTSTR (it clearly doesn’t). So all that’s needed to bring the world to a halt ;-) is for the CString implementation to change so that blind-copying a CString to the stack does not put an LPCTSTR (and nothing else) there.
