It’s one thing to say "somebody should do this", but doing it is another matter


A common response when I describe a programming error is, "Programs should have to pass a test that includes testing for this case." The case could be a program mishandling a message, a program responding incorrectly to IUnknown::QueryInterface, whatever. But these suggestions fall into the same trap that I see when grading student essays: They're good with the what but very weak on the how.

Saying "Somebody should do X" is easy, but without some sort of suggestion as to how that could be accomplished, the suggestion rarely gets off the drawing board. It's like saying "Somebody should solve world hunger."

Let's look at that first example again. The topic at hand was window procedures which fail to pass unhandled messages to the DefWindowProc function. How would one "test for this"? Would the test walk through every code path in the program that creates a window, and then send each of those windows a fake WM_QUERYENDSESSION or a fake WM_APPCOMMAND message to see what happens? First of all, it's unclear how a test could exercise all the window-creation code paths of a program without insider knowledge of that program. Therefore, this test would have to be written by the authors of the program.

Next, even if you sent the message and saw that the message was passed to the DefWindowProc function, that wouldn't actually prove that the message was handled properly. Maybe the window procedure for a window goes something like this:

...
case WM_QUERYENDSESSION:
 if (GetTickCount() / 1000 % 2) return 0;
 return DefWindowProc(hwnd, uMsg, wParam, lParam);

Even if you managed to get this window created, if you send it a fake WM_QUERYENDSESSION, you'll catch the issue only half the time. It's not enough just to exercise every window procedure; you also have to exercise every code path.
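To make that concrete, here is a minimal sketch in portable C rather than real Win32 code. Every name in it (SimContext, sim_def_window_proc, flaky_wnd_proc, probe_reaches_default) is a hypothetical stand-in invented for illustration, and the tick count is injected instead of read from GetTickCount() so that the outcome is repeatable. It models a naive probe that sends one fake message and only checks whether the default handler ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real message ID (same numeric value). */
enum { SIM_WM_QUERYENDSESSION = 0x0011 };

typedef struct SimContext {
    unsigned long tick_ms; /* injected replacement for GetTickCount() */
    int def_calls;         /* how many times the default handler ran */
} SimContext;

typedef long (*SimWndProc)(SimContext *ctx, unsigned msg);

/* Simulated DefWindowProc: records the call and allows the logoff. */
long sim_def_window_proc(SimContext *ctx, unsigned msg)
{
    (void)msg;
    ctx->def_calls++;
    return 1;
}

/* Mirrors the handler above: refuses the logoff during odd seconds. */
long flaky_wnd_proc(SimContext *ctx, unsigned msg)
{
    if (msg == SIM_WM_QUERYENDSESSION) {
        if (ctx->tick_ms / 1000 % 2) return 0;
    }
    return sim_def_window_proc(ctx, msg);
}

/* Naive probe: send one fake message and report whether the
   default handler was reached. */
bool probe_reaches_default(SimWndProc proc, unsigned long tick_ms)
{
    SimContext ctx = { tick_ms, 0 };
    assert(proc != NULL);
    proc(&ctx, SIM_WM_QUERYENDSESSION);
    return ctx.def_calls > 0;
}
```

With the simulated clock at 2000 ms, probe_reaches_default(flaky_wnd_proc, 2000) reports that the default handler was reached; at 3000 ms the same probe on the same window procedure reports that it was not. A single run of the probe tells you about one moment, not about the program.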

But wait, there's more. What if the program really wanted to prevent the user from logging off?

...
case WM_QUERYENDSESSION:
 if (FileHasBeenEditedSinceLastSave()) {
  switch (MessageBox(hwnd,
             TEXT("Save changes before exiting?"),
             TEXT("Title"), MB_YESNOCANCEL)) {
  case IDYES: Save(); break;
  case IDCANCEL: return 0; // user cancelled logoff
  }
 }
 return DefWindowProc(hwnd, uMsg, wParam, lParam);

In this case, there is a code path that cancels the logoff, and it is legitimate, since it was done as the result of a user decision. Your test would somehow have to know this and consider that case to be a pass and not a failure. This sort of reasoning is hardly something that a generic test suite can do; it has to be tailored for each program.
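Here is a rough illustration of why the generic test gets stuck, again as a portable C simulation and not real Win32 code. All of the names (SimContext, SimAnswer, editor_wnd_proc, buggy_wnd_proc, generic_probe_passes) are hypothetical, the MessageBox is replaced by a scripted answer, and the "buggy" program is an assumed example of one that simply never forwards the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the real message ID (same numeric value). */
enum { SIM_WM_QUERYENDSESSION = 0x0011 };

typedef enum { ANSWER_YES, ANSWER_NO, ANSWER_CANCEL } SimAnswer;

typedef struct SimContext {
    bool dirty;       /* stands in for FileHasBeenEditedSinceLastSave() */
    SimAnswer answer; /* scripted reply to the simulated MessageBox */
    int def_calls;    /* how many times the default handler ran */
    int saves;        /* how many times the simulated Save() ran */
} SimContext;

typedef long (*SimWndProc)(SimContext *ctx, unsigned msg);

/* Simulated DefWindowProc: records the call and allows the logoff. */
long sim_def_window_proc(SimContext *ctx, unsigned msg)
{
    (void)msg;
    ctx->def_calls++;
    return 1;
}

/* Legitimate program: cancels the logoff only because the user said so. */
long editor_wnd_proc(SimContext *ctx, unsigned msg)
{
    if (msg == SIM_WM_QUERYENDSESSION && ctx->dirty) {
        switch (ctx->answer) {
        case ANSWER_YES:    ctx->saves++; break;
        case ANSWER_CANCEL: return 0; /* user cancelled logoff */
        case ANSWER_NO:     break;
        }
    }
    return sim_def_window_proc(ctx, msg);
}

/* Assumed buggy program: swallows the message unconditionally. */
long buggy_wnd_proc(SimContext *ctx, unsigned msg)
{
    if (msg == SIM_WM_QUERYENDSESSION) return 0;
    return sim_def_window_proc(ctx, msg);
}

/* Generic probe: "pass" means the default handler was reached. */
bool generic_probe_passes(SimWndProc proc, bool dirty, SimAnswer answer)
{
    SimContext ctx = { dirty, answer, 0, 0 };
    assert(proc != NULL);
    proc(&ctx, SIM_WM_QUERYENDSESSION);
    return ctx.def_calls > 0;
}
```

When the scripted user answers Cancel, generic_probe_passes returns false for both editor_wnd_proc and buggy_wnd_proc. The probe sees the same observable behavior from the correct program and the broken one; only knowledge of what the program was supposed to do separates them.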

It's one thing to say that something should be tested, but without an idea as to how it should be tested, the suggestion is much less valuable. I may as well say, "Programs should have to pass a test that verifies that there are no bugs."

Comments (26)
  1. Thom says:

    Programmers should have to pass a test to verify that they will not annoy Raymond.  Only then should they be allowed to write code which makes it into software that has the potential to annoy me.

  2. richard says:

    A possible corollary to this is the way code gets modified when an unexpected bug does arise. I find the tendency is to add tests, handling and diagnostics for that specific problem (usually the result of a subtle coding fault), despite it having been an unexpected problem (granted, we could debate whether the design was sufficiently complete if it failed to consider this error condition).

    I find this often adds complexity to the code, without adding benefit.

  3. Medinoc says:

    I think that rather than sending a fake WM_QUERYENDSESSION, testing that the program calls DefWindowProc() could be made more reliable by sending a message in the "registered" range.

    Registered using a MAC-based GUID for good measure.

  4. Mikkin says:

    Testing never proves the absence of bugs, only their presence.

  5. JenK says:

    "Testing never proves the absence of bugs, only their presence."

    Not to mention http://en.wikipedia.org/wiki/Halting_problem

  6. Joe Butler says:

    Ray Trent: "Hate to beat a dead horse, but once again the problem is insufficient foresight on the part of the designers of Windows."

    The point was that a message had been mishandled. Your assertion that it would all be rosy if window procedures didn’t work the way they do doesn’t really follow on from that.  People would still be able to mishandle messages.  Take the 50% of programmers on the left hand side of the normal distribution and they will undoubtedly demonstrate the consummate skills required to mess the simplest of things up.

  7. m0ff says:

    Yeah, I can’t agree more with that comment on ‘Halting Problem’. I tend to observe that writing a piece of code can be quite challenging, but then writing a test for it is much more difficult. It involves some scripting, frameworks and all that jazz. This repeats on and on, whether on Windows, Linux, VxWorks or whatever.

  8. Aaron says:

    Ray Trent: Technically speaking, isn’t just about every problem imaginable caused by -somebody’s- insufficient foresight?  That is why we have the expression, "the benefit of hindsight".

    By Raymond’s example, and using your logic, we wouldn’t have world hunger if people in those starving countries had had the foresight to stop making babies and start growing crops.  It’s a ridiculous and downright pompous assertion.

    Not to mention, your solution doesn’t actually solve the non-trivial problem, but I’m sure you foresaw that too…

  9. JCM says:

    On a related note, I’ve been working in the world of automated testing for the past couple of years. I very often have to tell colleagues around here that, "Everything is easy in English."

    They usually look at me like I was speaking a foreign language.

    Although I wrote my first program back in 1982 when I was in the sixth grade, I am by education an electrical engineer. Most of my colleagues don’t have the background that I do, and can’t understand how much work there is behind their "just have it do x" requests.

  10. toomuchwin32 says:

    Apart from agreeing vehemently with Aaron, Nicholas, Joe Butler and the others that Ray Trent’s "brilliant" solution does not actually solve the problem at hand, I must point out that it, in fact, creates a new problem. There are cases where the programmer might want to call DefWindowProc for a message "first" and then do some processing later. As a random example, I might want to pass the WM_SETCURSOR message to DefWindowProc, let it do its thing (i.e. set the cursor based on the class cursor etc) and then do some post-processing. If Windows followed what Ray Trent suggests we would need a "Pre" and a "Post" message for every Windows message.

    Criticising someone else’s lack of foresight – aah such an easy thing to do.

  11. A Bad Coder says:

    You test for conditions that are unacceptable.  Something may be impossible to test because of the halting problem, but that does not mean there is no solution.  The solution is to remove the unacceptable conditions.

    In this case, each program implements their own GUI even if by calling a default implementation and if they fail to do so correctly the GUI fails — windows don’t move, or don’t repaint, deadlock, or whatever.

    Immovable windows simply do not happen in X11 or OS X, so there is no need to test each program for correctness.  These do not occur because the design precludes such unacceptable conditions from happening.  When you are faced with a program that can not be tested, the problem is not one of testing but of design.  The problem is Windows.

  12. harmony7 says:

    People picking on Windows for not having an immune window message handling system don’t realize that this is the aspect of Windows that really makes it powerful.

    An application gets to look at a window message sent to one of its windows and decide to process it itself, modify its parameters and pass it to the default window procedure, generate its own window messages in response (such as seen by DefWindowProc as well, such as by creating WM_CHAR messages in response to WM_KEY*), or even let the window outright ignore it completely–I’m sure there are legitimate uses for all of these, and you can even do any of these in combination.  Of course, it needs to be implemented correctly, but this is a computer, and it’s just a machine that mindlessly performs a list of instructions you provide to it.

    The halting problem is mentioned above.  What if a program has an inadvertent while(true); in the window procedure somewhere?  How would a standardized test differentiate between that and a procedure that is just taking a very long time to complete?  (Test using a timeout?  But what if you set a timeout to 30 seconds and the window message ends up taking 31 seconds to complete, etc.)

    I really don’t see what this argument is about.  When I buy any product, I have to hope that it has been built correctly, software or not.

    What are you going to ask for next, idiot-proof cars that know not to drive on the wrong side of the road?  And then one day a pedestrian jumps out of nowhere, and the car will prevent you from swerving to an empty lane to prevent the accident.  Not going to happen.  The driver is trusted to do the right thing.

    Let’s also hope that your house is built to withstand earthquakes.  How is this any different than hoping that your program doesn’t end in an endless loop?

    The problem is not about Windows — it’s not the OS’s responsibility to prevent a thread from becoming unresponsive when it encounters a while(true); .

    Raymond is using WndProc/DefWindowProc as only an example of how some people say that certain things should be done without having any idea of how it should be done.

  13. Miral says:

    As Medinoc suggested, testing that in the general case a window procedure calls DefWindowProc for unknown messages is easy — just obtain a message id that the program is guaranteed to not handle (eg. a registered message), hook DefWindowProc so you know whether it gets called, and then pass it the message.

    This at least is a reasonable test, even if it doesn’t test that the app processes those messages it does want to process "correctly" — since that is essentially impossible to test, as Raymond states.

  14. Dean Harding says:

    Miral: but to what end? Whether the application passes that particular test or not doesn’t actually tell you anything useful.

  15. Ray Trent says:

    Hate to beat a dead horse, but once again the problem is insufficient foresight on the part of the designers of Windows.

    WindowProcs should *return* whether they handled a message or not, and the *system* should call DefWindowProc if they don’t. This whole issue then would devolve to "the application lied to Windows, so it gets what it deserves".

    If a message handler needs to return a value, it should do so by reference, *like any other sane event handler*. That allows for more flexibility in return types, but more importantly, the regular return value really wants to be an error code. Microsoft figured this out by the time COM came around, but it was too late.

    (note: People who might be inclined to put in a vote for using exception handling rather than an error code for this kind of thing should consider what the word "exception" means… but I generally think exceptions are a bad idea… if you *really* want to have transparent error handling (ugh), aspects are a more general and cleaner solution)

  16. Nicholas says:

    Ray Trent writes: "WindowProcs should *return* whether they handled a message or not,"

    That doesn’t solve anything.  It is only a lateral move.  The way it currently is and the way you suggest are both of the form "Windows is relying on the program to do something."  In your case you want Windows to rely on the return value of the window procedure.  But, the return value may be incorrect.  How is an application that may return the wrong return value different from an application that may forget to call DefWindowProc?

  17. Dean Harding says:

    A Bad Coder: So how do you propose we go and change Windows 3.x then?

    In fact, the specific problem you mention has already been fixed in Vista with the DWM (and to a lesser extent in XP where the window manager will take over moving, maximizing and minimizing windows if it detects the window procedure is not responding) so you’re complaining about a problem that was fixed maybe 5 or 6 years ago. Some of us have learnt to move on…

    But the *general* problem is not specific to Windows. The general problem being described here is one of programmers making mistakes (and expecting the framework to detect and correct the mistakes). I’d love to hear how OSX and X11 have solved *that* particular problem.

  18. Triangle says:

    "But the *general* problem is not specific to Windows. The general problem being described here is one of programmers making mistakes (and expecting the framework to detect and correct the mistakes). I’d love to hear how OSX and X11 have solved *that* particular problem."

    That depends on what your definition of Windows is. If you mean the win32 API, then I have to disagree with you. The win32 api is EXTREMELY subtle, the documentation is sometimes lacking and/or inaccurate, and many things which Should Just Work work pretty poorly. I personally chalk it up to the fact Windows is an extension of an operating system designed to run in 320 KB of RAM, but say what you will. One thing that I’m sure of: It is the reason compatibility problems and workarounds plague Windows so much.

  19. Dave Harris says:

    Dean Harding: whether or not the application passes the test does tell you something interesting.

    Specifically, if it doesn’t pass, then the application has a serious bug and should be refused certification (or whatever the consequences of failing are). The test won’t find all bugs, but nobody sane expects it to. If the test filters out a few broken programmes, it will be useful.

  20. dave says:

    harmony7: "People picking on Windows for not having an immune window message handling system don’t realize that this is the aspect of Windows that really makes it powerful."

    Butbutbut, You Don’t Get It.  People want a system that will allow applications to do everything they should do and forbid them from doing anything they shouldn’t do.

    I mean, all we have to do is get everybody to agree on what goes into which category and then solve the halting problem, how hard can it be?

  21. Anon says:

    I read that in Kernel Mode Windows 95 would send a message to VxDs that they couldn’t possibly handle and checked that they said they couldn’t handle it.

    IIRC in the checked build there was a message. But it could have refused to load them. Then again enforcement probably isn’t an option for Windows user mode applications since there is probably a commercially important app that breaks every possible rule.

    In kernel mode if you enforce the rules from the start you can simplify things. In fact the NT kernel will bug check if a driver breaks a rule.

  22. Mikkin says:

    > Something may be impossible to test because of the halting problem, but that does not mean there is no solution.  The solution is to remove the unacceptable conditions.

    So the great leap forward is to remove support for Turing-equivalent computation! Why didn’t I think of that?

  23. Jim says:

    I like Raymond’s idea a lot, but I think he addresses the idea to the wrong audience. Many users do not realize the "how" part when they ask for functionality, and when you deliver the things they want they immediately change their mind and try to blame it on you!!

  24. A Bad Coder says:

    “A Bad Coder: So how do you propose we go and change Windows 3.x then?”

    One thing you can do is have a known-good library on top of it so applications prefer to use a different design (kind of like how .net solves some of these problems).

    “In fact, the specific problem you mention has already been fixed … to a lesser extent in XP where the window manager will take over moving, maximizing and minimizing windows if it detects the window procedure is not responding”

    So in other words XP *tests* the program for correctness?  Which is exactly what this thread and theory says is not possible to do?

    “DWM … Some of us have learnt to move on…”

    /sigh

    That is exactly my point: change the design; don’t try to test things that are untestable. EOT

    [The window manager detects only the “program takes too long to give an answer” case. It doesn’t detect the “program immediately returns the wrong answer” case. The problem with WM_DEVICEBROADCAST wasn’t that programs were not responding to the message; it’s that they were responding to the message incorrectly. How do you know that when a program says “No, I’m still using that device”, it is telling the truth? -Raymond]

  25. Triangle says:

    Friday, January 25, 2008 10:10 AM by dave

    "I mean, all we have to do is get everybody to agree on what goes into which category and then solve the halting problem, how hard can it be?"

    This is a misunderstanding a lot of commenters seem to share, so let me clarify: The halting problem *in general* is not solvable. There are specific cases that can be solved through specific means. It’s not _the_ halting problem that needs to be solved, it’s _a_ halting problem.

  26. Ian Boyd says:

    i’d be quite happy if a test tested whatever it could.

    i’d be quite happy with a test harness added to AppVerifier that injects fake messages during PeekMessage (à la "Sure, we do that" http://blogs.msdn.com/oldnewthing/archive/2004/02/11/71307.aspx), and logs if the message didn’t pop out the bottom through DefWindowProc.

    If the developer has no idea that they are even supposed to call DefWindowProc, they’ll be informed pretty quickly through their own testing with AppVerifier. And the more screens they test, the more problems they’ll find.

    And it’s better than nothing.
