Spot the Defect!


/>

At
Microsoft we have an internal email list called “Spot the Defect” — people mail around
buggy code they’ve discovered and we compete to see who can find the most problems
with it.  It’s fun, and you learn a lot
about what other people consider bugs — everything from security holes to lying comments!

 

I
love playing Spot the Defect. Here is the code for the
WScript.Sleep method
with the comments removed and some serious
bugs added
.  You’ll note that this
code has all the required features I mentioned in my previous post.  We
go to sleep in one-second (or less) intervals, and tell the operating system to wake
us up if COM posts a message to the message queue, because there might be an event
handler to dispatch.  We also check to
see if the host recorded a script error (either due to an event handler or due to
the script timeout firing) so that we can abort the sleep.  This
way we never keep the script alive more than a second after it was shut down due to
error.

 

What
bug did I add?

 

HRESULT
CWScript::Sleep(long Time)

{

    const
DWORD TimerGranularity = 1000;

    if
(Time < 0)

        return
E_INVALIDARG;

    DWORD
StartTickCount = ::GetTickCount();

    DWORD
EndTickCount = StartTickCount + Time;

    DWORD
CurTickCount = StartTickCount;

    while(CurTickCount
< EndTickCount)

    {

        MSG
msg;

        DWORD
CurWaitTime = (DWORD)(EndTickCount – CurTickCount);

        if
(CurWaitTime > TimerGranularity)

            CurWaitTime
= TimerGranularity;

        ::MsgWaitForMultipleObjects(0,
NULL, TRUE, CurWaitTime, QS_ALLINPUT | QS_ALLPOSTMESSAGE);

        if
(0 != ::PeekMessage(&msg, NULL, 0, 0, PM_REMOVE))

            ::DispatchMessage(&msg);

        if
(m_pHost->FErrorPending())

            return
S_OK;

        CurTickCount
= ::GetTickCount();

    }

    return
S_OK;

}

 

Comments (17)

  1. Mike Dunn says:

    I see two things that look odd:
    1. Is it legal to call MsgWaitForMultipleObjects with zero handles?
    2. After MsgWaitForMultipleObjects returns, you should pump all waiting messages, not just one.

  2. Eric Lippert says:

    1) Yes, that’s legal.
    2) Good catch — that bug actually exists in the sources right now, I didn’t add it! I’ll send that one to the sustaining engineering team.

    However, there is a more serious bug that I’ve introduced deliberately.

  3. Kim Gräsman says:

    1) DWORD EndTickCount = StartTickCount + Time;
    This feels like it’s overflow-prone.
    2) If MsgWaitForMultipleObjects does indeed accept 0, NULL for the handles arguments, how does it handle bWaitAll == TRUE for such a case?

    I don’t have a compiler/debugger handy 🙂

  4. Eric Lippert says:

    1) You’re on the right track. What specifically can go wrong here?

    2) I must confess that I don’t know. Try it and see!

  5. Mike Dunn says:

    *grin* Eric, can I get a T-shirt for finding that bug 😉

    The overflow bug would appear in a situation such as
    1. The machine has been on for a long time and GetTickCount() is nearing 0xFFFFFFFF (which happens every ~49.7 days)
    2. The caller passes in a Time value which, when added to the current tick count, overflows a DWORD
    3. CurTickCount will then be greater than EndTickCount
    4. The while loop exits immediately, so the code never sleeps

  6. Ed Ball says:

    Worse, though, would be if both CurTickCount and EndTickCount are just before 2^32. By the time you set CurTickCount to ::GetTickCount again, the current tick count may have overflowed and ended up a very small number, which would cause the function to sleep almost 50 days!

  7. Eric Lippert says:

    Sorry, no T-Shirt, but good effort. That’s one possible overflow bug. There is a FAR worse one though. Your overflow bug causes it to never wait. There’s also one that causes it to wait forever.

  8. Bradley says:

    The worst case possibility for the above scenario is that EndTickCount could be calculated as 0xFFFFFFFF while CurTickCount could be set to 0 at the bottom of the loop. The conditional is still true, so the while loop executes again, but EndTickCount – CurTickCount is now 0xFFFFFFFF, a.k.a. INFINITE, so MsgWaitForMultipleObjects might wait forever.

  9. Eric Lippert says:

    Ed and I apparently posted at the same time. You got it Ed.

    Actually, it will sleep AT LEAST 50 days, but is likely to sleep longer. The "window" in which curTickCount > endTickCount can be very small and if you miss it, another one doesn’t come around for 50 days.

    In the actual code we keep track of everything in doubles, not DWORDS, detect clock overflows, and adjust the current tick count variable accordingly.

  10. Mike Dunn says:

    Or if EndTickCount happened to be 0xFFFFFFFF, that would wait (almost) forever. The loop would end only if CurTickCount were, by chance, re-set when GetTickCount was 0xFFFFFFFF.

  11. Eric Lippert says:

    Bradley, the adjustment down to the one second granularity prevents INFINITE from ever being passed to WFMO, but in practice this will cause an infinite wait as the odds of hitting CurTickCount = 0xffffffff exactly at the bottom of the loop are around 1/1000, and you get one chance every 50 days. So we can expect that in this (unlikely but possible) scenario, we’d wait on average 25000 days!

  12. Ed Ball says:

    We just use the subtraction trick to avoid overflows, i.e., change (CurTickCount < EndTickCount) to (CurTickCount – StartTickCount < Time), as documented in MSDN, but I suppose if you want to sleep for 50+ days, you’d need to detect clock overflows.

  13. Bradley says:

    Yes, of course. Somehow I missed the use of CurWaitTime; that’ll teach me to speed-read the code in order to get 1ST P05T!!1!!11. 🙂

  14. Centaur says:

    2Kim Gräsman:
    Quoting the MSDN, April 2003, MsgWaitForMultipleObjects:
    ===
    bWaitAll
    [in] If this parameter is TRUE, the function returns when the states of all objects in the pHandles array have been set to signaled and an input event has been received. If this parameter is FALSE, the function returns when the state of any one of the objects is set to signaled or an input event has been received. In this case, the return value indicates the object whose state caused the function to return.
    ===

    Since the argument is TRUE, the first rule applies. Since the set of said objects is empty, every predicate about its elements is true, including "X has been set to signaled".

    So one might expect that the function should return immediately. However, that’s wrong, because the rule also says "and an input event has been received". Thus, although the first subcondition "(forall h in pHandles) (h is set to signaled)" is always true, the second "an input event has been received" is false until a message comes. Therefore, this part of code is correct.

    Another question might be: Can you replace the bWaitAll value with FALSE, preserving the semantics?

    The answer is yes. The first subcondition would change to "(exists X in pHandles) (X is set to signaled)". Since pHandles is an empty set, the subcondition is always false. However, with bWaitAll the boolean operation between these subconditions changes to OR, and the function will still return when an input event is received.

    Not depending on the value of bWaitAll, the function will also return if the timeout elapses.

  15. Kim Gräsman says:

    Centaur,
    Yes, it makes sense, if you regard the 0, NULL pair as an empty set.
    Admittedly, that would probably be the only way to see it if 0, NULL is indeed valid input, and this is not mentioned in the MSDN article.

    Your other question is what threw me, as well. If the handle set is empty, the second parameter shouldn’t really matter, which isn’t documented either. So, looking for a bug, I thought I’d raise it 🙂

    Thanks for the input.

  16. Mike Dimmick says:

    Darn, I’ve probably got some of those never-waits/waits-forever bugs in a product. We have a library for adapting DOS applications written for a Symbol Series 3000 hand-held terminal to Windows CE devices without the Console component (e.g. Pocket PCs). Apart from some small areas relating to barcode scanning, everything runs on the same thread – the user’s application and the GUI. To keep it responsive, we use ::MsgWaitForMultipleObjects whenever the user’s code performs a sleep operation.

    I can tell you from experience that the Windows CE version of MsgWaitForMultipleObjects _does_ require at least one wait handle. So I wait on the result of ::GetCurrentThread(), which should never be signalled (that would of course indicate that the thread had completed…)

  17. The formula (begin <= current && current < end) would eliminate the infinite loop problem.