Using modular arithmetic to avoid timing overflow problems


In an earlier article, I presented a simple way of avoiding timing overflows which seemed to create a bit of confusion.

The short version: Given a starting time start, an ending time end and an interval interval, the way to check whether the interval has elapsed is to use the expression end - start >= interval. The naive expression end >= start + interval suffers from integer overflow problems.

To simplify the discussion, let's operate in base-100 instead of base-232. The same logic works, but I think operating in base-100 will be easier to follow.

Base-100 means that we remember only the last two digits of any number. Consider a starting time of start = 90 and an interval of interval = 10. Using the wrong expression yields end >= start + interval = 90+10 = 100 = 0. In other words, end >= 0 which is always true since end has the range 0...99. As a result, the wrong expression will think that the interval has expired prematurely.

Using the correct expression, we have end - 90 >= 10. Of the numbers 0..99, the ones that give a difference less than 10 are 90 through 99. Once end = 0, the result is 0 - 90 = 10, which correctly indicates that 10 ticks have elapsed since 90 once the timer reaches 0.

You can work through a similar mistake using start = 89 instead of start = 90; in this case, the wrong expression becomes end >= start + interval = 89 + 10 = 99, or in other words, end >= 99. This has the opposite problem from the previous case, namely that the expression will fail to detect that the interval has expired once the timer rolls over.

But why does the end - start expression work? It's very simple: You just have to remember your rules of arithmetic from elementary school.

(x - c) - (y - c) = x - c - y + c = x - y

In other words, subtracting the same value from both terms of a difference does not affect the final value. This rule applies even to modular arithmetic (because, as the mathematicians like to say, the set of integers modulo n form an additive group).

This rule is useful because it lets you delay the overflow as long as possible by subtracting the starting point from all your time markers; it has no effect upon time intervals. Wouldn't it be great if start = 0? Then the overflow won't happen for 100 ticks. Well, you can act "as if" the starting point were start = 0 by simply subtracting start from all your time markers.

Those who prefer a graphical view can think of time passing as the hands around a clock (which wraps around at 60 minutes, say). When you decide to record your start point, rotate the clock so that the "12" precisely lines up with wherever the hand happens to be. You can now read off the elapsed time directly from your rotated clock. Rotating your clock is the same as subtracting (or adding) a constant to all time markers.

Of course, this trick falls apart once you have to measure time intervals that come close to the wraparound time of your timer. In our 100-tick timer, for example, trying to measure the passage of 90 ticks is very difficult because there is only a 10-tick window where the inequality is satisfied. If we fail to catch the timer during that window, we miss it and have to wait another 90 ticks.

So don't do that. In practical terms, this means that you shouldn't use GetTickCount to measure time intervals longer than 15 days.

Comments (21)
  1. Tim Smith says:

    I’ve seen plenty of published code that has this bug in it. What is even more scary is the number of people who seem to care when I point the problem out. Often they say that "my program doesn’t need to run for 49 days" while forgetting that it isn’t about when the program starts but when the last time the tick counter rolled over.

  2. duggie says:

    That’s mod 100, not base-100.

  3. Tim,

    GetTickCount() is lined up with the start-up time of the current process. So there’s no risk if the program doesn’t run 49 days in a row.

    Of course, this is not a good reason to leave bugs, especially when getting things right is not more expensive than getting them wrong.

  4. Dave says:

    Since GetTickCount is pretty grainy, I use QueryPerformanceCounter. I just wish there was an elegant workaround to its hardware-related bugs.

    http://support.microsoft.com/default.aspx?scid=kb;en-us;274323

  5. b100dian says:

    Given that you know that the maximum ‘end’ is X (60minutes, in the clock example), one should also do

    (end – start)%X >= interval

    This way, if ‘end’ got re-set to zero or less than ‘start’, you still have a correct elapsed time.

    BTW: this ‘negative number modulo’ thing does not work as expected in Windows Calc.exe or Excel or anything. Any idea why?

  6. Works as expected for me. -1 mod 2 = -1. See section 6.5.5 of the C standard. Perhaps it is your expectations that are wrong.

  7. asdf says:

    I’m pretty sure Raymond means unsigned modular arithmetic here. To simulate unsigned integers to what the C standard requires in calc.exe: lets say you do (a-c) % 60, if (a-c) is negative, keep adding 61 to it until it’s positive before you do mod. It’s a good idea not to give the modulus operator a negative integer in general because it’s undefined or defined differently depending on the context and whatever invariant whoever defined it wants to support.

  8. er, asdf, I’m pretty sure you meant "add 60" rather than "add 61" ;)

    As far as GetTickCount() goes, it doesn’t matter if the ticker has wrapped since system startup. As long as the following three things are true, you’ll be OK:

    1) Follow the formula: end – start >= threshold

    2) Don’t try to use a threshold that is too large (15 days is too large)

    3) Check frequently (once every 15 days isn’t frequently enough)

  9. CornedBee says:

    According to MSDN, GetTickCount starts at *system* launch, so your process doesn’t matter – the total system running time does. And modern Windows versions can actually run for 49 days without crashing.

    > The GetTickCount function retrieves the number of

    > milliseconds that have elapsed since the system was

    > started."

  10. josh says:

    Modulus with negative numbers depends on how you do your rounding for integer division. a % b == a – b * (a / b)

    Calc (and C) rounds towards 0, so -x % +y is negative. Rounding towards -infinity would make it positive.

  11. Jonathan says:

    "And modern Windows versions can actually run for 49 days without crashing."

    … and getting an update that requires a reboot?

    -Jonathan

  12. b100dian says:

    AFAIK, ‘a mod b’ it’s an operation that

    – requires b to be positive

    – gives in return the reminder of a/b, which is a number between 0 and b-1

    so:

    -2 mod 7 = 5 instead of -2

    http://www.google.ro/search?q=-2+mod+7

  13. Tim Smith says:

    Serge,

    That isn’t true at all. If it was, why does my test program that I just ran show a tick count of 1139467562? That would be 13 days…

  14. Daveh says:

    I believe this is a difference between the old clock() which starts at 0 each time the process is run and GetTickCount() which begins at 0 when Windows is started.

  15. Veritas says:

    Its shocking that in 2005 we still have to worry about things like limits.

    Any programming language which doesn’t automatically allocate enough space for a variable to do whatever it needs to do, deserves to be deprecated.

    This would make programming much easier.

  16. "Any programming language which doesn’t automatically allocate enough space for a variable to do whatever it needs to do, deserves to be deprecated."

    All numeric types should be arbitrary-precision by default? The only language I can think of that does this is bc.

  17. josh says:

    lisp and haskell (and probably their closer relatives) are arbitrary precision by default, and I’d imagine some of the newer scriptlike languages are as well. This often makes it impossible to take full advantage of the machines speed, since any time you want to use size-limited numbers for performance, there’s an extra flag indicating it that you need to maintain.

    Seeking to make programming easier at any cost is a shortsighted effort.

  18. cola says:

    Yup, if you want easier coding you can write in a scripting language. Unfortunately, this is rarely a palatable option. Maybe it should become more acceptable to code professional applications in Lisp or Python or Visual Basic? Or maybe the languages themselves should be improved and accelerated?

    I don’t think Word N could be written in Visual Basic with C++ extensions for the heavy processing and the visual widgets. It might be technically possible, but imagine the damage to Microsoft’s reputation!

  19. asdf says:

    Actually that makes programming much harder (for those of us that like robust code at least): basic operations aren’t O(1) anymore which makes algorithms harder to reason about, all arithmetic operations aren’t atomic (maybe, it depends on the language), any arithmetic operation can run out of memory (presumably you would throw an exception to handle this as opposed to returning an incorrect or dummy value, either way, you would have to detect and handle that somehow).

  20. Chris Walker says:

    Some of these arithmetic problems would essentially go away if there was a 64 bit version of GetTickCount. Something like GetTickCount64 would do nicely. There would still be rollover problems such has timing windows that cause very long waits.

  21. Putting together pieces you already know.

Comments are closed.