A Simple Puzzle

My original version of the histogram-generating code that I whipped up for the previous episode of FAIC contained a subtle bug. Can you spot it without going back and reading the corrected code?

private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max)
  int[] results = new int[buckets];
  double multiplier = buckets / (max – min);
  foreach (double datum in data)
    int index = (int) ((datum – min) * multiplier);
    if (0 <= index && index < buckets)
      results[index] += 1;
  return results;

Note that of course if this were production code, instead of demo code I whipped up in five minutes, it would be a lot more robust in its error detection; the bug that I am looking for is a bona fide error in the logic of the method, rather than things like “the method does not verify that min is smaller than max”, and so on.

A hint: the first time I ran this code and displayed the results, the generated histogram looked fine. Then I made a small change to the arguments and the resulting histogram image was obviously wrong. Can you spot the defect?

Comments (24)

  1. Craig says:

    It seems a bit odd that you are casting an int to an int inside the array index. It looks like you may have mistakenly included a hint.

    You are correct; that was an editing error. I've removed it. Thanks! — Eric

  2. Picarret says:

       var value = (Int32)4294967296.0;
       Console.WriteLine(value == 0);
       // And so on…

    I'm not following you. How does this relate to the bug in my code? — Eric


  3. Lars Kemmann says:

    CreateHistogram discards all data outside the [min, max) range – specifically, any datum equal to max is discarded.  That may be intentional?

    That is intentional; the intention is to display the portion of the histogram between the ranges. However, you're on the right track. Does it in fact discard all data outside the desired range? — Eric

  4. Ali says:

    Ah tricky. I admit that it took me a few more minutes to see the issue even after I looked up the correct code.

    (Spoiler: I knew it would somehow be about negative numbers but the code was correct even when min is negative, at that point I lost hope and cheated)

  5. Lars Kemmann says:

    Hmph.  And I thought I knew C#. :)

    Eric, please explain why that doesn't behave the way I expected it to.

  6. KevDog says:

    Is it related to the relationship of buckets to max-min? If there were only one bucket (which is permitted even if silly) it looks like you could get an index that matches no bucket.

    I need to read these kind of puzzles on something other than my iPhone.

    The issue I have in mind will come up if there is only one bucket, yes, but the issue I have in mind affects histograms with any number of buckets.  — Eric

  7. Picarret says:

    [Addition to my first comment]

    Double is a 64bit type and Int32 is 32bit. The cast will erase a part of the index before we ensure that the initial operation is in range.

    Indeed, this is a problem; good catch. However, in this particular case all the data are reasonably sized and the program still has a bug. — Eric

  8. cthrek says:

    Agreeing with Lars… any datum equal to the maximum incorrectly gets discarded.

    And what's the problem with that? It's a histogram of tens of thousands of random data; the odds of one of them being exactly the maximum are tiny, and if it is, who cares if one datum in a hundred thousand is accidentally not displayed? We're already not displaying any data outside of the given range. — Eric

  9. mafutrct says:

    Isn't this skewed by 0.5 in the cast to int?

    You tell me. Is it, or isn't it? — Eric

  10. thomas.tharp@mcquay.com says:

    I'm stuck on the cast to int… it looks like you should be performing some actual rounding logic there to make sure index is the appropriate value when dealing with a min and max that are negative.

    You're getting close. — Eric

  11. jhominal says:

    Data that would be in the first bucket before the histogram will be included in the first bucket of the histogram. (Because "(int)x" will evaluate to 0 for x in ]-1,1[)

    You got it! — Eric

  12. Dan says:

    Short and sweet:

    any time (datum-min)*multiplier gives you a result between -1 and 0, not inclusive, the cast to int will always result in a zero.

    Thus not all data is excluded, the bucket at index 0 gets one buckets worth of extra data, however much that may be.  If you do the cast after checking the range, you will exclude those values.

    You got it! Beaten by mere seconds by jhominal. — Eric

  13. spencer says:

    Not sure about the C# semantics on the (int) cast from double. My first assumption was that it would truncate, so that eventually, with large enough (or small enough) values of datum, index would wrap around back into the allowed range.

    Also, I'm agreeing with the others that if datum == max, it's discarded.

    When I tested casting with gcc, I got this behavior

    (int)4294967297 -> -2147483648

    In fact, casting any large double to int gave me -2147483648.

    If I cast it to a long, then cast the long to an int, I got 1, as I would expect with truncation.

  14. Ali says:

    Eric, If I were to write this code, I would automatically write it as:

    foreach (var datum in data)


       if(datum>=min && datum<max)


           bla bla



    and there can be no bug even if I'm not careful. Is there any particular reason you did not choose to write that way?

  15. bjorg says:

    The only time it goes into the last bucket is when the value is equal to max.  Did you mean to use a rounding function when going from double to int instead?

    I'm not following your train of thought here. Suppose min is 0, max is 5 and buckets is 10. Multiplier is 2. If one of the data is, say, 4.7 then we multiply by 2 to get 9.4 and then round off to 9, which is the index of the last bucket. But 4.7 is not equal to 5. So I've provided a counterexample to your claim; would you care to clarify the claim? — Eric

  16. Joshua Starner says:

    I just watched a presentation by Bret Victor last night called Inventing on Principle, and he talks about an interface that gives algorithm developers immediate insight into the results of their functions.  If you haven't seen it, you should watch this section of it: http://www.youtube.com/watch

    Watching the video and then reading this post made me think: Man, if Eric had this tool, he may not have had that bug. :)

  17. clockworksaint@gmail.com says:

    @Ali I don't think that has anything to do with the bug, does it? Mathematicians would normally write something like (min<=value<max) to more clearly express how value falls inside a range with a lower and an upper bound. Indeed some languages such as Python actually allow that syntax. In languages like C# that don't allow that syntax, (min<=datum && datum<max) is the closest you can get.

  18. clockworksaint@gmail.com says:

    This all sounds very familiar. This bears some relation to one of Eric's previous posts, "What's the difference? Remainder vs modulus":


    In fact, I think in the comments I linked to an instance of a very similar bug.

  19. Ali says:

    @weeble0 I think you misunderstand me, my comment was "first check if the current point is in range, and if it is, only then calculate the index" seems a more natural way to code the algorithm. I wondered if there is some disadvantage I cannot see. (Except that, then there would be no interesting bug/puzzle to talk about)

  20. clockworksaint@gmail.com says:

    @Ali Ah, sorry, I did misunderstand you.

  21. Jagannath says:

    private static int[] CreateHistogram(IEnumerable<double> data, int buckets, double min, double max)


     int[] results = new int[buckets];

     double multiplier = buckets / (max – min);

     foreach (double datum in data)


       int index = (int) ((datum – min) * multiplier);

       if (0 <= index && index < buckets)

         results[index] += 1;

       min = datum;


     return results;


  22. Olivier says:

    The moral of all this: static typing does not replace testing ;-)

  23. CodeInChaos says:


    The disadvantage of that method is, that you now need to be *very* careful about rounding errors. I would not be surprised if there are values min<=x<max for which the calculated bucket is outside of the array range. Floating point rounding is a tricky beast.

    So I prefer Eric's code (with a fix for his bug of course).

    Since IMO the build in Random class in .net sucks, I'm currently creating my own RNG. A similar issue crept up:

    One feature is generating a uniform double that satisfies min<=x<max from a uniform double that satisfies 0<=x<1.

    The naive implementation if this is simply: `return min+(max-min)*Uniform()`. Unfortunately if `Uniform()` returns `1-Epsilon`, the result might get rounded to `max`, violating the contract.

    So I've added a do-while loop ensured the result is smaller than max.

  24. pr3fixcoder@hotmail.com says:

    (max – min) / buckets ?