Spot the defect: Bad comparisons, part two


Suppose I want to sort a bunch of strings into order first by length, and then, once they are sorted by length, sort each group that is the same length by some other comparison. We can easily build such a device with higher-order programming:

static Comparison<string> FirstByLength(Comparison<string> thenBy)
{
  return (string x, string y) =>
  {
    // Null strings are sorted before zero-length strings; remember, we need to provide a total ordering.
    if (x == null && y == null)
      return 0;
    if (x == null)
      return -1;
    if (y == null)
      return 1;
    if (x.Length > y.Length)
      return 1;
    if (x.Length < y.Length)
      return -1;
    // They are the same length; sort on some other criterion.
    return thenBy(x, y);
  };
}

Super. This idea of composing new comparison functions out of old ones is pretty neat. We can even built a reversal device:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) => -comparison(x, y);
}

Something is subtly wrong in at least one of these comparison functions. Where’s the defect?

.

.

.

.

.

.

.

.

Let’s restate that contract again. The comparison function returns a negative integer if the first argument is smaller than the second, a positive integer if the first is greater than the second, and zero if they are equal. Any negative integer will do, and in particular, Int32.MinValue is a negative integer. Suppose we have a bizarre comparison function that returns Int32.MinValue instead of -1:

Comparison<string> bizarre = whatever;

and we compose it:

Comparison<string> reverseFirstByLength = Reverse(FirstByLength(bizarre));

Suppose two strings are equal in length and bizarre returns Int32.MinValue for those strings. Reverse should return a positive number, but -Int32.MinValue either throws an exception (in a checked context) or returns Int32.MinValue right back (in an unchecked context). Remember, there are more negative numbers that fit into an integer than positive numbers, by one.

The right implementation of Reverse is either to spell it out:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) =>
  {
    int result = comparison(x, y);
    if (result > 0) return -1;
    if (result < 0) return 1;
    return 0;
  };
}

Or, as some commenters noted, to swap left and right:

static Comparison<string> Reverse(Comparison<string> comparison)
{
  return (string x, string y) => comparison(y, x);
}

Next time: One more defect to spot.


Comments (21)

  1. Brian says:

    Couldn't you implement reverse as "return (string x, string y) => comparison(y,x);" instead?

  2. Patrick Huizinga says:

    Why don't you change the implementation to:

    return (string x, string y) => comparison(y, x);

    ?

  3. Jon Skeet says:

    @Brian and @Patrick: You can. It's just that that doesn't *appear* immediately to be any better than returning the result of negating the original call. You basically need to remember the int.MinValue problem. Once you know it, it's easy to remember it… but I suspect that relatively few people who hadn't already come across it would pick that up in code review.

  4. Jon Skeet says:

    (Oops, hadn't spotted the longer implementation at the end of the post.)

    I would suggest that the "comparison(y, x)" implementation works in any *sane* implementation of IComparer<T>. No doubt you could produce weird implementations (e.g. ones that favoured the first argument somehow) but I'd argue that at that point all bets are off anyway. I can't think of any guarantees provided by Sort in terms of which value will be used for the first parameter and which for the second.

  5. Tim Goodman says:

    If we generalized Comparison to something that took two different data types then I could see why you'd need the longer version.  E.g. something like Comparison <int,string> where you compare the value of the int to the length of the string.  But I can't think of a great example where such a comparison would be useful.

  6. Jasper Stein says:

    I wonder why you introduced FirstByLength anyway…? It seems to me that Reverse(bizarre) will have the same unwanted behaviour – or am I missing something?

  7. Mashmagar says:

    I'm assuming the next defect to spot isn't the missing '};' in the last code sample.

    According to Raymond's article that Eric referenced on Thursday: (blogs.msdn.com/…/55408.aspx)

    "…your comparison function needs to follow these rules:

    •Anti-Symmetry: Compare(a, b) has the opposite sign of Compare(b, a), where 0 is considered to be its own opposite. "

    So if we can make that assumption about the comparison function, we should be able to reverse it by reversing arguments. If we can't make that assumption, I agree with Jon, "all bets are off anyway."

    This is of course ignoring the fact that Eric's implementation discards any information in the return value of the inner comparison other than the sign. I'll leave it to others to decide whether that's a good thing or a bad thing.

  8. carlos says:

    @Tim Goodman: A comparison on two different types is useful when you want to update one sorted list using another sorted list of updates.  You need to compare elements from the two lists to do a merge, but update objects won't necessarily be of the same type as the objects being updated.

  9. pete.d says:

    "I would suggest that the "comparison(y, x)" implementation works in any *sane* implementation of IComparer<T>"

    I suspect that a not-uncommon way to implement the Comparison<T> method is, when given input that is either numerical or easily mapped to something numerical, to just return the result of subtracting the second input from the first.

    While I would just delegate to the numerical type's own CompareTo() method, a) that might not always be possible (for custom numerical types that are not as fully implemented as the built-in types), and b) I can't say it's technically insane to use subtraction instead (in fact, in a micro-optimization sort of way it could be more efficient in some cases, trading the math for a method call).

    And subtraction makes even more sense if the difference calculation somehow just falls out of the data type naturally without an actual project to a numerical data type (not that I can think of any valid examples of that at the moment…but I don't feel comfortable ruling them out at the moment :) ).

    At least, this is what I came up with when I starting thinking about why the Comparison<T> (and related comparison APIs) are defined this way, rather than just requiring -1, 0, and 1 as return values in the first place.

  10. Anthony P says:

    @pete.d, consider subtraction and these results. Comparison<int> comparer = (a, b) => a – b; int x = comparer(3, 4); int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?

  11. Weeble says:

    I don't know if it's a "defect" per se, but composing comparisons like this is going to become particularly unwieldy as soon as you want to sort *first* by *descending* size and *then* in ascending order of some other comparison function. It seems it would be cleaner not to clutter up our length comparison function with comparison composition functionality, and instead write a separate function to compose an arbitrary number of comparisons. (Or if we're really crazy on the functional programming to write a function to compose two comparisons and then apply a fold to it.) Then instead of something nasty like:

    var comparison = Reverse(FirstByLength(Reverse(CompareAlphabetic)));

    we can write more naturally:

    var comparison = Compose(Reverse(CompareByLength), CompareAlphabetic);

  12. @Weeble says:

    I think using extension methods on the Comparison<T> type would probably be a cleaner way to compose, and would more closely match the LINQ syntax people are used to:

    var comparison = CompareByLength.Reverse().ThenBy(CompareAlphabetic);

  13. pete.d says:

    @Anthony P: "int y = comparer(int.MinValue, int.MaxValue); What's x? What's y?"

    You are missing the point. I'm not talking about a general purpose comparison function. I'm talking about a scenario where the inputs are known to work fine for subtraction.

  14. Denis says:

    Now, wait a sec: "once they are sorted by length (!!!), sort each group that is the same length by some other comparison," isn't that what we ar supposed to do? If so, then why are we sorting by "some other comparison" within the same function that sorts by length, which is clearly before they are sorted by length??? Maybe my math is bad; maybe the result will be the same… but my gut feeling, plus a bit of military background that says "do exactly what you are told to do", are against this approach. Sorry if it does not sound very convincing…

  15. Anthony P says:

    @Denis, he's saying this: sort by length, let another comparison be a tiebreaker for matching lengths. Perhaps it's a straight alphabetic sort, maybe it's by the number of vowels. In SQL, it would be Order By Length, Foo.

  16. Jon Skeet says:

    Just in case anyone was thinking this kind of mistake would never make its way into production code, it has. The three implementations of LINQ to Objects which I suspect are most widely used (the one in .NET, Mono's implementation, and LinqBridge) *all* have bugs around OrderByDescending due to this. Joy.

  17. Anthony P says:

    @Jon, haha, wow. Yeah, I reproduced it. For those playing at home, use the overload that accepts an IComparer<T> implementation. Have it return int.MinValue where you would normally return -1, and experience the joy.

  18. Alan McF says:

    Do any of the VS project templates enable checked code in the Debug build? (C# Build->Advanced->"Check for arithmetic overflow/underflow")  I feel that they should to help catch such errors.  Hmm does VB enable 'checked' by default in all builds? (Compile->Advanced->"Remove integer overflow checks")

    I deal with bit parsing etc a lot so have learned to add checked and unchecked blocks/operators to my code but I guess they don't occur to most people.

  19. Amnon says:

    I don't know if it's relevant, but it's the only documentation I found on MSDN regarding constraints on comparison functions:

    """Notes to Implementers

    Comparing null with any reference type is allowed and does not generate an exception. A null reference is considered to be less than any reference that is not null.

    """

    (Regarding Comparer<T>.Compare).

    The Reverse method doesn't preserve this property (although I can't imagine where it would matter).

  20. pete.d says:

    "The Reverse method doesn't preserve this property (although I can't imagine where it would matter)."

    I'm not sure what you mean. It is true that after reversing a comparison that meets that criteria, the new comparison no longer does. But that seems "by design" to me. After all, if null objects remained at the beginning of the sort order after reversing, it seems to me you haven't _truly_ reversed the original sort order completely.

    I can see good arguments both ways, but at the end of the day it seems to me that the most common usage for a Reverse() composition is when you literally want the entire original sort order reversed. It won't comply strictly to the interface docs, but I think that's more because of an oversight in the docs, than because Reverse() is broken.

  21. Amnon says:

    You're right. The problem is probably in this criterion, since it doesn't allow you to reverse lists that contain non null references.