Writing a sort comparison function, redux

Prerequisites: Experience with sort functions such as qsort.

Overqualifications: Familiarity with sort algorithms.

If you break the rules for sort comparison functions, then don't expect the result of a sort call to be meaningful. In fact, if you really mess it up, you can corrupt memory or go into an infinite loop.

My program hangs when I used this sort comparison function (pseudocode):

int compare(x, y)
 return x >y ? +1 : -1;

What's even more strange is that sometimes instead of hanging, the program crashes. It all works correctly if I add one line:

int compare2(x, y)
 if (x == y) return 0; // added
 return x >y ? +1 : -1;

What's going on? The array I am sorting contains no duplicate elements, so the two items x and y should never be equal.

First of all, your first comparison function clearly violates the requirements for being a comparison function: It must return zero if the two items compare equal. As a result, the behavior is undefined, and hanging, crashing, or returning an incorrect result are all legitimate behavior.

But let's dig a little deeper to see why hanging and crashing are plausible responses to an invalid sort comparison function.

A common sorting algorithm is quicksort. I'll leave you to go off and learn on your own how the quicksort algorithm works. Come back when you're done.

Okay, welcome back. The central step in quicksort is the partition, and some variants of the quicksort algorithm rely on the partition element comparing equal to itself in order to remove a comparison from the inner loop. For example, an index may walk through the array looking for an element greater than or equal to the partition, fully expecting at least one such element to be found because in the worst case, it will find the partition itself. But if your comparison function fails to report the partition as equal to itself, this search may run off the end of the array and eventually crash with an access violation when it reaches invalid memory.

That explains the crash, but what about the hang? Well, notice that the comparison function is also inconsistent. In particular, the anti-symmetry rule is violated: compare(x, y) and compare(y, x) return the same value (as opposed to the opposite value) if x==y. The function returns -1 both times, saying "x is less than y" and "y is less than x" simultaneously. This inconsistency can easily send a sort algorithm running back and forth trying to find the right place for the partition.

The moral of the story is the same: Your comparison function must meet the criteria for a proper comparison function. If you fail to do this, then the results you get will be very strange indeed.

Comments (47)
  1. Alexander Grigoriev says:

    Just use stl::sort. It’s better than quicksort (its worst case is still n log(n) rather than qs’s n*n). And its comparison function is “less than”, which returns bool.

    [You’re focusing on the details and missing the big picture. What if you’re using a language that STL doesn’t support? (In fact, today’s example came from one such language.) -Raymond]
  2. Nicholas says:

    "Just use stl::sort"

    Yeah, well there are some standard library classes that accept a functor for comparison.  Welcome back to the original problem.

    But, the moral of this story goes well beyond sorting algorithms.  If you violate the contract of an interface then bad things happen.  I see it all the time, both intentional and unintentional.

  3. Tom says:

    Serendipity is a wonderful thing.  It just so happens that I, too, am working on sorting today.  I need to find all possible shortest cyclic subgraphs of a directed graph (not that the graph cannot be acyclic if I’m looking for cycles) and print them out.  I was just looking at Kahn’s topological sorting algorithm, but it only applied to DAGs.  I have an idea how to modify it, though.

    Clearly, using stl::sort is not an option. :)

  4. Maurits says:

    find all possible shortest cyclic subgraphs of a directed graph

    Um… what does that have to do with sorting?  Sounds like an interesting problem, though.

  5. JimLogas says:

    Ah, takes me back to my coding days. A good pred function might be able to help out.

  6. DWalker59 says:

    How to sort:  Permute the set randomly, and then check to see if it’s sorted.  If not, try again.

  7. cooney says:

    how to sort: periodically check to see if the set is sorted. Bit errors will eventually result in a sorted set.

  8. Tom says:


    It is technically a sorting problem with a partial ordering.  Essentially, the nodes in a directed graph can be partially ordered (sorted) such that those node with outgoing edges appear in a list or other data structure before those node into which their edged lead.  Check out <http://en.wikipedia.org/wiki/Topological_sorting&gt; for an example.  It’s true that this this sorting doesn’t actually compare the value of elements, but it is sorting nonetheless.

  9. Aaargh! says:

    how to sort: periodically check to see if the set is sorted. Bit errors will eventually result in a sorted set.

    That takes way too long, Bogosort is a lot quicker: http://en.wikipedia.org/wiki/Bogosort

  10. NUXI says:

    I guess this is supposed to be readable example code, but how can you pass up the chance to layer the ?: operators?

    return x > y ? 1 : x < y ? -1 : 0;

  11. Cooney says:

    That takes way too long, Bogosort is a lot quicker

    Hey, don’t knock proton decay – one day, it’ll be the only energy source we have.

  12. Maurits says:

    VB to the rescue:

    Function Compare(X, Y)

    Select Case X

    Case Is > Y: Compare = 1

    Case Y: Compare = 0

    Case Is < Y: Compare = -1

    End Select

    End Function

  13. Criffer says:

    Simpler, faster, branchless, correct:

    int compare(int x, int y)


    return x – y;


  14. Phaeron says:

    Criffer: compare(0x7FFFFFFF, -0x80000000).

    [And good luck generalizing to objects that don’t support subtraction. -Raymond]
  15. Timothy Byrd says:

    @Alexander – you know there are coders out there capable of writing a "less than" operator that only gives a partial ordering.

    I’ve been playing with Ruby lately, and it uses the spaceship operator for comparisons. I can easily imagine someone writing a comparator that triggers an infinite loop.

    — T

  16. Leo Davidson says:


    ^^^ That "destroy universe" operation doesn’t sound very thread-safe.

  17. David Brooks says:

    When we were developing Motif at OSF, we had a function that would determine a tab order for controls on a form, based on their "reading order" (left to right, top to bottom). The sort predicate was that if the two controls did not overlap in a vertical direction, they were sorted vertically, otherwise in the order of their X-coordinate. This was intended to return the best result when the controls were not nicely arranged on a grid but just jumbled about.

    Motif (and the rest of the X stack) being OS-independent, this code gave reasonable results on HP-UX and Ultrix, but crashed very occasionally on Solaris.

    It took us a couple of weeks to get to the bottom of it. It is possible to arrange three controls so that, according to this algorithm, A < B < C < A. Solaris’s sort implementation was sufficiently different from the others that it crashed with this combination of results (probably a SVR4/BSD difference). The others managed at least to do something.

  18. Duke of New York says:

    The person who asked today’s question is the kind of person who substitutes salt for sugar when cooking.

  19. Joseph Koss says:


    A clever modern-day VB.NET implementation would be:

    Function Compare(X As whatever, Y As whatever) As Integer

     Return(If(X > Y, 1, Int(X < Y)))

    End Function

    This leverages the "new" VB.NET 2008 ternary intrinsic function .. works similar to how IIF() did in previous VB versions, but is now inlined and performs short-circuiting.

  20. SDiZ says:

    @ Criffer:

    int compare(int x, int y) {

     return x – y;

    try x=MAX_INT, y=-1 and see it overflow.

  21. Erzengel says:

    I would love to have had this person walk up to my office and ask me that question.

    “The array I am sorting contains no duplicate elements, so the two items x and y should never be equal.”

    “Ever think that maybe it compares an element with itself?”

    Cue face-palm.

    The “operate an element with itself” is an oft-overlooked part of many operators. Particularly operator=. Then people have the gall to ask, “Why would someone set an instance to itself?” or “Why would someone compare an instance to itself?”

    Because functions don’t always know that two pointers point to the same instance, maybe?

    [>Cue face-palm

    More likely, you’d get the response, “Why would it do that? What idiot sorts by comparing an element with itself?” -Raymond


  22. Falcon says:

    @Criffer, Phaeron:

    I was going to write about the possibility of comparing using subtraction (restricted to ints, as Raymond’s response pointed out), but thinking through it, I realised that overflow could mess things up.

    As many of you probably know, the processor (x86, at least) compares integers by performing a subtraction, setting flags according to the result and throwing the result away. However, the conditional branch instructions account for overflow.

  23. Roman says:

    The standard C++ sort is only guaranteed the average of O(n*ln n), not the worst case.

  24. Jason Haley says:

    Interesting Finds: May 9, 2009

  25. Erzengel says:

    “Why would it do that? What idiot sorts by comparing an element with itself?”

    Cue me executing a Picard double face-palm.

    After which I’d ask, “And how do you propose a function knows that an element IS itself without a comparison?”

    [Duh. if (i == j) result = 0; else result = comparator(a[i], a[j]);. What idiot compares an item to itself? -Raymond]
  26. Mark says:

    This reminds me of one of my comp. sci assignments: We had to implement several sorting algorithms, and for bonus points, display the  operation of each algorithm graphically.

    Viewing the various sorting algorithms running was quite fun (and made the way they work really apparent). Quicksort was not much fun to watch, but heapsort was cool.

    Oh, and since I was into R.E.M. at the time, I called my program "A Carnival of Sorts". :)

  27. Alin says:

    Well, I think that the x<y (without x==y) is pretty good to speed up things. If I need to sort the 3,5,2,4,1,2 array, the result should be 1,2,2,3,4,5. Who cares that the "first" 2 is behind the "second" 2 or the vice versa. The faster it breaks, the less comparations are made. If the algorithm goes beyond the array’s upper bound, it will sort a stable program from a buggy program.

  28. AndyC says:


    You’ve missed the whole point of the post, if you don’t implement the = part, then your ‘sort’ may never finish.

    If 2<2 and 2<2 then which one comes first? How do you know they are sorted?

  29. Falcon says:

    Well, if it is the case that most items being compared will not be equal, you could speed it up and still maintain correct behaviour – just break it up into several tests:

    int compare2(x, y)


    if (x > y) return 1;

    if (x < y) return -1;

    if (x == y) return 0;


    More statements/lines in the source language does not necessarily translate to more machine instructions in the object code, especially when optimizations come into the picture.

  30. someone else says:


    That last if should be just a return. If it’s neither smaller nor larger, then it has to be equal. And if there were a third possibility, the function would return an undefined result.

  31. W says:


    Your function will break for simple floatingpoint numbers. Consider a=NaN and B=Nan

    => none of a<b, a>b and a==b is true

    I think even "someone else"’s function might break down here when supplied with an infinity and a NaN, but I don’t remember the comparison rules for all special cases.

  32. Tom Leys says:

    If you want to see animated sorts, have a look at this : http://www.sorting-algorithms.com/

  33. Yuhong Bao says:

    BTW, on comparisons using subtraction, will it work if the numbers are unsigned instead of signed?

    [I trust my readers are smart enough to figure out the answer to this question themselves. -Raymond]
  34. Toukarin says:

    Insightful. It’s one of those nitty gritty things that are obvious and yet not so obvious at first sight. Had someone asked this question before but I couldn’t really give a complete explanation like this one.

  35. Falcon says:

    @someone else:

    Yes, you’re quite right (for the integer case, at least). Can’t believe I missed that!


    I admit I’m no FP expert, but I can see your point. Just goes to show how important it is to know exactly what you’re dealing with.

  36. mt says:

    “Well, notice that the comparison function is also inconsistent. In particular, the anti-symmetry rule is violated: compare(x, y) and compare(y, x) return the same value”

    Maybe I’m missing it, but I don’t see that. compare(2,1) => 1 while compare(1,2) => -1.

    Is lack of coffee making me miss something?

    [After you finish your coffee, try reading the rest of the sentencce. -Raymond]
  37. DysgraphicProgrammer says:

    Quote: More likely, you’d get the response, “Why would it do that? What idiot sorts by comparing an element with itself?” -Raymond

    I find that thoughtlessness (or temporary stupidity) is far more common then actual persistent idiocy. Usually a small nudge in the right direction resolves that quickly. Maybe I have been luckier than most in my choice of coworkers?

    [In real life, you never compare an item against itself. Bubblesort never compares an item against itself. Many quicksort implementations never compare an item against itself. The idea of comparing an item against itself while sorting is actually quite strange. -Raymond]
  38. Medinoc says:

    A funny part is that there are more than one standards for comparisons:

    * standard C one: int compare(input_type x, input_type y) : Returns 0, or a negative value, or a positive value.

    * standard C++ one: bool less(input_type x, input_type y) : Returns true if x is strictly less than y.

    It gets awkward when people start mixing the two. Notably, MFC uses the C convention, but I often stumble on code that assumes it uses the C++ one…

  39. Alexandre Grigoriev says:


    The standard C++ sort is only guaranteed the

    average of O(n*ln n), not the worst case.

    A Standard C++ Library implementation included with Microsoft Visual C has such worst case guarantee. I don’t think other compiler vendors will settle for anything worse then.

  40. Todd Wilder says:

    I was curious why .NET chose QuickSort for their sorting, because its worst case is n^2 while HeapSort is always (n)Log(n). I heard that other factors come into play, like memory management etc. Any Ideas?

  41. Leo Davidson says:

    Oops: QuickSort does allocate memory, but it’s usually implicit. e.g. Variables in recursive calls.

    When I said it doesn’t require any memory I meant it doesn’t need to make temporary copies of the data itself. It does still need some memory for bookkeeping.

  42. Leo Davidson says:

    @Todd Wilder:

    If you look up QuickSort on Wikipedia it explains the likely reason.

    QuickSort and HeapSort are both in-place (i.e. they don’t need to allocate any temporary storage for the sorting).

    QuickSort’s worst case is worse than HeapSort’s, however QuickSort is still *usually* faster than HeapSort.

    i.e. On average QuickSort is so good that its "bad days" are worth living with.

    Of course, if you know that your data is going to be arranged such that QuickSort will very often be slow then you’d want to use a different algorithm. If you don’t know, though, then QuickSort is likely to be a good bet.

    (While at uni I had the privilege of meeting and attending some lectures by QuickSort’s inventor, C.A.R. Hoare.

    Many of my other tutors & lecturers had done  impressive things, and Hoare himself has done a huge amount of other, more important/complex work, but there was something really cool about sharing tea and biscuits (among with other students in my year) with the person who invented QuickSort. Still the defacto sorting algorithm in just about every library despite being invented in the 60s.)

  43. Worf says:

    There’s always IntroSort (introspective sort). Average runtimes of quicksort, but worst case is still O(nlogn). The trick is it starts via quicksort, but it monitors itself and if it sees it actually has a quicksort taking O(n^2), it stops and goes heapsort.

    Quicksort actually has well-known corner cases that cause it to go O(n^2), and most of them are unlikely to happen if you pay some attention. (I believe one of them is if the list is sorted backwards, in which case, swapping manually would be far faster).

    The only bad thing is that quicksort isn’t stable – two items with the same value may get swapped. It matters if the comparison item is a subset of the entire item (e.g., sort by name an address book – if you have two people with identical names, quicksorting may swap the records around.)

  44. Medinoc says:

    When I want a stable sort, I use a merge-sort: A second buffer with the same size as the first, a copy only if necessary (whether log2(n) is even or odd), and a guaranteed O(nlogn) in all cases, best and worst.

  45. 640k says:

    A quicksort algorithm can be implemented not to swap identical records, leaving them in the original order.

  46. Joseph Koss says:


    There is also an in-place variant of MergeSort() but I am not sure if it offers O(n log n) as an upper-bound.

    It also seems to be quite a bit more complicated based on what little I have read about that variant.

    The standard MergeSort() has the obvious advantage that it is trivial to parallelize, and the symmetry between best and worst case is a very attractive quality when writing for robustness (carefully crafted data to DoS a server using its sort algorithms weakness has been used at least once, targeting the Free Internet Chess Server)

  47. DWalker59 says:

    I have to mention my favorite Syncsort ad again:  From Computerworld magazine, in the heyday of mainframes, the full-page ad featured a computer operations manager claiming that using the Syncsort product allowed them to run their production jobs in 240% less time!

Comments are closed.