IComaprer & IComparable Refactoring Proposal


We are exploring the possiblility of changing the design of IComparer<T> and IComparable<T> interfaces that will ship with Whidbey. This post describes some more detail on the issues we are trying to address with the design change and we are looking for feedback on the proposal.

Background

The first version of the .NET Framework had three main comparison related-interfaces: IComparable, IComparer, and IHashCodeProvider. One of the main issues with the interfaces was that there was no single interface that could be used by Hashtable to override the default identity of the keys (change which keys compare to equal and which not). You needed to use IHashCodeProvider to provide custom hash codes for the keys and IComparer to provide custom equality (via Compare==0). This resulted in a common usage mistake (a bug) where a user would pass incompatible pair of IHashCodeProvider and IComparer to the Hashtable constructor. For example, a case sensitive hash code provider and a case insensitive comparer. This would result in a bug where two strings that do compare equal could have different hash codes, and so the Hashtable indexer could not find the item with the given key.

We tried to fix this problem in Whidbey generic APIa and decided to put all comparison related methods from IComarer and IHashCodeProvider into a single interface (IComparer<T>), so users could not pass two incompatible implementations of the interfaces to the Hashtable constructor. We have always seen the new design as a compromise between a very clean OO design and trying to avoid explosion of the number of interfaces.

Well, we have received feedback from people using the new interfaces (IComparer<T> and IComparable<T>) that made us thinking that may be we compromised a bit too much J Here are the relevant links describing the issues:

We decided to give it one more shot to fix the concerns. You can find the proposal for the new design below. I think the change proposal addresses most of the concerns, but at the same time it has the following drawbacks:

  • More interfaces to understand. Makes the Framework less approachable.
  • Custom comparers can implement one interface (let’s say IEqualityProvider<T>) but not the other (IComparer<T>). Switching from one collection which uses IEqualityProvider<T> to another which happens to use IComparer<T> may be difficult (as you will need a different custom comparer).

 

I am looking for feedback on the proposal from people who have experimented with the interfaces. The change is something that we would do only if there is an overwhelming positive feedback from the community, not if the improvement seems only marginal.

Thanks!

 

Current Design

public interface IComparable<T> {

    bool Equals(T other);

    int CompareTo(T other);

}

 

public interface IComparer<T> {

    bool Equals(T x, T y);

int Compare(T x, T y);  

int GetHashCode(T obj);

}

 

public interface IKeyComparer : IComparer, IHashCodeProvider {

    bool Equals(object x, object y);

}

Proposed New Design

IComparable<T>

·         Break into two interfaces

public interface IEquateable<T> {

bool Equals(T other);

}

 

public interface IComparable<T> {

    int CompareTo(T other);

}

IComparer<T>

·         Break into two interfaces

public interface IEqualityProvider<T> {

    bool Equals(T x, T y); 

int GetHashCode(T obj);

}

 

public interface IComparer<T> {

int Compare(T x, T y);

}

IKeyComparer

·         Rename from IKeyComparer to IHashComparer

·         Remove IComparer requirement (base type)

public interface IEqualityProvider : IHashCodeProvider {

    bool Equals(object x, object y);

}

 

Comments (23)

  1. Jesse Ezell says:

    Yes. This is a very good idea.

  2. Jesse Ezell says:

    PS:

    "More interfaces to understand. Makes the Framework less approachable." isn’t a valid concern. You are not really adding any surface area to the framework. If anything, this makes the framework more approachable, because you only have to implement the functionality you need, rather than writing a bunch of hash or compare code that you are never going to use. Someone who is just doing a simple compare operation might not have any idea how to implement a good hashing strat, which sucks if they really don’t need that functionality anyway and waste a bunch of time just to make sure their interface returns valid data if someone else should happen to come along and use it with a hash table. In fact, leaving this the way it is currently would probably result in a bunch of broken comparer classes, because a lot of people wouldn’t bother to implement the stuff they aren’t using.

  3. damien morton says:

    The splitting seems to be a good approach, keeping ordering and equality/hashing semantics separate.

    Maybe make the naming more explicit though

    interface IEqualityProvider<T>

    {

    bool Equals(T x, T y);

    int GetHashCode(T x);

    }

    interface IOrderingProvider<T>

    {

    int Compare(T x, T y);

    }

    or even IHashComparer<T> and ISortComparer<T>

  4. Matthew W. Jackson says:

    Yes this is good. Something that is equatable isn’t always comparable. A complex number needs Equals(Complex), but there’s little need for CompareTo(Complex).

    What’s worse, having equality/comparison on the same interface implied that they did the same thing (Equals==true when Compare==0), and you guys went so far as to say they had inter-related contracts that enforced the same thing.

    But this contract was broken right out of the gate with String, so I think equality needs to be separated from comparison.

    Also, there are MANY times when I want a class’s sorting (comparison) to have nothing to do with equality. I may have a Customer object, and I want this to use reference equality (every instance is only equal to itself), but it’d be nice if a Customer would sort by its LastName property by default. Under the current interfaces and "inter-related contract," this is difficult to do and still feel good about myself at the end of the day. If I can implement IComparable<Customer> but not IEquatable<Customer>, falling back on reference quality and hashing (boxing would not be an issue in this case), then I feel good about my design decisions.

  5. Jouko Kynsijärvi says:

    Yes. Do the split!

    (But since i opened the MSDN feedback thread mentioned, i’ll guess you already know what i feel about this topic…)

  6. damien morton says:

    BTW – your stylesheet works badly in Firefox. When I mouse over any text on the page, the text dissapears.

  7. Matthew W. Jackson says:

    Sorry for taking this off topic, as I don’t want to detract from this important discussion, but…

    The "mouse hover" bug with this website doesn’t appear to be a CSS bug. According to the DOM Inspector (which IE doesn’t have *grin*), a whole section of the website is inside an "a" tag.

    Maybe IE interprets junk-html differently than Gecko. That’s the problem with tag-soup–there is no ‘standard’ for interpreting bad markup.

    But please consider doing something about your website. Sorry again for the off-topic comment.

  8. Matthew W. Jackson says:

    Sorry again… I found the exact culprit. Your anchor for the comments is ‘<a name =" feedback" />’ which should have a full ‘</a>’ rather than shorthand since you’re serving HTML and not XHTML. This is causing all of the comments to be inside an "a", and since you are styling a:hover instead of a:link:hover, everything goes screwball in Gecko.

  9. Mads Houmann says:

    A very good idea – and there really is no *explosion* of interfaces.

    Just two interfaces to consider when implementing a normal class, and three interfaces when implementing provider-type classes for ordering or hash code generation. Also, the last three can safely be ignored in many cases.

  10. Curt Hagenlocher says:

    I like the name IOrderingProvider as well. It’s far less ambiguous than IComparable. I suppose that tradition demands that crazy "int" return type instead of some kind of (LessThan, EqualTo, GreaterThan) enumeration.

  11. Joe White says:

    I’ll definitely vote in favor of this change. I’m still on 1.1, and I’m very much looking forward to an interface that lets a third party say "yes, those are equal, no they’re not, here’s the hash code, and don’t ask me about ordering". I was disappointed to find out that was on IComparable, because most of my comparers would just have to throw NotSupportedException from the Compare method, which would do bad things to code usability.

    I agree that the number of interfaces is -not- a downside. If you have one interface, but a third of the implementers throw NotSupportedException on half the methods, and another third throw on the other half of the methods, and only a few classes actually implement the whole interface, that’s a hint that you need to partition the interface. Partitioning this guy will make things more usable, by making the code much more intention-revealing.

    Yes, it would take more work to convert from one to the other (if that ever comes up), but (a) it won’t likely happen too often, as they’re used for different things, and if it does come up it’ll probably be very early in the development cycle; and (b) it’s still much better than using a class just because it implements IComparer, and not finding out until runtime that it can’t do what you need.

    On IEquateable: Please take out the extra "e". Make it "IEquatable".

    I’ll also say that I prefer IOrderingProvider instead of IComparable, but it’s not a strong preference – if you left it as IComparable, I wouldn’t be too sad.

  12. I totally like this change.

  13. Robert Bullen says:

    I’m in favor of the proposed new design.

  14. I like the new proposed design.

  15. Cal Miller says:

    Go for it! It’s better to fix something like this now rather than after a lot of code is written that could have benefited from the change. Thanks for asking!

  16. Samuel Tesla says:

    I’ll add my Me Too to the stack.

    I also agree with Joe that you should take the extra e out of IEquateable so that it is just IEquatable, the extra e just looks wrong.

    I also agree that if you’re going to have IEqualityProvider, the name for IComparer should be consistent. IOrderingProvider is consistent, and furthermore, is more intention revealing. If you think about it, "comparing" could be either looking at equality or ordering, which is where the original confusion came from.

  17. Fullmetal says:

    YES, I agree to this proposal, there is no significant *explosion* it’s only a logical distinction, it’s clearer and therefore less prone to bugs&frustration.

    Like Cal Miller, I thank you for asking: this gives me the idea of some collaboration, and collaboration make the users love the final product! It’s good commercial practice.

  18. Peter Rust says:

    YES, please refactor the interfaces. Peter Golde’s reasons are compelling. I would argue that NotSupportedExceptions should only be required for exceptional circumstances (implementing an ordered collection or hash table is not exceptional). This is awkward not only for those who build the collections, also for those who consume them and try to use a method that exists but is "Not Supported".

  19. damien morton says:

    Just a quick note to say that Im not sure that there is necessarily an implicit contract between Equals and CompareTo, while there is definately an implicit contract between Equals and GetHashCode.

    I cant think of any algorithm that will mix the use of Equals and CompareTo – although I can imagine that CompareTo might use Object.ReferenceEquals as a quick test for returning 0.

  20. Shital Shah says:

    1. Its absolutely logical to keep equality functionality seperate, but then IComparable should and must inherit from IEquatable.

    2. The IEqualityProvider is plainly an ugly name and also not consistant with its partner IEquatable. I guess there should be a design pattern to avoid suffixes like "Provider" in to names. Even in 1.0, IComaprer was an unfortunate name which drove most users to MSDN and lookup what the hack it meant. My suggestion for names of 4 of these interfaces:

    IComparable

    IComparablePair

    IEquatable

    IEquatablePair

    To maintain consistancy,

    IComparablePair:IEquatablePair

    IComparable:IEquatable.

    3. Despite the reasons you have specified, it just doesn’t make sense to include GetHashCode in your IEqualityProvider. Its not only incosistant to have this single argument method in this interface but I can imagine several scenarios where I don’t want to worry about hash codes when implementing IEquatable or its pair version. It will be really confusing to several novice users what this "hash" thing even means let alone impementing it correctly when all they want to do is just expose euality interface.

    The current design as it, in my view is ugly (a bad hybrid thats usually born when lot of heads are talking and you have to arrive at some conclusion to satiesfy them all) and incosistance but still a stp forward in right direction.

  21. damien morton says:

    Shital,

    Im not sure that IComparablePair is any more consistent than IEquatable or IEqualityProvider. Clearly, there is room for improvement, however.

    I dont agree that IEqualityProvider can omit GetHashCode, but then I cant imagine that a novice would ever use the interface. I wonder how many novices use IHashCodeProvider or even IComparer, rather than simply implementing one or more of Equals, CompareTo and GetHashCode.

    IComparer, IHashCodeProvider and any of the proposed interfaces are advanced concepts that enable the notion of equality and ordering to change for identical objects, depending on what container they are put into. That is an advanced concept.

    There are several kinds of containers – hash-based containers (Hashtable), sort-based containers (Tree, SortedList), and possibly also pure-equality based containers (such as ArrayList or LinkedList). I would argue that a non-hash, non-sort based container is inefficient and unlikely to be used by the kinds of advanced users who might leverage the power of IComparer and IHashCodeProvider, and therefore can be omitted.

    So, given that we need classes that support the notion of identity in hash-based collections and sort-based collections, whats needed?

    IHashComparer, which has an Equals and GetHashCode

    ISortComparer, which has Compare

    We could also have IComparer : IHashComparer, ISortComparer, and there the contract between Equals and Compare would have to be respected.

  22. I think I’d prefer the way it is now.