Defining Equals on an interface

I was wondering today if ICollection should declare an Equals method on it and state what the semantics of it would be.  This came around because I currently have 3 implementations of ICollection:

  1. ArrayCollection (an ICollection backed by an array)

  2. EmptyCollection (a specialized collection that holds no elements.  Uses a singleton for memory efficiency and specialized implementations of the methods for speed)

  3. SingletonCollection (a specialized collection that holds only one element.  Useful for when you have a single element and you want ot pass it to something that expects a collection).

Now, it seemed reasonable that the following would be true:

new ArrayCollection<int>().Equals(EmptyCollection<int>.Instance) and 

                ICollection<int> c = new ArrayCollection<int>();


                c.Equals(new SingletonCollection<int>(4));

Now, the question is “why are those true”.  It’s intuitive to me that any empty collections should be equal, and any collection with one element in it would be equal to a SingletonCollection with that same element in it.  However, is there a way to formalize that?  The best i could come up with is “Two collections are equal if as you iterate over both of them you find their corresponding elements are equals _and_ you finish iterating over both collections at the same time.”  Expressed in code that would be:

        public override bool Equals(object obj)


            ICollection<A> other = obj as ICollection<A>;

            if (null == other)


                return false;



            IEnumerator<A> e1 = this.GetEnumerator();

            IEnumerator<A> e2 = other.GetEnumerator();


            while (true)


                bool e1MoveNext = e1.MoveNext();

                bool e2MoveNext = e2.MoveNext();


                if (e1MoveNext != e2MoveNext)


                    //they have a different number of elements

                    return false;



                if (e1MoveNext == false)


                    //no more elements in either

                    return true;



                if (!this.authenticate(e1.Current, e2.Current))


                    return false;




That seems to work fine and works within my intuitive definition of how collections are felt of as equal.  The problem I then ran into was that I tried to think down the line about whatever types of collections might be in the system.  Pretty quickly i though about ISet (a collection containing only unique elements).  If I were an ISet i would only consider something equal to me if it contained all my elements, no more, and no duplicate.  Wow.  That’s something that’s woudl be pretty difficult to check.  I’d have to iterate over both checking if my elements were in it and it’s elements were in me.  I’d then have to figure out some way to make sure that it had no duplicates.  That’s a pretty tall order.  (I’m not even sure how I would go ahead checking for duplicates).  The alternative is to place the restriction that the thing I am comparing myself to is also an ISet.  Then I know that the elements must be unique.  I can then check my containment in it and it’s containment in me.  I.e. (this.IsSubsetOf(that) && that.IsSubsetOf(this)).  If I were to eschew infinite sets and have a Count property then I could do: if (this.Count == that.Count && this.IsSubsetOf(that)). (note: taht would also be possible with an IOptional Count property.

However, I face a conundrum.  Say I do: EmptyCollection<int>.Instance(EmptySet<int>.Instance).  I get true based on the rules of Collection equality.  However, if I fo EmptySet<int>.Instance(EmptyCollection<int>.Instance), i get falgs based on the rules of Set equality.

This is a big no-no (i think) as I’ve violated the requirement that a.Equals(b) <==> b.Equals(a).   🙁

Is there a way to reconcile this system?

Has anyone else out there faced this problem of defining equals on interfaces in the presense of sub-interfaces?

Comments (16)

  1. damien morton says:

    You have two problems.

    Firstly, one kind of collection has an overall ordering on it; your ArrayCollection, for example.

    Secondly, the collection without an overall ordering might be particularily efficient at answering containership queries.

    I dont think you can reconcile these things.

    When is an array equal to a tree equal to a hashset? Theres no clear answer to these questions, and these _are_ the fundamental data structures that we build upon.

    Each of these datastructures is a "container" or "collection" to which items can be added or removed, and yet they have very different properties.



    items have ordinal position or not

    items comparable or not

    efficient contains query or not

    iterable in order or not

    duplicates allowed or not

    efficient range query or not

    Heres a thought: If a "container" cant implement a behavior efficiently, it shouldnt implement it at all. This would mean that Contains would be implemented as a static method in helper class that deals with very generic collections.

  2. Damien: I never stated that collections took ordering into account. In fact, I would not consider two collections the same if their ordering was different. That would be a very unintuitive model for me. However, for Sets, I could see this relationship holding. I.e. two sets can be equal regardless of the ordering of Iterate.

    Are you proposing that ICollection _not_ set down rules for equality, and instead those rules should only be on its subinterfaces? (I have no problem with that, I’m just trying to see if that’s what you’re saying).

  3. damien morton says:

    You never defined ordering for collections. You implied it by using an array.

    Im suggesting that the rules for equality, if any, are determined by the details of the collection implementation, which determine which equality algorithms are availble to us, and which is most efficient.

    For example, I have a HashSet class, with union, difference, and intersect operations defined. The arguments can be either ICollection or ISet. In the case of intersect, the approach taken is based on what I know about ICollection (that i can iterate over it efficiently), and what I know about ISet (that I can query its contents efficiently). As a result if my arguments are one ICollection and one ISet, I iterate over the collection testing for membership in the set. If my arguments are two ISets, I iterate over the smaller set testing for membership in the larger one.

    What Im suggesting is that ArrayCollection should not have a Contains method, and that it would be nice to be able to know more about the properties of the sequence returned by the enumerator of ICollection, maybe even simply attributes that tell us Deterministic, Sorted, Ordered, etc etc.

    Youll forgive me if these seem like half-constructed thoughts – they are.

  4. Damien: Not a problem 🙂

    I’m pretty fascinated by every different take on this problem.

    However, could you explain a bit more about why you think that ArrayCollection shouldn’t have a Contains method?

    The idea behind ArrayCollection is to provide an extremely simple implementation of a collection with very little requirements on the client. (in this case all they need to do is provide a way to test for equality).

    I could (and plan to) add things like Hash and Tree based implementations. However, these both:

    a) Add complexity

    b) INcrease the requirements on the client (you need to provides hash codes or ways to order elements).

  5. Damien: I’m interested in your implementation of Union/Intersect/Difference. They seem very costly (O(n)). I wanted to implement the following:

    class Union<A> : ISet<A> {

    private readonly ISet<A> set1;

    private readonly ISet<A> set2;

    public Union(ISet<A> set1, ISet<A> set2) {

    //assign the class fields


    public bool Contains(A element) {

    return set1.Contains(element) || set2.Contains(element);




    This has the benefit of making construction of a union extremely fast, and it is also nice because the original sets are unnaffected. I don’t see Union as an operation on a set, but rather a binary operation that produces a new set. When I think about where I’ve seen unions (in math and CS proofs/algorithms) they’ve always been non destructive. i.e. c = A U B, neither A nor B is modified.

  6. damien morton says:

    I see what you are saying about set operations. Your fundamentaly lazy thinking is coming out 🙂 I was thinking more along the lines of a database hash join operation.

    What happens when you want to enumerate a Union<A> instance?

    I guess it would look something like this:

    (lets first ensure that set1.Count >= set2.Count in the constructor)

    IEnumerable GetEnumerator()


    foreach (A a in set1)

    yield a;

    foreach (A a in set2)

    if (!set1.Contains(a))

    yield a;


    Also, at some point, the complexity of the query and enumeration operations will grow to unmanageable size, and youd want to flatten the expression tree out.

    So maybe in the constructor, you do something like this:

    complexity = set1.Complexity + set2.Complexity

    if (complexity > N)


    this.Flatten(Union(set1, set2));


    the forumula for determining when to flatten would have to be based on access patterns as well. Maybe flatten when complexity*access > N.


    What happens when you want to Add to a Union<A>?

  7. Not lazy, just functional 😛

    Good point about enumerations. I would imagine that people would probably want the uniqueness property satisfied:

    Since i am dealing with immutable sets, the same this would happen when you added: you’d get a new set without affecting that last one.

  8. damien morton says:

    I just realized that something as simple as a Count would involve iterating over your expression-tree type sets and that immutable sets would be extremely inefficient when adding one item at a time.

  9. Damien: Yes. That’s why i am not implementing Count (For now).

    I’m not sure why you think it would be innefficient though. Functional sets are quite effcient even when adding elements one at a time.

  10. damien morton says:

    I dont know how functional sets are implemented efficiently. Perhaps you could enlighten me.

    I was working on a problem similar to this last week, where I wanted to efficiently copy hashsets. It turned out that without reference counting and finalizers, I couldnt find a way of implmenting a copy operation in the most efficient way.

    I wanted to make a copy operation reference the underlying store, which would then be marked as immutable. I also wanted the underlying store to revert to a mutable state when only 1 reference remained. Once a copy was made, writes would be done to a local store, while reads would happen locally with a fallthrough to the lower-level store. As with the Union operation you outlined, the complexity of the expression tree can get out of control, and some kind of flattening strategy would be required.

  11. Damien: I would love to. (It would probably make for a good post too). However, I’m going home now. I will try to get to it soon!

  12. David says:

    Have you looked at Java’s collection package (source or API?)

    I imagine the design decisions there would be of interest to you. Many find the package to be well thought out.

    Or does that raise legal issues?

  13. David: I’m very familiar with teh java collections API, and certainly something I consider when making my decisions. however, my porpose in doing this is not to clone a preexisting package, but rather to create a ewll designed framework from teh ground up where i understand and can justify every design decision that was made.

    A clear example of this w.r.t. java collections is the interplay of ICollection and IMap. At some point I will probably have to cross that bridge or deciding "are they related, or are they separate interface hierarchies". However when I do I really want to have a clear understanding of the benefits/downsides of either choice. If i have that then in the future there will e a clear place for people to turn to if they’re curious "why did you do it that way?".

  14. Scot Boyd says:

    Random thought here: Would it be possible to define Equals() such that it’s only true if the two collections both agree that they are equal?

    Making that check efficient would be a slick trick, and probably doable, but right this minute I can’t think of how.

  15. Scot: Logically, that’s exactly what you’d want.

    A = B <==> A.Equals(B) && B.Equals(A).

    So it might be possible to do in an external utilities class. So:

    CollectionsUtilities.Equals<A>(ICollections<A> c1, ICollection<A> c2) {

    return c1.Equals(c2) && c2.Equals(c1);


    That actually doesn’t sound unreasonable. And it seems to extract the arbitraty equality definition away from the individual collection and into a helper (which you could replace with another helper at any given time).

  16. Scot Boyd says:

    The only problem I see with calling Equals() on each collection twice, is that each collection is iterated over twice. The only way around this I’ve been able to come up with is an interface designed to allow cooperative iteration. So, an interface ICheckEquality would have 3 methods: bool BeginCheck(ICollection<T> collection), void CheckElement(T item), bool EndCheck(T item).

    The third-party method would do something like this:

    CollectionsUtilities.Equals<A>(ICheckEquality c1, ICheckEquality c2)




    //loop to iterate both collections omitted







    //last element in one of the collections

    return c1.EndCheck(c2.Current) && c2.EndCheck(c1.Current);


    I have butchered the Collection syntax and omitted the control statements, but that should illustrate the idea. It’s also probably over-engineering the problem – but it would only iterate each collection once.