Is it possible to get java’s ‘==’ semantics in C#?


Prelude:  Here’s what my current array based implementation of ICollection looks like.  It builds upon a lot of suggestions as to the tradeoffs of perf vs. memory and uses delegates to place control in the person who creates the collection for how certain operations behave:



namespace Testing.Collections


{


    using System;


    using Testing.Functional;


 


    public class ArrayCollection<A> : ICollection<A>


    {


        private readonly Authenticator<A> authenticate;


        private A[] array;


        private uint count;


 


        private static readonly Authenticator<A> DefaultAuthenticate = delegate(A a1, A a2)


        {


            return object.Equals(a1, a2);


        };


 


        public ArrayCollection() : this(DefaultAuthenticate) {


        }


 


        public ArrayCollection(Authenticator<A> authenticate)


        {


            this.authenticate = authenticate;


            if (null == authenticate)


            {


                throw new ArgumentNullException(“authenticate”);


            }


        }


 


        #region ICollection<A> Members


 


        public bool Empty {


            get {


                return count == 0;


            }


        }


 


        public bool Add(A element) {


            //From CLR 369 (Dynamic Tables)


            EnsureSpace();


 


            array[count] = element;


            count++;


 


            //Adding an element always modifies an ArrayCollection


            return true;


        }


 


        private void EnsureSpace()


        {


            if (array == null)


            {


                array = new A[2];


            }


 


            if (count == array.Length)


            {


                A[] doubledArray = new A[count * 2];


                Array.Copy(array, doubledArray, count);


                array = doubledArray;


            }


        }


 


        public void Clear()


        {


            array = null;


            count = 0;


        }


 


        public bool Contains(A element)


        {


            for (uint i = 0; i < count; i++)


            {


                if (this.authenticate(array[i], element))


                {


                    return true;


                }


            }


 


            return false;


        }


 


        public bool Remove(A element)


        {


            for (uint i = 0; i < count; i++)


            {


                if (this.authenticate(array[i], element))               


                {


                    RemoveAt(i);


                    Contract();


 


                    return true;


                }


            }


 


            return false;


        }


 


        private void Contract()


        {


            //Shrink the array if the number of elements has gone below


            //1/4th the capacity.  See CLR 370 (Dynamic Tables: Table-Delete)


            if (count < (array.Length / 4))


            {


                A[] halvedArray = new A[array.Length / 2];


                Array.Copy(array, halvedArray, count);


            }


        }


 


        private void RemoveAt(uint i)


        {


            //Move the last element into the array into this position


            uint lastItem = count – 1;


            array[i] = array[lastItem];


            array[lastItem] = default(A);


            count–;


        }


 


        #endregion


    }


}


There are small changes to the first version of the code:



  1. I use null to represent an empty array

  2. When getting space initially for the array we default to a size of 2.  This was chosen because it’s the smallest size that the array can reach when you call remove

  3. Equality is based on a delegate passed in the constructor.  If no delegate is passed then we use object.Equals(o1, o2) to test equality.

I’m writing this as a library that I intend people to use and augment.  So I don’t feel a test is complete unless I try subclassing the class to provide a specialized implementation.  In this case I wanted to create an “IdentityCollection”.


My definition of an IdentityCollection was: an ICollection<A> where equality is defined as:



  1. Reference equality if A is a reference type

  2. Structural equality if A is a value type

So if you had checked if an object was in the collection it would only report true if that was in teh same location in memory.  I could do this with the following code:


ICollection<string> collection1 = new ArrayCollection<string>(delegate(string s1, string s2) {


                                                                  return object.ReferenceEquals(s1, s2);


                                                              });


However, this wouldn’t work for value types.  For example, I would expect to be able to type:


collection.Add(4) and then have collection.Contains(4) return true.  However, using ReferenceEquals wouldn’t work because it would automatically box those value type into objects in different locations in memory.


My initial stab at it looked like this:


class IdentityCollection<A> : ArrayCollection<A>


{


    public IdentityCollection() : base(delegate (A a1, A a2) {


        return a1 == a2;


    }) {}


} 



I figured that == would have reference semantics on objects and structural semantics on values.  Pretty intuitive considering that == has structural semantics on all the predefined value types in the system (bool, int, decimal, etc.).  However, when i tried this I got:


 Error 1  Operator ‘==’ cannot be applied to operands of type ‘A’ and ‘A’


Sigh… I’m reading through the language spec right now to understand this.  However, can anyone else think of way to get this done?


Comments (15)

  1. Cyrus, you need to use constraints and CompareTo. You can specify a constraint that any Type Parameter implement IComparable or IComparable<T>

    class IdentityCollection<A> where A:IComparable<A> : ArrayCollection<A>

    {

    public IdentityCollection() : base(delegate (A a1, A a2) {

    return (a1.CompareTo(a2) == 0);

    }) {}

    }

  2. jaybaz [MS] says:

    Why is your top level namespace named "Testing"? I would choose "MS.CyrusN".

  3. Justin: That’s a constraint that I don’t want to have on my collection. That would exclude many structs from being allowed in.

    I woudl like my collections to operate on any type.

  4. Jay: No reason. At this point they don’t really need to be in a namespace at all. But that’s what the default project gave me so I stuck with it.

  5. One thing you could do is to take a look at how the STL C++ library is designed. An idea that comes to mind would be to take the operations that require equality of elements away from the container class into separate classes.

    E.g., with STL, the basic list has no ‘contains’ method. Instead, there is a separate ‘find’ algorithm that you can apply to the list. The ‘find’ algorithm requires that the list’s elements are comparable with the ‘==’ operator – if not, an attempt to apply ‘find’ to a list fails at compile time. Maybe you could do something similar here?

  6. Here’s how you can do it:

    class IdentityCollection<A> : ArrayCollection<A>

    {

    public IdentityCollection() : base(delegate (A a1, A a2) {

    if((object)A.default != null) {

    return object.Equals(o1, o2);

    }

    return object.ReferenceEquals(o1, o2);

    }) {}

    }

    Unfortunately, this requires boxing for value types.

    I’m not sure if your definition of structural equality includes using a user defined override for Object.Equals, if it doesn’t you can use:

    return System.Runtime.CompilerServices.RuntimeHelpers.Equals(o1, o2);

    instead of object.Equals(). That will do a memcmp style comparison of the two values (but still requires boxing).

    The CLR really should have an IL instruction to do a bitwise comparison of two values.

  7. David says:

    BTW, in Contract(), do you want to point array at halvedArray?

  8. AT says:

    1. EnsureSpace

    a) EnsureSpace must not rely on count value but take expected size instead.

    I.e. in case if I wish to subclass your ArrayCollection to add for example method AddAll(params A[] items) then I will have to reimplement EnsureSpace (I prefer you mark it protected or even public) with increase in sizes different that +1

    b) In case if you will need a huge size of variables very close to int.MaxValue (or to be clear – at least 1073741824) then your array.length*2 code does not work and will cause wrong (even negative) length array created to store values ;o))

    This is not relevant for IA32 architecture as you will more likely hit 3Gb memory limit – but will definitely cause troubles for 64-bits.

    c) You can merge Contract with EnsureSpace and this will be one-place stop for array reallocations.

    2. Your RemoveAt and EnsureSpace are private. It will be nice to mark them protected or even public if you wish to allow developers to extend your class. If do not wish to allow them to do so – use sealed class – this will result in some optimization.

    If you will make RemoveAt protected or public – apply some basic param validation.

    If using RemoveAt with position bigger that count but less that array.length this result last element simply lost. All others results in IndexOutOfRange exception

    Param validation will not hurt as it can be optimized/removed by compiler.

    3. You have no some required methods for ICollection<A> from Whidbey CTP build.

    Like a CopyTo(), Count, IsReadOnly and GetEnumerator(). As well Add return bool instead of int.

  9. David: correct. Any suggestions on how I test that? From an external persepctive the colleciton behaved just fine.

  10. Jeroen: I don’t want to box and I don’t want to use .Equals. .Equals might be implemented in a way that doesn’t mean structural equality. Boxing also eliminates the perf benefits i get out of using generics here.

    == in Java doesn’t box and it works on Reference/Value types just fine. That’s what I’m looking for here.

  11. AT:

    b) Correct. I was looking at Array.LongLength and thinking I should move to that. I should also put that code in a checked block.

    c) What is the benefit of merging them into one location?

    2. I see extensibility coming from overriding the things that are in the public signature in the class (i.e. the interface). The private methods are an implementation detail that might change at any time. For example, merging Contract/EnsureSpace as you mentioned.

    RemoveAt is private. I could add validation, but the guidelines say that is unnecessary as I control all flow into that method.

    3. I’m not attempting to clone System.Collection.Generic.ICollection. I’m trying to write my own. Prefereably without the issues I’ve found int the BCL libraries. As I mentioned in a previous posts, I’m not sure if Count is appropriate on a collection, and a later post talks about GetEnumerator.

  12. AT says:

    Cyrus:

    2. You must allow subclasses to call limited set of your class "private" members so users will be able to add new major functionality (example AddAll, RemoveAll, ContainsAny, ContainsAll from more powerfull interface for example IBulkOperationsCollection ) easily.

    Or if you do not like this – simply disallow this using sealed modifier

    As for validation – it will be good as you will be able to detect errors early. You can instroduce invalid inputs during future development.

    Most likely compiler will optimize it for you and will have no performance hit for this. If you still worry about performance – use conditional compilation.

    3. Sorry. My fault.

  13. AT: Thanks for your feedback. It’s excellent. As an experiment could you try to subclass ArrayCollection and then show the issues you have doing it. Then show how making the modifcations you’ve talked about will alleviate them.

    I believe that there is a different way to accomplish what you want while keeping the patterms I have introduced, but I would love to see yours so that I can compare/contrast them. Thanks much!

  14. AT: As to validation. That’s what tests are for :-)

    It seems like a very bad smell to be adding validation to private methods. They receive no external input and so there should be no chance that they get invalid data. Validation is a crutch that indicates that you’ve done something wrong. Of course, validation on methods exposed publicly is a must.

  15. Cyrus: As to validation, you can enable validation or rather assertions in your debug builds only. In this respect they don’t act as crutch, but rather as a personal check. When running in debug mode, you’ll find possible issues with your code, while in retail mode, you’ll have assumed to find all invalid values in private methods.