To box or not to Box? That is the question!


Discussions on reddit, hacker news.

Recently I've noticed that the Equal method from our ValueTuple (*) struct generates significant memory traffic (~1Gb). That was a bit of a surprise to me. This struct is well designed and was used pretty heavily in many performance critical scenarios. Here how the struct looks like:

public struct ValueTuple<TItem1, TItem2> : IEquatable<ValueTuple<TItem1, TItem2>>
{
   
public TItem1 Item1 { get
; }
   
public TItem2 Item2 { get
; }

   
public ValueTuple(TItem1 item1, TItem2
item2)
        :
this
()
    {
        Item1
=
item1;
        Item2
=
item2;
    }

   
public override int
GetHashCode()
    {
       
// Real implementation is a bit more complicated. Not a simple XOR
        return EqualityComparer<TItem1>.Default.GetHashCode(Item1) ^
            EqualityComparer<TItem2>.Default.
GetHashCode(Item2);
    }

   
public bool Equals(ValueTuple<TItem1, TItem2
> other)
    {
       
return (Item1 != null && Item1.Equals(other.Item1)) &&
                (Item2 != null && Item2.Equals(other.
Item2));
    }

   
public override bool Equals(object
obj)
    {
       
return obj is ValueTuple<TItem1, TItem2> &&
            Equals((ValueTuple<TItem1, TItem2
>)obj);
    }
   
// Other members, like equality operators are omitted for brevity
}

(*) Our ValueTuple was implemented way before the one that was shipped with VS 2017, and our implementation is immutable. So even today we can't switch to the implementation from the .NET Framework.

Nothing special. Methods Equals and GetHashCode are overriden to avoid boxing allocation (more on this later in the post). Null check to avoid NullReferenceExcecption if Item1 or Item2 are reference types. Null check potentially could have caused a boxing allocation if the JIT would have leave a check for null if Item1 or Item2 are structs. But luckily it is smart enough to eliminate the check in this case. Everything is simple and straightforward. Right? Almost.

In our case the memory traffic was generated not for every ValueTuple case, but only for a specific one: HashSet<ValueTuple<int, MyEnum>>. Ok. This helps a bit. Right?

Let's see what will happen when the Equals method is called to compare two instances of ValueTuple<int, MyEnm>> type:

// Compiler's view of the world
public bool Equals(ValueTuple<int, MyEnum
> rhs)
{
   
return Item1.Equals(rhs.Item1) && Item2.Equals(rhs.
Item2);
}

For the Item1 we'd have a call to int.Equals(int), and for Item2 we'd have a call to MyEnum.Equals(MyEnum) method. For the first case nothing special will happen, but for the second case the method call will cause a boxing allocation!

"When" and "Why" boxing happens?

We used to think that boxing happens only when an instance of a value type is explicitly or implicitly converts to an object or to an interface:

int n = 42;
object o = n; // Boxing
IComparable c = n; // Boxing

But the reality is slightly more complicated. The JIT and the CLR will have to box an instance in some other cases: for instance, to invoke a method defined in "reference type" section of a value type.

All custom structs are implicitly sealed and derived from the special class: System.ValueType. All value types have "value semantic" meaning that the object’s default referential equality is no longer suitable for them. To enforce the new semantic System.ValueType provides a special implementation for two methods: GetHashCode and Equals.

But default implementation has two problems: the performance is very bad (**) because it may use reflection and boxing will happen for every method invocation.

(**) The performance of the default implementation of ValueType.Equals and ValueType.GetHashCode can vary significantly based on the structure of a given value type. If the struct doesn't have pointers and is "packed" correctly, then bit-wise comparison is possible. Otherwise, reflection will be used that will cause a drastic performance degradation. See CanCompareBits implementation at the coreclr repo.

The first issue is pretty well-known but the other one is more subtle: if the struct does not override a method, boxing will happen!

 

struct MyStruct
{
   
public int N { get
; }

   
public override int GetHashCode() => N.
GetHashCode();

   
public override bool Equals(object
obj)
    {
       
return obj is MyStruct && Equals((MyStruct
)obj);
    }

   
public bool Equals(MyStruct other) => N == other.
N;
}


var myStruct = new MyStruct
();

// no boxing: MyStruct overrides GetHashCode
var hc = myStruct.
GetHashCode();

// no boxing: MyStruct overrides Equals
var equality = myStruct.
Equals(myStruct);

// boxing: MyStruct does not override ToString
var s = myStruct.
ToString();

// boxing: GetType is not virtual
var t = myStruct.GetType();

 

In the example above, the boxing will not happen in the first 2 cases, but will happen in the last two. Call to a method, defined in System.ValueType (like ToString, and GetType in this case) will cause boxing and call to an overridden method (like Equals and GetHashCode) – will not.

Now let’s switch back to our example with ValueTuple<int, MyEnum>. User-defined enums are value types without an ability to override GetHashCode and Equals, and this means that every call to MyEnum.GetHashCode or MyEnum.Equals will cause an allocation.

But can we avoid it? Yes, by using EqualityComparer<T>.Default instance.

How EqualityComparer avoids allocations?

Let’s get slightly simplified example and compare two ways of comparing enum values (pun intended): using Equals method and using EqualityComparer<T>.Default:

var s1 = new ValueTuple<int, MyEnum>(42, MyEnum.Foo);
var s2 = new ValueTuple<int, MyEnum>(42, MyEnum.
Bar);

// Will cause a boxing allocation
bool r = s1.Equals(s2);

Let’s use BenchmarkDotNet to prove that the first case causes an allocation and the other one does not (to avoid iterator allocation I’m using a simple foreach loop and not something like Enumerable.Any or Enumerable.Contains):

[MemoryDiagnoser]
public class EnumComparisonBenchmark
{
   
public MyEnum[] values =
 
           
Enumerable.Range(1, 1_000_000
)
.Select(n => MyEnum.Foo).
ToArray();

   
public
EnumComparisonBenchmark()
    {
        values[values
.Length - 1] = MyEnum.
Bar;
    }

    [
Benchmark
]
   
public bool
UsingEquals()
    {
       
foreach(var n in
values)
        {
           
if (n.Equals(MyEnum.Bar)) return true
;
        }
       
return false
;
    }

    [
Benchmark
]
   
public bool
UsingEqualityComparer()
    {
           
var comparer = EqualityComparer<MyEnum>.
Default;
       
foreach (var n in
values)
        {
           
if (comparer.Equals(n, MyEnum.Bar)) return true
;
        }
       
return false
;
    }
}

 

                Method |      Mean |      Gen 0 |  Allocated |
---------------------- |----------:|-----------:|-----------:|
           UsingEquals | 13.300 ms | 15195.9459 | 48000597 B |
UsingEqualityComparer |  4.659 ms |          - |       58 B |

As we can see, the regular Equals call causes a lot of allocations for no reason. Moreover, EqualityComparer is way faster, although in my case I haven’t seen any difference after I’ve switched the implementation to use EqualityComparer. The main question is: how does EqualityComparer do this?

EqualityComparer is an abstract class that provides the most suitable comparer based on a given type argument via EqualityComparer<T>.Default property. Main logic resides in method ComparerHelpers.CreateDefaultEqualityComparer and in case of enums it delegates it to another helper method – TryCreateEnumEqualityComparer. The last method then checks the underlying type of the enum and creates special comparer instance that does some nasty tricks:

[Serializable]
internal class EnumEqualityComparer<T
>
:
EqualityComparer<T> where T : struct
{
    [
Pure
]
   
public override bool Equals(T x, T
y)
    {
       
int x_final = System.Runtime.
CompilerServices
.JitHelpers.
UnsafeEnumCast(x);
       
int y_final = System.Runtime.
CompilerServices
.JitHelpers.
UnsafeEnumCast(y);
       
return x_final ==
y_final;
    }

    [
Pure
]
   
public override int GetHashCode(T
obj)
    {
       
int x_final = System.Runtime.
CompilerServices
.JitHelpers.
UnsafeEnumCast(obj);
       
return x_final.
GetHashCode();
    }
}

EnumEqualityComparer converts a nenum instance to its underlying numeric value using JitHelpers.UnsafeEnumCast with a following comparison of two numbers.

So, what was the final fix?

The fix was very simple: instead of comparing values using Item1.Equals we’ve switched to EqualityComparer<T>.Default.Equals(Item1, other.Item1).

Additional References

·        Remove usage of EqualityComparer from ValueTuple's GetHashCode.

·        GetHashCode should not box enums in generic methods.

·        Optimize calls to enum’s GetHashCode/Equals.

Comments (5)

  1. OmariO says:

    Btw it has ben changed recently in CoreClr. Calling AEnum.GetHasCode no more allocates

    1. Yep, that's why I've provided a link to a coreclr repo with a pull request that does this: https://github.com/dotnet/corert/pull/2117
      But it is not the case for Equals, because the right hand side is still needs to be allocated.

  2. Hi Sergey.
    Its pretty old and funky trick with struct Equals and GetHashCode optimizations. CLR performs "memcmp based" equals method implementation for structs when:
    +all fields are valuetypes
    +struct size is layouted well (%64)
    +Equals is not overridden

    In that case we achieve ultrafast equals and GetHashCode out of the box. There are some minor issues with "semantic equality" related to float numbers.

    For instance

    public struct Tuple_D_I
    {
    public double Item1;
    public int Item2;

    public static Tuple_D_I Create(double i1, int i2) =>
    new Tuple_D_I
    {
    Item1 = i1,
    Item2 = i2
    };
    }

    public struct Tuple_D_I_I
    {
    public double Item1;
    public int Item2;
    public int Item3;

    public static Tuple_D_I_I Create(double i1, int i2) =>
    new Tuple_D_I_I
    {
    Item1 = i1,
    Item2 = i2
    };
    }

    void Main()
    {
    var di1 = ValueTuple.Create(Tuple_D_I.Create(0.0, 1),1);
    var di2 = ValueTuple.Create(Tuple_D_I.Create(-0.0, 1),1);

    (di1.Equals(di2)).Dump(); //True - as layout is OK

    var dii1 = ValueTuple.Create(Tuple_D_I_I.Create(0.0, 1),1);
    var dii2 = ValueTuple.Create(Tuple_D_I_I.Create(-0.0, 1),1);

    (dii1.Equals(dii2)).Dump(); //False - as layout isnt OK
    }

    The same "surprise" can appear with Enum field. For instance
    public enum MyEnum : short
    {
    Foo,
    Boo
    }

    can break the layout and "reflection based" Equals will be invoked instead of "memcmp" one.

    Thanks for nice theme 🙂

    1. Wrong comments in code above )))

      ...
      (di1.Equals(di2)).Dump(); //FALSE– as layout is OK and memcmp returns false since sign bits are different
      ...
      (dii1.Equals(dii2)).Dump(); //TRUE– as layout isnt OK, reflection based Equals is invoked
      ...

      1. And one more chance )))

        (di1.Equals(di2)).Dump(); //TRUE– as layout isnt OK, reflection based Equals is invoked…
        (dii1.Equals(dii2)).Dump(); //FALSE– as layout is OK and memcmp returns false since sign bits are different

Skip to main content