To box or not to box, that is the question


Suppose you have an immutable value type that is also disposable. Perhaps it represents some sort of handle.

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        Somehow.Close(this.Handle);
    }
}

You might think hey, you know, I’ll decrease my probability of closing the same handle twice by making the struct mutate itself on disposal!

public void Dispose()
{
    if (this.Handle != 0)
      Somehow.Close(this.Handle);
    this.Handle = 0;
}

This should already be raising red flags in your mind. We’re mutating a value type, which we know is dangerous because value types are copied by value; you’re mutating a variable, and different variables can hold copies of the same value. Mutating one variable does not mutate the others, any more than changing one variable that contains 12 changes every variable in the program that also contains 12. But let’s go with it for now.
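The copy behavior described above can be seen in isolation with a minimal sketch. The `Counter` struct and the `CopyDemo` class are hypothetical names introduced only for this illustration; they are not part of the handle example.

```csharp
using System;

// Hypothetical mutable struct, used only to illustrate copy-by-value.
struct Counter
{
    public int Value;
    public void Reset() { Value = 0; }
}

static class CopyDemo
{
    // Returns the value held by the original variable after its copy is mutated.
    public static int MutateCopy()
    {
        var a = new Counter { Value = 12 };
        var b = a;      // b receives an independent copy of a's value
        b.Reset();      // mutates the variable b only
        return a.Value; // a is untouched
    }

    static void Main()
    {
        Console.WriteLine(MutateCopy()); // prints 12
    }
}
```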

What does this do?

var m1 = new MyHandle(123);
try
{
  // do something
}
finally
{
    m1.Dispose();
}
// Sanity check
Debug.Assert(m1.Handle == 0);

Everything work out there?

Yep, we’re good. m1 begins its life with Handle set to 123, and after the dispose it is set to zero.

How about this?

var m2 = new MyHandle(123);
try
{
  // do something
}
finally
{
    ((IDisposable)m2).Dispose();
}
// Sanity check
Debug.Assert(m2.Handle == 0);

Does that do the same thing? Surely casting an object to an interface it implements does nothing untoward, right?

.

.

.

.

.

.

.

.

Wrong. This boxes m2. Boxing makes a copy, and it is the copy which is disposed, and therefore the copy which is mutated. m2.Handle stays set to 123.
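One way to see where the mutation went is to keep a reference to the box and inspect it afterwards. This is only a sketch: `Somehow` here is a stand-in that merely counts calls, and `BoxingDemo` is a hypothetical name.

```csharp
using System;

// Stand-in for the article's handle-closing helper; it only counts calls.
static class Somehow
{
    public static int Closed;
    public static void Close(int handle) { Closed++; }
}

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

static class BoxingDemo
{
    // Handle left in the original variable after disposing through the box.
    public static int HandleInOriginal()
    {
        var m2 = new MyHandle(123);
        IDisposable box = m2; // boxing copies m2 to the heap
        box.Dispose();        // mutates the boxed copy
        return m2.Handle;
    }

    // Handle left inside the box itself after the same operation.
    public static int HandleInBox()
    {
        var m2 = new MyHandle(123);
        IDisposable box = m2;
        box.Dispose();
        return ((MyHandle)box).Handle; // unboxing reads the mutated copy
    }

    static void Main()
    {
        Console.WriteLine(HandleInOriginal()); // prints 123
        Console.WriteLine(HandleInBox());      // prints 0
    }
}
```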

So what does this do, and why?

var m3 = new MyHandle(123);
using(m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

.

.

.

.

.

.

.

.

.

.

 

Based on the previous example you probably think that this boxes m3, mutates the box, and therefore the assertion fires, right?

Right?

Is that what you thought?

You’d be perfectly justified in thinking that there is a boxing performed in the finally because that’s what the spec says. The spec says that the “using” statement’s expansion when the expression is a non-nullable value type is

finally
{
  ((IDisposable)resource).Dispose();
}

However, I’m here today to tell you that the disposed resource is in fact not boxed in our implementation of C#. The compiler has an optimization: if it detects that the Dispose method is exposed directly on the value type then it effectively generates a call to

finally
{
  resource.Dispose();
}

without the cast, and therefore without boxing.

Now that you know that, would you like to change your answer? Does the assertion fire? Why or why not?

Give it some careful thought.

.

.

.

.

.

.

.

.

.

.

The assertion still fires, even though there is no boxing. The relevant line of the spec is not the one that says that there’s a boxing cast; that’s a red herring. The relevant bit of the spec is:

A using statement of the form “using (ResourceType resource = expression) statement” corresponds to one of three possible expansions. […] A using statement of the form “using (expression) statement” has the same three possible expansions, but in this case ResourceType is implicitly the compile-time type of the expression, and the resource variable is inaccessible in, and invisible to, the embedded statement.

That is to say, our program fragment is equivalent to:

var m3 = new MyHandle(123);
using(MyHandle invisible = m3)
{
  // Do something
}
// Sanity check
Debug.Assert(m3.Handle == 0);

which is equivalent to

var m3 = new MyHandle(123);
{
  MyHandle invisible = m3;
  try
  {
    // Do something
  }
  finally
  {
    invisible.Dispose(); // No boxing, due to optimization
  }
}
// Sanity check
Debug.Assert(m3.Handle == 0);

It is the invisible copy which is disposed and mutated, not m3.

And that’s why the compiler can get away with not boxing in the finally. The thing that it is not boxing is invisible and inaccessible and therefore there is no way to observe that the boxing was skipped.
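The whole argument can be checked with a small program. As before, this is a sketch under assumptions: `Somehow` is a stand-in that only counts how many times a handle was closed, and `UsingDemo` is a hypothetical name.

```csharp
using System;

// Stand-in for the article's handle-closing helper; it only counts calls.
static class Somehow
{
    public static int Closed;
    public static void Close(int handle) { Closed++; }
}

struct MyHandle : IDisposable
{
    public MyHandle(int handle) : this() { this.Handle = handle; }
    public int Handle { get; private set; }
    public void Dispose()
    {
        if (this.Handle != 0)
            Somehow.Close(this.Handle);
        this.Handle = 0;
    }
}

static class UsingDemo
{
    // Returns the handle still stored in m3 after the using statement completes.
    public static int HandleAfterUsing()
    {
        var m3 = new MyHandle(123);
        using (m3)
        {
            // Do something
        }
        // The hidden copy was disposed and zeroed; m3 itself was not.
        return m3.Handle;
    }

    static void Main()
    {
        Console.WriteLine(HandleAfterUsing()); // prints 123
        Console.WriteLine(Somehow.Closed);     // prints 1: the handle really was closed once
    }
}
```

The handle is closed exactly once, but `m3` never learns about it, so a later `m3.Dispose()` would close it a second time.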

Once again the moral of the story is: mutable value types are enough pure evil to turn you all into hermit crabs, and therefore should be avoided.

Comments (59)

  1. Shuggy says:

    foreach and the special structs on the generic collections Enumerators would like a word 🙂

    At least the expansion of that sugar is reasonable (as in doesn't surprise most people)

  2. Bill P. Godfrey says:

    While totally agreeing with everything you've said, mutable value types could be an argument for adding support for C++ style copy constructors and out-of-scope destructors for C# value types.

  3. Thomas Levesque says:

    I think all readers of your blog now agree that mutable value types are evil… But I wonder why this rule is so often not applied in the .NET Framework classes (DictionaryEntry, GCHandle, Point, Size…)

  4. Jonathan Pryor says:

    @Thomas Levesque: I don't think that "mutable value types are evil" was known in the .NET 1.0 timeframe, at least not as well as we know it now, and all of your examples are from 1.0. A case could be made for e.g. List<T>.Enumerator, though you're not normally supposed to directly access that type, so it's (plausibly) less of an issue.

  5. Anthony P says:

    This self-mutating struct is indeed very nasty, very evil. Let's never speak of it again.

  6. Shuggy says:

    GCHandle's mutability is hidden far away from the average user.

    There are often places that platform designers are doing this that appear evil but are well understood.

    I think Point is a legacy of trying to play along with POINT: msdn.microsoft.com/…/dd162805%28v=vs.85%29.aspx, likewise Size

    It's still quite nasty.

    DictionaryEntry is pure evil

  7. Thomas Levesque says:

    @Jonathan Pryor, you're right of course… but WPF (.NET 3.0) also has its own Point and Size structures, and they're also mutable.

    @Shuggy, yes, I think that's because it's mapped directly to the unmanaged structure…

  8. Gabe says:

    The WPF structs (like Point and Size) have to be mutable because XAML cannot create immutable objects (it can only call default constructors and set properties). The real question is why they are structs, seeing as how they have to be mutable. I'd guess that it's an optimization. You could potentially have millions of Point objects, and performance could really suffer if the memory footprint doubled and each one had to be individually allocated, initialized, and GCed.

  9. Neal Gafter says:

    Re: "Suppose you have an immutable value type that is also disposable".

    Suppose you have an invisible purple dragon.

    Disposing the value changes its state (though not necessarily the bits of the struct).  If its state can change, then it isn't immutable.

  10. mstrobel says:

    @Gabe @Thomas: XAML can create immutable objects just fine (it can create strings and other primitives, after all), but Xaml2006 has no way of explicitly declaring instances of immutable types with non-default constructors.  This renders immutable types with more than one member somewhat useless in XAML.  Your only option is to declare a default TypeConverter such that you could put a string representation in XAML and have it converted to an instance of an immutable type.  Then you could do this:

    <SomeObject Location="(0, 2)" />

    where, e.g., 'Location' is an immutable 'Point' with (x, y) members. Xaml2009 includes support for non-default constructors, but the syntax is relatively verbose, and Xaml2009 wasn't around in .NET 3.0, so we're stuck with some mutable value types.

  11. "foreach" is different – per language spec, it does not expand to using but rather directly to try/finally, and the spec also explicitly says that, for enumerators which are value types, Dispose() is called directly without boxing. The difference is that foreach calls different methods several times on the enumerator, so you can write a perverted implementation that would be able to spot boxing on that final Dispose() call.

  12. snarfblam says:

    I can not agree wholesale with the statement that "mutable value types are enough pure evil to turn you all into hermit crabs." The real problem is when we try to treat values as objects. This is a flaw in the "everything is an object" concept. When a struct implements an interface (especially if the interface implies mutability) you are treating a value like an object. Your example is analogous to the much simpler scenario of assigning the value of 'int i' to 'int j', incrementing the value of j, and expecting i to change as well. (I understand that an integer is "immutable," but I take the position that however j comes to be incremented, it is an implementation detail, whether the processor chooses to twiddle some bits [mutate] or wholly overwrite a value. The difference is philosophical; Schrodinger's cat is doing our math.) As long as structs are used only as "complex values," and are designed in this spirit, things tend to go okay. (I've written up these thoughts in a more elaborate and drawn-out manner at snarfblam.com/words) If you're implementing an interface with a struct, that's a clear sign you're doing something wrong.

  13. Shuggy says:

    @snarfblam

    It most certainly isn't. Implementing an interface that implies mutability is, but many value types perfectly reasonably implement things like IEquatable<T>, IFormattable, and IComparable<T>, to name but a few.

    Interfaces do *not* imply boxing (take a look at the constrained opcode if you want to know why) but this is an implementation detail anyway since immutability means it doesn't matter if you take a copy (in boxing) anyway.

  14. Shuggy says:

    @snarfblam

    actually I just read your blog and you have bigger problems.

    If you cannot differentiate between a variable and the value contained within the variable I think you need to have a serious think about your CS skills.

    compare:

    public class X
    {
       public readonly Point P1;
       public Point P2;
       public readonly int X1;
       public int X2;
    }

    and have a read of Eric's previous post: blogs.msdn.com/…/mutating-readonly-structs.aspx

    There are extreme tricks you can do: http://www.bluebytesoftware.com/…/WhenIsAReadonlyFieldNotReadonly.aspx but this is still not altering the value, it's altering the variable (by aliasing to the same location)

    Fundamentally immutable value types are different from mutable ones because a whole class of bugs simply cannot happen, especially in the circumstances where you treat one as an object.

    Your whole blog post conflates treating values like objects with the important distinction to remember, which is very simple. Value types have copy value semantics, reference types have "copy the reference value semantics". Eric, of course, explains it better: blogs.msdn.com/…/the-stack-is-an-implementation-detail.aspx

  15. Simon Buchan says:

    @Shuggy: I dunno, he seems to have a pretty ok grasp: he's saying that integer values being immutable doesn't mean integer variables are. Also, wouldn't you like a way to call unexposed interface members without boxing? By the way, I believe he was referring to treating values as objects philosophically, not actual boxing, re: opcodes.

    As a C++ dev, I also can't agree with the /idea/ of mutable value types as evil. I think them having *exactly the same syntax* is bad, but I can't think of a good alternative which keeps focus on reference types. But simply changing the syntax coloring of value types solves that issue. (I also change the interface and delegate colors in case someone is doing something silly in a library). Perhaps allowing explicit references to values would help? I don't know. The real issue, as always, is whether the programmer understands the language – something that language design can never really ensure (though it certainly has a huge effect!).

  16. Shuggy says:

    @Simon

    "Also, wouldn't you like a way to call unexposed interface members without boxing?"

    Sorry, I don't see what you mean here; interface-related methods are always exposed at the same accessibility as the interface. Are you referring to an explicit interface implementation on a struct? I assume such a thing is possible, but it would be somewhat perverse; certainly I can't see a need for it.

    (note using he but no idea if that's valid)

    he certainly wasn't talking philosophically when he wrote:

    "When we need to treat a value like an object, it gets boxed. It gets put in the same kind of “package” as every normal reference-type object. In most cases, we can’t use the boxed value until we un-box it by casting it back to a value type. This means that while it’s parading as an object, it never actually has the chance to act like an object. Interfaces are the exception, because they box a value type, and they declare behavior for that boxed object."

    That's two assertions that are plain wrong.

    He spends a great deal of time saying "because you can change a variable that means things that are immutable actually aren't". This is patently ludicrous (try his 'thought experiment' on a string).

  17. Simon Buchan says:

    @Shuggy:

    Yes, explicit interface implementation is possible – it's useful in exactly the same situations it's useful on reference types minus solving inheritance clashes.

    Oh whoops, I thought you were still referring to his reply, not the post.

    I might be missing something, but I don't see any invalid statements in that quote? I'd state it differently, but "treat[ing] a value like an object" is one way to say using a value without knowing its full type, which of course does require boxing, either to Object, ValueType, or a declared interface.

    The English gets confused around "parading as an object, […] act like an object", but I think he's driving at something like: values boxed as Object are useless until you cast them to either the value type or an interface.

    re: his blog post – I think he knows exactly what he means, he just is not exacting in his English. His eventual point is that since value type values don't have identity, mutation is indistinguishable from replacement – an int is merely 32 bits, changing the value is either mutating the bits or replacing the bits, there's no difference. Mutating Point.X is equivalent to replacing Point with the new X and the same Y (excluding perf. and atomicity).

    He then goes on to state that the issues that arise with mutable structs are to do with their misuse as "fast objects" – when they clearly don't behave anything like that.

    All in all, I wish for a BCL "class Box<T> where T: struct { public T Value; }" with the Nullable<T>-like C# sugar: something that would make value types *actually* work like objects so people who want to use them like that can.

  18. lidudu says:

    I kind of agree that this is more of an issue of treating value types like objects. But my opinion is that it is a compromise we should just accept.

    The main argument to add value types to C# instead of using class for everything, if I'm not wrong, is for performance. Then we should accept the issues introduced by that choice. Mutating value type is definitely useful for performance and it is common for C/C++ programs. If we are not able to harness it, we should go and find an easier job.

    Nullable<T> is a similar thing. To make it work perfectly as an object type, it should have been a class. But for performance reason, it is designed as a struct. So then we have to accept issues like:

    T varOfT = new T()

    may cause varOfT to have value null if T is a nullable type.

  19. Simon Buchan says:

    @lidudu: Performance is only sometimes a benefit if structs are immutable – in fact, the reason String is a reference type and not a value type is performance – but you can hardly tell the difference between immutable reference types and immutable value types. There are certain types that are easier to implement and use as value types, however, especially those types we tend to think of as "values" rather than "things" – mathematical stuff like Pair, Point, Matrix, Color especially.

    A nitpick: it doesn't really make sense to talk about value types for C or C++, since they have no concept of the thing. To C, all variables hold values, just some values are references.

    Re: Nullable<T>: I'm not sure what you mean by work perfectly as an object type: to my understanding, that was not its purpose. "new Nullable<T>()" being equal, and convertible, to null is odd, but in keeping with the purpose of Nullable<T> – to enable storing null in a value type. Certainly being able to store null should not be the requirement of being a reference type, given the effort people put into ensuring their reference types are not null!

  20. Shuggy says:

    @Simon

    Then I would assume calling an interface method on a value type where it is done by an explicit interface implementation without boxing might be pretty simple, if messy and repetitive

    For any method Frob on interface IFrobber

    public static void Frob<T>(this T t) where T : IFrobber
    {
       t.Frob();
    }

    I haven't verified this doesn't box yet (though I will do)

    As to his ideas on identity and mutability it again ignores the fundamental aspect which is the copy semantics, if you operate on a value type it is entirely possible you are operating on a copy, if you operate on a (non ref) copy of a reference you get no difference in behaviour at all, mutable or not. If you do so on a mutable value type you do. If it is immutable then it doesn't matter since you cannot perceive a difference. This is a fundamental difference so to say there is no difference between Point and int is wrong.

  21. Shuggy says:

    Oh, and strings are reference types for many reasons, but chief will be because they are variable size, so taking a copy every time you passed a multi-megabyte string would be madness!

  22. SWeko says:

    So, actually the current C# compiler does not adhere to the C# spec 🙂

    Anyway, It's relatively hard to run into this, as the usual usage is to declare the variable right there in the using statement, and I can't remember I've ever used it otherwise.

    The spec says that the used type should be "a type that can be implicitly converted to System.IDisposable", however since conversions to an interface are not allowed, isn't this equivalent to "a type that implements System.IDisposable"?

    Also, if I implement IDisposable both implicitly and explicitly, and use the struct in a using, the Explicit implementation is always called, not the implicit, as I would infer from the article, and from the text of the spec (the expansion actually uses the explicit form)?

  23. Simon Buchan says:

    @Shuggy: Sure, but it copies. You need "void Frob<T>(ref T t) where T : IFrobber", and that's just ugly. And yes, like I said, strings are reference types because of performance, but what other reason do they have? They try pretty hard to behave like value types in every other way (immutable so the copy semantics don't matter, and using value equality).

    @SWeko: Good catch on the call by name/call by interface slot on Dispose – I just checked and it looks like the C# compiler cheats and calls the explicit implementation on both forms, without boxing on the initialising form.
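The by-value versus `ref` distinction raised in this exchange can be sketched as follows. This is an illustration only: `IFrobber` mirrors the hypothetical interface used in the thread, while the `Count` field and the `FrobDemo` helpers are assumptions added so the effect is observable.

```csharp
using System;

interface IFrobber { void Frob(); }

// Hypothetical mutable struct that implements the interface explicitly.
struct FrobValue : IFrobber
{
    public int Count;
    void IFrobber.Frob() { Count++; }
}

static class FrobDemo
{
    // By value: the callee mutates its own copy of the argument.
    public static void Frob<T>(T t) where T : IFrobber { t.Frob(); }

    // By reference: the callee mutates the caller's variable.
    public static void FrobRef<T>(ref T t) where T : IFrobber { t.Frob(); }

    public static int CountAfterByValue()
    {
        var v = new FrobValue();
        Frob(v);
        return v.Count; // the caller's variable was never touched
    }

    public static int CountAfterByRef()
    {
        var v = new FrobValue();
        FrobRef(ref v);
        return v.Count; // the caller's variable was mutated in place
    }

    static void Main()
    {
        Console.WriteLine(CountAfterByValue()); // prints 0
        Console.WriteLine(CountAfterByRef());   // prints 1
    }
}
```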

  24. Shuggy says:

    You're using structs, you shouldn't care if it copies since that's what they do (or if you are doing some serious low level optimisation knowing you can use ref and accept the ugly is fine)

    The whole reason why immutability is a very real concern that shouldn't be brushed aside as "well you can change the variable so it everything is mutable really" is that it makes precisely the scenarios we are discussing here *not matter*

  25. Shuggy says:

    okay, looks like it avoids boxing (disclaimer: I've not looked at the resulting assembly)

    void Main()
    {
       var nasty = new FrobValue(0);
       Frob(nasty);
    }

    public static void Frob<T>(T t) where T : UserQuery.IFrobber
    {
       t.Frob();
    }

    public static void FrobRef<T>(ref T t) where T : UserQuery.IFrobber
    {
       t.Frob();
    }

    public interface IFrobber
    {
       void Frob();
    }

    public struct FrobValue : IFrobber
    {
      public FrobValue(int _) {}
      void IFrobber.Frob() { }
    }

    produces (in LinqPad)

    IL_0000:  ldloca.s    00
    IL_0002:  ldc.i4.0
    IL_0003:  call        UserQuery+FrobValue..ctor
    IL_0008:  ldloc.0
    IL_0009:  call        UserQuery.Frob

    Frob:
    IL_0000:  ldarga.s    00
    IL_0002:  constrained. 01 00 00 1B
    IL_0008:  callvirt    UserQuery+IFrobber.Frob
    IL_000D:  ret

    FrobRef:
    IL_0000:  ldarg.0
    IL_0001:  constrained. 01 00 00 1B
    IL_0007:  callvirt    UserQuery+IFrobber.Frob
    IL_000C:  ret

    IFrobber.Frob:

    FrobValue.UserQuery.IFrobber.Frob:
    IL_0000:  ret

  26. Simon Buchan says:

    @Shuggy:

    Well, the reason you would want to avoid boxing is so you can mutate through an interface – if copying was OK, ((IDisposable)t).Dispose() would be equivalent, if potentially an iota more work for the GC. And no one's "brushing aside" the problems with mutability, just questioning the idea that it is somehow OK for reference types, but horrible and should never be done for value types. If anything, I would say immutability is more useful for reference types than value types given the increased chance for unexpected aliasing and far higher chance for cross-thread access, not to mention simplifying the ambiguity about modifying the reference versus the object when "identity" is not clearly defined for a type.

  27. Shuggy says:

    @Simon

    "just that it is somehow OK for reference types, but horrible and should never be done for value types."

    Okay you and I simply disagree at a very fundamental level.

    Anything which has value copy semantics is an immensely bad choice for mutability. The copy operations are  silent, and subtle (there are many occasions when copies are taken), any refactoring which pulled a value type between them could change the semantics, various aspects of construction would be hideous, readonly instance fields with them in get very messy.  

    Not to mention the concept of boxing changing the lifetime of the value, how on earth is that supposed to work…

    The List<T>.Enumerator gets away with it in normal scenarios because using foreach means you can't even see the enumerator to mutate it.

  28. Simon Buchan says:

    @Shuggy:

    But that thinking is exactly what @snarfblam was referring to as the real cause of the problems people have with value types – value types are value types, not objects, of course they behave differently. Treat them as values, and you're fine. I would argue IDisposable is (in general) not what you would want for a value type, not because it requires mutability, but because it implies *ownership* – which is the exact opposite of a plain value. Sure, you can, and interop style handles are a good candidate, just remember that there *will* be more than one value floating around.

    The copy operations are not *that* subtle, the majority of use cases are extremely obvious when they are copied – assignment operator? Copy. Passing an argument? Copy. I'm not sure what you're referring to with construction – the rules about value type constructors require good design here – no ownership, no magic on copy, no type invariants. I do agree about readonly instance fields to a certain degree, but consider that anything you can do with a non-readonly immutable value type you can do with a readonly mutable value type and vice-versa. "readonly" is nonsensical for all value types, not just mutable value types. (And the readonly immutable combo is useful for both reference and value types).

    And both boxing and unboxing are each another copy, not that complex.

    To sum up: value types are not reference types and you should not think of them like that. They have different rules for design, different semantics when you use them, and different (not better or worse!) performance. If you treat them right, mutability is just as useful for them as for reference types.

    Whoops – I've gone into total wall of text mode – sorry about that 🙂

  29. Ivan says:

    there's no point in using value types unless you want it to be copied and immutable, so your style is just bad. There's no performance improvement as well, specs don't say value types are allocated on stack, it just happened to be like that in current versions of .net.

  30. lidudu says:

    public static void Frob<T>(T t) where T : UserQuery.IFrobber
    {
      t.Frob();
    }

    This should not involve boxing by definition. At run time the call is made on t as a concrete type rather than through an interface, because generic methods are instantiated separately for each value type during JIT.

  31. lidudu says:

    @Simon:

    "Re: Nullable<T>: I'm not sure what you mean by work perfectly as an object type: to my understanding, that was not it's purpose. "new Nullable<T>()" being equal, and convertable to null, is odd, but in keeping with the purpose of Nullable<T> – to enable storing null in a value type. Certainly being able to store null should not be the requirement of being a reference type, given the effort people put into ensuring their reference types are not null!"

    I mean, if Nullable<T> was a class, it would follow reference type behavior (same as object) naturally, rather than having various special behaviors that differ from both value types and reference types even though it is a value type:

    int? i = new int?(); // i is null
    Console.WriteLine(i.ToString()); // works
    Console.WriteLine(i.GetType()); // NullReferenceException
    i = 10;
    object obj = i; // boxed int rather than boxed Nullable<int>
    var ic = (IComparable<int>)i; // i is boxed and then queried for the interface, rather than the struct being queried directly

    But anyway, I agree with many of your points, especially that value types should not implement IDisposable. My core point is just that mutable value types are _not_ always evil.

  32. lidudu says:

    @SWeko: "Also, if I implement IDisposable both implicitly and explicitly, and use the struct in a using, the Explicit implementation is always called, not the implicit, as I would infer from the article, and from the text of the spec (the expansion actually uses the explicit form)?"

    How can you implement IDisposable both implicitly and explicitly? If it is in the same class, then implicit does not happen at all because it is already explicit. If implicit is done in base class, then the derived class's explicit reimplementation will override the interface function mapping. If explicit is done in base class, then the derived class's implicit reimplementation overrides.

  33. Simon Buchan says:

    @Ivan:

    What if I want assignment and passing arguments to make a copy – eg. to have value semantics? What does that have to do with immutability or performance? And value types' behaviour does imply certain performance characteristics, regardless of implementation – namely that assignment is linear in time to value size. I should note that stack isn't even cheaper than heap in CLR anyway – large and complex reference topologies slow the GC, not heap allocations, even rapid.

    @lidudu:

    Re: Nullable<T>: Yes, those behaviours you listed are all strange, and confusing, but none seem to stand out as reference type behaviour, other than the fact nullable is involved. Would you expect this to return 3?

    int? Foo()
    {
       int? a = 1;
       int? b = a;
       a = 3;
       return b;
    }

    Re: @SWeko: Look at the final expansion in the original post: as written, it would call what would be the "implicit" method named "Dispose()" – however the actual compiler calls the true IDisposable.Dispose() method on the instance regardless of actual C# syntax rules.

  34. Gabe says:

    For those wondering why mutable value types (with public fields, no less!) are even allowed if they're so evil, see Rico Mariani's posts about when you actually need them: blogs.msdn.com/…/733887.aspx and blogs.msdn.com/…/745085.aspx

  35. Shuggy says:

    Third time trying to post this. If it was held in a queue sorry but I've not had good results from the msdn blog comment system in the past so retrying.

    /off topic

    Nowhere did I say that value types should be treated as if they were the same as reference types. You seem to be putting words in my mouth. Nonetheless, if you *do*, then a certain class of potential bugs becomes impossible if the value type is immutable. It really doesn't get much simpler than that. Mutable implies more chance of bugs.

    As to readonly, I can only assume you didn't read the previously linked post by Eric: stackoverflow.com/…/1144489. It is not about readonly applied to the fields within the struct but about readonly applied to a struct which is a member of another class. Most people I have asked were not aware that a silent copy occurred in those scenarios (by silent I mean no additional variable is visible in code).

    People screw this up, it's a real problem.

    Examples:

    stackoverflow.com/…/foreach-struct-weird-compile-error-in-c

    The issue was sufficiently important that the C# compiler team decided that they would detect obvious cases of pointless mutation of a silent copy and treat it as an *error*, not even a warning. Even then people still don't get it…

    http://www.eggheadcafe.com/…/direct-access-to-struct-variables-in-liststruct-compiler-error-.aspx

    Mutability is absolutely, positively not as useful for a struct as for a reference type; I cannot fathom how you could think that. There are certain very specific interoperability scenarios where being able to write into chunks of memory with some level of type safety is useful; likewise, if you can use them in some very hidden manner that involves no copying (like the foreach scenarios) then there is utility to the technique. I personally think that it would have been better if the compiler could detect the foreach cases and only use the mutable structs when that occurs. Calling GetEnumerator on a (compile time known) List<T> and then doing anything that could trigger a copy will produce the most marvellous bugs that surprisingly few people will understand (I have seen some very good and smart programmers bemused by this, and it took some fun with Reflector the first time I experienced it).
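The silent copy on a readonly struct field mentioned in this comment can be sketched as follows. `Counter`, `Holder`, and `ReadonlyDemo` are hypothetical names chosen for the illustration.

```csharp
using System;

// Hypothetical mutable struct.
struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

class Holder
{
    public readonly Counter Fixed = new Counter();
    public Counter Free = new Counter();
}

static class ReadonlyDemo
{
    // Calling a mutating method on a readonly struct field (outside a constructor)
    // operates on a temporary copy, so the field itself is unchanged.
    public static int IncrementReadonlyField()
    {
        var h = new Holder();
        h.Fixed.Increment();
        return h.Fixed.Value;
    }

    // The same call on a non-readonly field mutates the field in place.
    public static int IncrementPlainField()
    {
        var h = new Holder();
        h.Free.Increment();
        return h.Free.Value;
    }

    static void Main()
    {
        Console.WriteLine(IncrementReadonlyField()); // prints 0
        Console.WriteLine(IncrementPlainField());    // prints 1
    }
}
```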

  36. SWeko says:

    @Simon, @lidudu: Here's an example of what I was talking about:

    public struct MyHandle : IDisposable
    {
        public int Handle { get; private set; }

        public MyHandle(int handle) : this() { this.Handle = handle; }

        public void Dispose()
        {
            Console.WriteLine("Disposing Implicit");
            this.Handle = 0;
        }

        void IDisposable.Dispose()
        {
            Console.WriteLine("Disposing Explicit");
            this.Handle = 100;
        }
    }

    public static void Test()
    {
        var m1 = new MyHandle(1);
        m1.Dispose(); // <– "Disposing Implicit"

        var m2 = new MyHandle(2);
        ((IDisposable)m2).Dispose(); // <– "Disposing Explicit"

        using (var m3 = new MyHandle(3))
        {
            //according to the article, should be Disposing Implicit
            //actual value is Disposing Explicit
        }
    }

    So, we can implement both an implicit and an explicit version of the interface in the same class, and they will both be called in the appropriate context.

    However, the article stated that for this case, the compiler avoids boxing and calls directly, thus calling the implicit implementation, but in the example the explicit implementation is called.

  37. Simon Buchan says:

    @SWeko: That's one way to say it, but the runtime and CLR don't see it that way – you merely have a type that explicitly implements Dispose(), and also has an unrelated public method called Dispose(). It's sort of like "public new void Method()" without the inheritance – reusing the name without any polymorphism or virtual slots being involved.

  38. SWeko says:

    @Simon: Yes, there's always a misunderstanding possible when treating CIL code as if it were C#, and an explicit implementation always trumps the implicit one when used via the interface. However, the compiler is quite content if I do not provide an explicit implementation; moreover, I'd speculate that there are lots of C# users who do not know about explicit interface implementations.

    The implicit call ( m1.Dispose(); ) is translated to

     call       instance void TestAppConsole.Structs/MyHandle::Dispose()

    The explicit call ( ((IDisposable)m2).Dispose(); ) is translated to

     box        TestAppConsole.Structs/MyHandle

     callvirt   instance void [mscorlib]System.IDisposable::Dispose()

    So there is actual boxing included if I invoke the explicit implementation – even if no explicit implementation is provided in the struct code.

    The using statement's dispose is indeed compiled to

     constrained. TestAppConsole.Structs/MyHandle

     callvirt   instance void [mscorlib]System.IDisposable::Dispose()

    that calls a virtual method on a value type, so there's no boxing/ copying involved.

    In C# terms, this is neither m.Dispose() (it's a callvirt, not a call) nor ((IDisposable)m).Dispose() (there's no boxing involved), but a completely different third construct, so yes, my C#-restricted musings are a bit moot.

  39. As a side note, there is actually Box<T> in the Framework:

    msdn.microsoft.com/…/bb549038.aspx

    No syntactic sugar, though.

  40. Simon Buchan says:

    @SWeko: Yeah, the complaint was merely about calling it implicit implementation when it's not actually implementing. 🙂 Thanks for the IL – now I need to go look up what "constrained." does again.

    @Pavel: I was kinda hoping that "(StrongBox<int>)(object)obj;" would work. I'm trying to think what other compiler support would be useful; none of the operators seem to make much sense (can't pick between assigning to variable or value, mostly). Thanks for the pointer, though!

  41. snarfblam says:

    @Pavel Minaev, from the page you linked to, "This API supports the .NET Framework infrastructure and is not intended to be used directly from your code."

  42. Simon Buchan says:

    @snarfblam: Everything in the "System.Runtime.CompilerServices" namespace gets that – it's a little overblown, I think it's just trying to say "This stuff is designed to be easy for compilers, not humans. Please don't complain about usability."

  43. lidudu says:

    @Simon: I don't see a problem in

    int Foo()
    {
        int? a = 1;
        int? b = a;
        a = 3;
        return (int)b; // you missed a cast here
    }

    If Nullable were a reference type, the code would still produce 1, because a = 3 translates to a = (int?)3, never to a.Value = 3.

  44. lidudu says:

    I feel that the issue is more one of improperly using a value type for something that needs to manage ownership, etc. I don't think that's what value types are designed for, going by the .NET library design guidelines. In those cases, a reference type should be used instead. If, for any reason, the coder really wants it to be a value type, then the type should be designed as immutable.

    @SWeko: it is interesting that it generates

    constrained. TestAppConsole.Structs/MyHandle

    But I kind of think it is consistent with generic case like:

    void Func<T>(T s) where T: struct, IDisposable
    {
        s.Dispose();
    }

    For T as your MyHandle struct, this function also calls IDisposable.Dispose() without boxing. Because Dispose() is called on the struct variable directly, rather than converting to IDisposable interface type first.

    If you do ((IDisposable)s).Dispose(), then you are explicitly creating a temporary variable of type IDisposable, which is a reference type, then it will surely produce IL code for boxing.
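
    A minimal sketch of that difference, under some assumptions: DisposeCounter and ConstrainedCallDemo are hypothetical names, and the generic method takes its argument by ref (unlike Func<T> above, which takes it by value) purely so that the caller can see whether its own variable was mutated.

    ```csharp
    using System;

    // Hypothetical mutable struct that counts how often Dispose ran on it.
    public struct DisposeCounter : IDisposable
    {
        public int Disposals;
        public void Dispose() { Disposals++; }
    }

    public static class ConstrainedCallDemo
    {
        // By ref (an assumption of this sketch) so the caller's variable is the one disposed.
        public static void DisposeGeneric<T>(ref T s) where T : struct, IDisposable
        {
            s.Dispose(); // called on the variable itself, no conversion to IDisposable
        }

        public static int ViaGeneric()
        {
            var c = new DisposeCounter();
            DisposeGeneric(ref c);
            return c.Disposals; // 1: the variable itself was mutated
        }

        public static int ViaCast()
        {
            var c = new DisposeCounter();
            ((IDisposable)c).Dispose(); // boxes a copy and mutates the copy
            return c.Disposals; // 0: the original variable is untouched
        }
    }
    ```

    The only observable difference is which storage location Dispose() mutates, which is exactly what the assertions in the article are probing.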

  45. Simon Buchan says:

    @lidudu: Good catch on the cast (though I prefer .Value), but you are describing value semantics. Unless you want "a = 3" to mean value semantics and "a.Value = 3" to be either a null error or reference semantics, which is sane (and is also System.Runtime.CompilerServices.StrongBox<T>). But I believe the value-type behaviour of Nullable<T> is entirely by design.

  46. lidudu says:

    @Simon: Nullable<T>.Value is a read-only property and the type is immutable. Similarly, string a = "a" does not alter a character of the object; it assigns a new object reference to the variable a.

    If it were a reference type and a = 3 translated to a.Value = 3, then int? a = null; a = 3; would throw a NullReferenceException.
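
    A small sketch of the actual behaviour being discussed (NullableDemo and its method names are hypothetical, chosen only for illustration): assigning to a Nullable<int> variable replaces the whole value, so a copy taken earlier is unaffected, and assigning after null does not throw.

    ```csharp
    using System;

    public static class NullableDemo
    {
        public static int Foo()
        {
            int? a = 1;
            int? b = a;     // b gets its own copy of the value
            a = 3;          // replaces a's value; b is not touched
            return b.Value; // 1
        }

        public static int AssignAfterNull()
        {
            int? a = null;
            a = 3;          // plain assignment, so nothing is dereferenced
            return a.Value; // 3
        }
    }
    ```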

  47. SteffenM says:

    1. In the language spec I read: "In either expansion, the resource variable is read-only in the embedded statement, and the d variable is inaccessible in, and invisible to, the embedded statement." The "d variable", however, exists only in the expansion for dynamic types. There is an explicit expansion for value types, so the dynamic case surely does not apply here. Thus I would not have expected a shadow variable to be created.

    2. The expansion for value types states: "((IDisposable)resource).Dispose();". We can also read: "An implementation is permitted to implement a given using-statement differently, e.g. for performance reasons, as long as the behavior is consistent with the above expansion." But I think optimizing the cast away violates the requirement of consistent behavior, since the cast introduces boxing, which leads to different behavior.

  48. >> But I think, optimizing the cast away, offends the requirement of a consistent behavior, since casting introduces a boxing, which leads to a different behavior.

    It's not different if you can't observe it (the "as if" rule). Speaking of which, can anyone come up with some way of observing boxing (or the lack thereof) during the call to Dispose in "using" (while remaining within the realm of language spec, obviously)?

  49. Gabe says:

    Pavel: The usual way to observe boxing is to run a function under a memory profiler. If you see allocations for a value type, it's a box (or an array). That's how you can tell that comparing an unconstrained generic value to null is boxing when the generic type is Nullable<>, for example.

    @Gabe: that is precisely why I said "remaining within the realm of language spec". The language spec does not define the effect a program will have on a profiler – thus, whether you see boxing there or not, it does not affect the conformity of a particular C# implementation. To prove that this is non-conforming behavior, you'd have to devise some scheme to observe the boxing behavior from within the program itself.

  51. Simon Buchan says:

    @Pavel: This is an edge case, and I'm sure the spec does not ensure fixed (this) is stable for local variables (those that are not in an iterator or async method and are not captured by a lambda, and are therefore on the stack), so it may not be "within the language spec" to the letter, but I think this pattern is somewhat useful for eking out some more perf:

    unsafe struct PtrLock : IDisposable
    {
       static HashSet<PtrLock*> locked = new HashSet<PtrLock*>();

       public static void DoSomethingToLocked() { … }

       bool isLocked;

       public void Lock() { if (!isLocked) { isLocked = true; fixed (PtrLock* ptr = this) locked.Add(ptr); } }

       void IDisposable.Dispose() { if (isLocked) { isLocked = false; fixed (PtrLock* ptr = this) locked.Remove(ptr); } }
    }

  52. >> I'm sure the spec does not ensure fixed (this) is stable for local variables

    Actually it does (assuming that you meant to write "fixed(&this)"). Quote:

    "Fixed variables reside in storage locations that are unaffected by operation of the garbage collector. (Examples of fixed variables include local variables, value parameters, and variables created by dereferencing pointers.) On the other hand, moveable variables reside in storage locations that are subject to relocation or disposal by the garbage collector. (Examples of moveable variables include fields in objects and elements of arrays.)."

    so your code would indeed show the effect of boxing or lack of boxing.

    The reason why I thought it still wouldn't work is because the using-variable is readonly according to the spec. I always assumed that this carries with it the same set of restrictions as you have for a readonly field. Sure enough, you can't assign to a using-variable, you can't take its address with unary &, and you can't pass it as ref/out.

    But there is one big difference. For a readonly field, when a method is called on it, a temporary copy is created for that call, so that any mutating method would mutate that copy and not touch the original field. If that were the case for using-variable also, then &this inside the method would give you the address of the temporary, which would be expected to be different from what you get as &this inside Dispose(). But that isn't what happens – when a method is called on a using-variable, despite it being "readonly", the method is called directly and can mutate it! This also means that &this is guaranteed to be the address of the actual variable, and not the copy.

    On the other hand, one could argue that lifetime of using-variable is already over by the time Dispose() is called (it's over immediately after the boxing cast to IDisposable that the spec mandates), and so its memory location could be "reused" for boxing by the implementation. In other words, getting the same address inside Dispose() as in other methods is not by itself enough to say that implementation did not box, as the spec does not require the box to have a distinct address (it would require that if the box coexisted with the variable in an observable way, but it doesn't).
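
    The readonly-field half of that comparison can be sketched as follows. TickCounter and FieldHolder are hypothetical names; the sketch only shows the temporary-copy behaviour for readonly fields and makes no claim about using-variables.

    ```csharp
    using System;

    // Hypothetical mutable struct with a mutating method.
    public struct TickCounter
    {
        public int Value;
        public void Increment() { Value++; }
    }

    public class FieldHolder
    {
        public readonly TickCounter ReadOnlyCounter = new TickCounter();
        public TickCounter MutableCounter = new TickCounter();

        public void Bump()
        {
            ReadOnlyCounter.Increment(); // runs on a temporary copy of the readonly field
            MutableCounter.Increment();  // runs on the field itself
        }
    }
    ```

    After new FieldHolder().Bump(), ReadOnlyCounter.Value is still 0 while MutableCounter.Value is 1.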

  53. @Bill P. Godfrey "mutable value types could be an argument for adding support for C++ style copy constructors and out-of-scope destructors for C# value types."

    AAAIEEEE! Run for the hills! 🙂

    It's the other way round, surely. The problems caused by mutable value types form a very strong argument for not creating mutable value types, and would not be solved by those features anyway.

    The question is, why have structs at all? Obviously we're stuck with them now, but if they didn't exist would they need to be invented? The answer is no.

    It's common for C++ users to assume that GC-ed (heap allocated) reference types will perform much more poorly than stack-allocated value types, but this is based on their experience with C++ native heaps, which are non-compacting and so perform very poorly. This may be the reason why custom value types were added to the CLR in the first place.

    But the reality in the present-day CLR is quite different – the CLR GC heap is so blazingly fast, performance is hardly ever a reason to choose between 'struct' and 'class'. See: msdn.microsoft.com/…/dd942829.aspx

    So you don't get a performance boost from struct over class, and yet you have to give up so much in terms of language features. It's a very poor trade.

    And are copy constructors really something to pine over? If something is immutable, there is never a reason to make an exact copy of it. The original will do just fine – it's never going to change.

    In CLR/Java, if you find yourself thinking "Dang, I wish I could write a copy constructor for this class", try making it immutable instead, and watch your troubles evaporate! You will be able to treat references to it as if they were values (as long as you override ==/!= appropriately).

    Strings are the really telling example here. By making them immutable GC-ed objects, runtimes like the CLR and the JVM actually provide measurably better performance than Standard C++ programs using appropriately-designed classes with copy constructors, i.e. the std:: stuff. See the famous showdown: blogs.msdn.com/…/416151.aspx

  54. Gabe says:

    Daniel: I find it ironic that you used a Rico Mariani blog post to support your point that structs are unnecessary, while I used one (blogs.msdn.com/…/733887.aspx) to support my point that they *are* necessary.

    In fact, if they didn't exist, somebody would have probably had to invent them. That's why NumPy exists — Python doesn't have low-overhead types like struct so somebody had to create them in a C module.

    If you look at Rico's MeshSection example, you'll see that Point3d is 24 bytes, Point2d is 16 bytes, Vector3d is 24 bytes, Vertex is 64 bytes, and Quad is 16 bytes. Allocating a MeshSection allocates exactly 3 objects (the MeshSection, the Vertex array, the Quad array, and ignoring the TextureMap). Creating a MeshSection with 1k vertices and quads will use 64k bytes for the Vertex[], 16k bytes for the Quad[], and maybe 64 bytes of overhead, for a total of 80k. Accessing any element of either array (even vertices[123].normal.dx) requires finding the start of the array, multiplying the size of each element by the index, and adding in the offset to the field. GC overhead is negligible because there are only 3 objects.

    If this had to be done with only reference types (on a 32-bit machine where each object has 8 bytes overhead), Point3d is 32 bytes, Point2d is 24 bytes, Vector3d is 32 bytes, Vertex is 20 bytes, and Quad is 24 bytes. Allocating a MeshSection with N vertices and quads makes 3 + 5N allocations (5k in this case). The memory used is 64 + 4N (array of references to Vertex) + 4N (array of references to Quad) + 20N (instances of Vertex) + 32N (instances of Point3d) + 24N (instances of Point2d) + 32N (instances of Vector3d) + 24N (instances of Quad), for a total of 140k. Now to access vertices[123].normal.dx you have to find the offset into the Vertex[], dereference the Vertex, dereference the Vector3d, and find the offset to the field.

    So for this example, using only reference types nearly doubles the amount of memory used, makes accessing fields several times more work, and turns the memory management (allocation and GC) overhead from being constant to being linear. I would argue that certainly the combination of all of these is reason enough to implement value types in .Net.
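
    The arithmetic above can be checked with a few lines of code. This is only a sketch of the estimate as stated in this comment (the per-type sizes and the 64 bytes of fixed overhead are taken from the text; MeshMemoryEstimate is a hypothetical name), not a measurement of a real runtime.

    ```csharp
    using System;

    public static class MeshMemoryEstimate
    {
        const long FixedOverhead = 64; // the "maybe 64 bytes of overhead" from the text

        // Value types: one Vertex[] at 64 bytes per element, one Quad[] at 16 bytes per element.
        public static long WithStructs(long n)
        {
            return FixedOverhead + 64 * n + 16 * n;
        }

        // Reference types on 32-bit: two arrays of 4-byte references, plus one
        // Vertex (20), Point3d (32), Point2d (24), Vector3d (32) and Quad (24) per element.
        public static long WithClasses(long n)
        {
            return FixedOverhead + (4 + 4 + 20 + 32 + 24 + 32 + 24) * n;
        }
    }
    ```

    For n = 1000 this gives 80,064 bytes with structs and 140,064 bytes with classes, matching the 80k and 140k figures.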

  55. Shuggy says:

    Implementing certain high-performance, latency-sensitive applications in Java is a nightmare. Allocations might be cheap but collections can be expensive; the less you allocate, the less frequently you have to collect. Gen0 isn't too bad, but you get a steady drain into gen1 as a result, and if that gets fast enough you can trigger gen2 every few seconds. Lest you think this is a premature optimisation, and that things whose lifespan is stack bound shouldn't be too bad: a missed logging statement concatenating a string and an int on the hot path is enough to cause a runaway inability to keep up.

    Using structs here becomes a necessity, even the escape analysis in the current java runtimes isn't enough in many cases, it's just too conservative.

    Certainly many people need not care about these things, but there are people using .net because they get the safety of managed code with the freedom to take responsibility for value types, pointers (very useful for fast serialisation), union types, stackalloc, et al when they really need to. That there are people who use those when they don't need to doesn't bother me one whit so long as they don't go near my code base.

    This all goes double for the compact framework or XNA on the xbox where the generational GC disappears and collections become something to either avoid, or force your design to allocate slugs of easily traversable low reference plain old data objects so that collection is cheap.

  56. Simon Buchan says:

    I would write a reply to @Daniel, but @Gabe and @Shuggy covered everything. Well, I should add that the object header also adds some number of bytes close to the size of those structs (8 or 16 bytes on 32-bit, I think?), which makes the memory usage much worse. Not to mention that if those objects are not all allocated together (and they are unlikely to be in, say, a model editor), any processing will be all over the place in memory access, which will be the real killer, as your 30000 vertices need 3000 page accesses rather than 60 (and each extra page pushes another page out, remember!).

  57. I think those are all great examples of how micro-optimisation can make a huge difference in specific cases, so they're perfectly valid examples, but… they could have been solved in a library written in C or C++. Exactly as they are in Python! (I find it ironic that you'd invoke a successful *library* solution as indicating the need for a *language* feature).

    An alternative-history C# that never had structs would not have been significantly worse as a language for developing real-world applications, and would still have been great at *consuming* high-performance native libraries.

  58. Shuggy says:

    "and would still have been great at *consuming* high-performance native libraries."

    No, it wouldn't. Structs with explicit memory layout are pretty much a requirement for several of the libraries I use.

    Also the moment I introduce a C++ boundary I am no longer able to use all those nice things like closures and the like across that boundary. Also if the value type usage is pervasive within the unmanaged API then it may force an overly chatty API (or force far more of the design down into the unmanaged code losing me all those nice features I want to have access to).

    Wanting to create a composite container of plain old data without a heap allocation is a reasonable desire for anyone writing high-performance code in an environment where you cannot simply change the memory management routines out from underneath it. Sometimes humans really can beat the machines.

  59. Gabe says:

    Consuming high-performance native libraries is great work, if you can get it. Of course, one of the reasons to use managed code is that it runs in sandboxed environments like the web browser, mobile phones, XBox, shared hosting (SQL Server, ASP.NET), and Internet ClickOnce apps. Since native libraries are of no use to authors of apps in those environments, it's a good thing that C# can be used to write high-performance libraries for them.

    The more you can do within a language, the more benefits you get from composition. That's why LINQ is so great: instead of falling off a cliff into SQL-land any time you want to write a query, or being stuck writing loops manually, you can just write all of your queries in C# and let the environment handle the rest. C# makes it easy to compose queries with graphics libraries and math libraries. For example, let's say I want to know what gamma value produces the most evenly-distributed histogram for each of a set of images. I can easily write a LINQ query to take a table of gamma values, pass them to a graphics library to do gamma correction, pass the resulting image to a math package to make a histogram, and then write my own function to analyze the histogram.

    In the alternative-history C# where queries can only be run against DB libraries, graphics can only be done with graphics libraries, and math can only be done with math libraries, composition is impossible and you're stuck writing it all yourself.