Too much type information, or welcome back System.Object and boxing


We all know that generics are good: they promote code reuse, enable static type checking by the compiler, improve runtime performance, allow more flexible OOP designs, lay the foundation for LINQ, help the IDE provide more helpful IntelliSense, and have tons of other vital advantages. “var” is another good feature which, unlike “object”, preserves full static type information.


However, I recently hit a rare case where I had too much static type information about my code, so I had to use System.Object (and boxing) to get the desired effect. I had a method that used reflection to set a property on an object, similar to this:

static void SetProperty(object f)
{
    Type type = f.GetType();
    PropertyInfo property = type.GetProperty("Bar");
    property.SetValue(f, 1, new object[0]);
}

I also had a struct like this:

struct Foo
{
    public int Bar { get; set; }
}

Now, I tried to set the Bar property on an instance of the struct:

static void Main(string[] args)
{
    var f = new Foo();
    SetProperty(f);
    Foo foo = (Foo)f;
    Console.WriteLine(foo.Bar);
}

It didn’t work! It printed out 0! I was puzzled. And then I realized what was happening. Since Foo is a struct, and f (thanks to var!) is also statically known to be a struct, f holds the value itself rather than a reference. Because SetProperty takes an object, the compiler boxes a copy of the struct at the call site and passes that boxed copy to the method. The copy is modified, but the original f is not.


One simple change and it started working fine:

static void Main(string[] args)
{
    object f = new Foo();
    SetProperty(f);
    Foo foo = (Foo)f;
    Console.WriteLine(foo.Bar);
}

I changed var to object, so the struct was boxed into an object on the heap. The reference to this same object was passed to the SetProperty method, the method set the property on the boxed instance, and the (Foo) cast unboxed that same modified instance. The code now prints out 1 and everything is OK again.
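
To see both cases side by side, here is a minimal, self-contained sketch. SetProperty and Foo are the ones from above; the class name BoxingDemo and the method names ViaVar and ViaObject are made up for this illustration.

```csharp
using System;
using System.Reflection;

struct Foo
{
    public int Bar { get; set; }
}

static class BoxingDemo
{
    // Same reflection-based setter as in the post.
    static void SetProperty(object f)
    {
        PropertyInfo property = f.GetType().GetProperty("Bar");
        property.SetValue(f, 1, null);
    }

    // f is statically a Foo: the call boxes a temporary copy,
    // the copy is modified, and the box is then discarded.
    public static int ViaVar()
    {
        var f = new Foo();
        SetProperty(f);
        return f.Bar;
    }

    // f is statically an object: the struct is boxed once, and the
    // caller and SetProperty share a reference to the same box.
    public static int ViaObject()
    {
        object f = new Foo();
        SetProperty(f);
        return ((Foo)f).Bar;
    }

    static void Main()
    {
        Console.WriteLine(ViaVar());    // prints 0
        Console.WriteLine(ViaObject()); // prints 1
    }
}
```

The only difference between the two methods is the static type of the local variable, which decides whether the caller keeps a reference to the box that SetProperty modifies.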


“var” provided too much type information to the compiler: it knew that the variable is a struct, so the variable itself was never boxed and only a temporary boxed copy reached the method, and I lost the modified value. After casting to object, we hid the extra information from the compiler and got uniform behavior for both value types and reference types.


In my original code where I encountered this peculiar behavior (a custom deserializer that reads XML and uses reflection to set properties on objects), I was so focused on working with all types that I forgot those can be value types as well. Since I had everything strongly typed with generics, type inference, vars and other modern goodness, the kind, hardworking compiler preserved all the information for me and kept my values as structs where I was expecting reference type behavior. Thankfully, unit tests revealed the error 10 minutes after it was introduced (I definitely need to post about the usefulness of unit tests and TDD in the future), so it was a quick fix to box the value into an object before filling its properties.
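
The "box first, then fill" fix might look roughly like the sketch below. This is not the actual deserializer code: the helper SetAll, the class BoxedFill, and the struct Point are hypothetical, and the helper simply assigns one int to every public writable int property to keep the example short.

```csharp
using System;
using System.Reflection;

struct Point
{
    public int X { get; set; }
    public int Y { get; set; }
}

static class BoxedFill
{
    // Hypothetical helper: box once, set properties on the box, then convert back.
    // For reference types the conversion to object is just a reference copy,
    // so the same code path works for classes and structs.
    // Assumes target is not null.
    public static T SetAll<T>(T target, int value)
    {
        object boxed = target; // boxes value types exactly once

        PropertyInfo[] properties =
            boxed.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance);

        foreach (PropertyInfo p in properties)
        {
            bool isPlainIntProperty =
                p.CanWrite &&
                p.PropertyType == typeof(int) &&
                p.GetIndexParameters().Length == 0;

            if (isPlainIntProperty)
            {
                p.SetValue(boxed, value, null); // modifies the boxed instance in place
            }
        }

        return (T)boxed; // unboxes the modified value (or casts the reference back)
    }

    static void Main()
    {
        Point p = SetAll(new Point(), 7);
        Console.WriteLine(p.X + ", " + p.Y); // prints 7, 7
    }
}
```

Returning the unboxed value means the caller has to use the result; for a struct, the value originally passed in is left unchanged.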


It was an amusing experience.

Comments (22)

  1. RednaxelaFX says:

    Hi,

    I saw this post on VS2008’s start page…

    Thought this might serve as another good example of why value types should be made immutable. The compiler did nothing wrong; it just faithfully inferred the type. The code wouldn’t have worked if the ‘var’ were replaced with its actual type anyway, and we all know that, right?

    Maybe a better solution would be getting rid of mutable value types in the design, if possible. That’s much less error-prone.

  2. Yes, mutable structs are evil. However the requirements for our object deserializer are such that it has to be universal and be able to deserialize both classes and structs.

    To deserialize an immutable struct, we would need to know about a constructor and actually create the value, not just set properties.

  3. gps says:

    You wrote "We all know that generics are good…". I started with BASIC, worked my way to C, C++, and Java, then VB (I know, a really oblique turn), before I got to C#. While I agree in part that generics are good, I don’t believe var is a good thing at all. It’s an evil little thing that says I don’t know what I’m working with, so I’ll blindly go forward. The C family of languages is strongly typed to avoid such dangerous ideas. Since everything in C# is derived from the Object class, even value types, there should be no reason to use var outside of being assigned an anonymous type, which can only be used safely in the function it is declared in.

  4. Matthew says:

    Why were you using a struct in the first place?

  5. gps: I don’t quite agree about the var part. Var preserves the static type of an expression without explicitly repeating it, thus avoiding redundancy. Most functional languages use it in one form or another ("type inference"). There are trade-offs, yes (mostly around readability), but I don’t see var being technically harmful. I use it in about 50% of cases and use personal judgement every time. I recommend reading Jon Skeet’s "C# in Depth" if you want to learn more about this.

    Matthew: We have structs in code which we don’t want to rewrite now. However it was being serialized manually and I was converting it to automatic serialization. That’s why I needed the serializer to work with existing code.

  6. Kevin says:

    var/type inference has its place in C#, e.g. LINQ, but that doesn’t mean we should abuse it.

  7. Jose Luis Chavez del Cid says:

    If you know it’s a struct, and you designed it as a struct, you know you have to pass the parameters as ByRef (VB) / ref (C#). There’s no mistake there. Three letters make the difference and you won’t have any problem, even if you pass an object.

  8. And that’s exactly why I hate mutable structs so much.  See my post on enforcing immutability at http://blogs.msdn.com/kevinpilchbisson/archive/2007/11/20/enforcing-immutability-in-code.aspx

  9. David Nelson says:

    @Luis,

    Passing the argument to SetProperty by reference is not appropriate, because it implies that after the method call the reference passed in could be pointing to a completely different instance! Which is clearly not what the method is trying to accomplish.

  10. Robert Foster says:

    @ Luis,

    The whole point is that you don’t know in advance whether you have a struct or an object… Kirill has stated that this example has been pulled from an automatic serializer that he is working on… in that case, you need to be able to pass it various objects without actually knowing what they are.

  11. Giraffe says:

    Another option would be to force structs to be passed by reference by using generics:

    static void SetProperty<T>(T f) where T : class
    {
        Type type = f.GetType();
        PropertyInfo property = type.GetProperty("Bar");
        property.SetValue(f, 1, new object[0]);
    }

    static void SetProperty<T>(ref T f) where T : struct
    {
        object fObj = (object)f;
        SetProperty(fObj);
        f = (T)fObj;
    }

    This will throw a compile error if you call SetProperty with a value type without specifying it as a ref parameter.   Obviously, this won’t work if the calls should be the same for object and value types:

    static void Main(string[] args)
    {
        var f1 = new Foo();
        SetProperty(ref f1);
        Foo foo1 = (Foo)f1;
        Console.WriteLine(foo1.Bar);

        object f2 = new Foo();
        SetProperty(f2);
        Foo foo2 = (Foo)f2;
        Console.WriteLine(foo2.Bar);
    }

  12. Harper Shelby says:

    @Kirill

    Certainly var has its place as a ‘type’ name for variables holding the return values of LINQ queries that will be anonymous types generated by the compiler, but in situations where the programmer knows the type in advance, it seems like it invites issues like this. I would personally recommend coding standards that explicitly forbade its use in those situations for that reason.

  13. Well, as I said, I’ve never hit this case before, so I thought var could do no harm. I used to think carefully every time I needed to declare a local variable, and now I will think even more carefully. But I still love var and I expect to keep using it in the future where appropriate (I’ll just have to be more careful). Not necessarily for anonymous types (which I almost never use), but also where it increases readability and the type is clear from the variable name or context.

  14. Bruce Pierson says:

    "To deserialize an immutable struct, we would need to know about a constructor and actually create the value, not just set properties."

    I don’t understand this. If you are using reflection, can you not still deserialize a struct by setting the fields rather than the properties? After all, the framework somehow knows how to deserialize "immutable" value types… Since inheritance is not an issue, you know all the fields in a value type will be DeclaredOnly, and there will always be a parameterless default constructor suitable for Activator.CreateInstance().

    At the very least, if you want to use properties and make it immutable by normal means, just use a "private set" and then let reflection find that.

  15. Hi Bruce,

    all your comments are very valid. From a couple of hints I see that you clearly know what you’re talking about. However we have a couple of requirements:

    1. We want to keep our serializer/deserializer very simple, maintainable (500 lines of code for deserializer and 150 lines of code for serializer) and keep full control over it

    2. We only serialize public writable instance properties, we don’t even look at fields

    3. The list of participating properties is returned by a piece of common reflection logic that we want to keep really simple/trivial

    4. This is not shipping code, so we just want to get it working and move on – my solution turned out to be the best in terms of cost/benefit – quick, maintainable and does the job.

  16. Bruce Pierson says:

    Sure, that makes sense now. I just wanted to make sure I wasn’t missing something in my own mapper / deserializer code. Thanks for the reinforcement.

    I’ve been bitten by the value-type bug before, so that’s why I viewed this in the first place. Good info, thanks.

  17. Stas Neverov says:

    Var is good! It reduces the "noise" in your code (especially when you use generics a lot), thus making it more readable. For me readability is much more important than these potential stupid, simple issues when you mix up var with object.

  18. Justin Chase says:

    You probably should pass the struct by ref rather than cast into an object. You are probably going to see some performance gains if you do it that way, not to mention the fact that this is uuuuuugly.

    static void SetProperty(ref object obj)
    {
        //…
    }

    var f = new Foo();
    SetProperty(ref f);

    @David Nelson

    I believe you are confusing ‘ref’ with ‘out’.

  19. David Nelson says:

    @Justin,

    No, I am not confusing them. ‘out’ is merely a specialized form of ‘ref’, in which the argument does not have to be initialized before being passed in (as opposed to ‘ref’ in which the argument does have to be initialized before being passed in). In fact the CLR does not even support ‘out’ parameters; the C# compiler simply implements them as ref parameters in IL (try calling a method with an out parameter that was created in C# from a VB.NET application).

    In both cases, the called method can set the reference to point to a different instance than what was passed in. The only difference is that an ‘out’ parameter is required to be set by the called method, since it may not have been initialized before the method was called; whereas setting a ‘ref’ parameter is optional, because it was required to already have a value before it was passed.

  20. Gary W. says:

    I looked up "const ref" because that seems like what is being desired by some in the above comments, and it appears that this doesn’t exist in C# (unlike in C++). I had to look this up to get a better understanding.

    I found this link helpful:

    http://channel9.msdn.com/forums/Coffeehouse/255508-const-ref-in-C/

    (I need to read more about "var" as well….)

  21. James Hart says:

    I don’t think you’re being fair to ‘var’ here, since the problems are still there with an explicitly typed variable. If anything’s caused this problem, it’s in the nature of C#’s handling of automatically boxing value types when they’re passed to a method that accepts System.Object (or an interface, for that matter).

    When a method says it accepts a System.Object as an argument, it seems natural for the method to proceed on the assumption that the calling code also has a reference to the object passed. But if the argument passed in is a value type, C# boxes it up into a reference, passes the reference in, but /discards the reference/ from the perspective of the calling method. That’s the bit that catches the called method out – all its work on the object it’s given is going to be thrown away.

    It’s as if

    Foo f = new Foo();
    SetProperty(f);
    Console.WriteLine(f.Bar);

    is actually treated as

    Foo f = new Foo();
    object temp = f;
    SetProperty(temp);
    Console.WriteLine(f.Bar);

    It seems it would be more intuitive if what C# actually did was this:

    Foo f = new Foo();
    object temp = f;
    SetProperty(temp);
    f = (Foo) temp;
    Console.WriteLine(f.Bar);

    But I guess that could get messy if SetProperty were to hand a reference to temp off to another thread…

    As it is, you can of course do that yourself explicitly, which is pretty much what your code ends up doing by declaring f to be of type object in the first place.

  22. Alex Fedin says:

    I would like to ask another question: why did you even choose to have your own serializer, if there are a number of them already? (Xml Serializer, Xaml Serializer, Soap Formatter, WCF Formatters).

    Was the major intention "to reinvent the wheel"?