Representation and Identity

(Note: not to be confused with Inheritance and Representation.)

I get a fair number of questions about the C# cast operator. The most frequent question I get is:

short sss = 123;
object ooo = sss;            // Box the short.
int iii = (int) sss;         // Perfectly legal.
int jjj = (int) (short) ooo; // Perfectly legal.
int kkk = (int) ooo;         // Invalid cast exception?! Why?

Why? Because a boxed T can only be unboxed to T. (*) Once it is unboxed, it’s just a value that can be cast as usual, so the double cast works just fine.
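
As footnote (*) notes, the one exception is that a boxed T may also be unboxed to Nullable&lt;T&gt;:

short? nullableShort = (short?) ooo; // Also legal: a boxed short unboxes to Nullable<short>.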

Many people find this restriction grating; they expect to be able to cast a boxed thing to anything that the unboxed thing could have been cast to. There are ways to do that, as we’ll see, but there are good reasons why the cast operator does what it does.

To understand why this design works this way it’s necessary to first wrap your head around the contradiction that is the cast operator. There are two (¤) basic usages of the cast operator in C#:

  • My code has an expression of type B, but I happen to have more information than the compiler does. I claim to know for certain that at runtime, this object of type B will actually always be of derived type D. I will inform the compiler of this claim by inserting a cast to D on the expression. Since the compiler probably cannot verify my claim, the compiler might ensure its veracity by inserting a run-time check at the point where I make the claim. If my claim turns out to be inaccurate, the CLR will throw an exception.
  • I have an expression of some type T which I know for certain is not of type U. However, I have a well-known way of associating some or all values of T with an “equivalent” value of U. I will instruct the compiler to generate code that implements this operation by inserting a cast to U. (And if at runtime there turns out to be no equivalent value of U for the particular T I’ve got, again we throw an exception.)
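
To make the contrast concrete, here is a small sketch of both usages; Animal, Giraffe and GetAnimal are hypothetical stand-ins for the B, D, T and U above:

Animal animal = GetAnimal();         // Static type Animal.
Giraffe giraffe = (Giraffe) animal;  // First usage: I claim the runtime type is Giraffe; checked, not converted.
int number = 123;
double equivalent = (double) number; // Second usage: number is certainly not a double; make an equivalent one.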

The attentive reader will have noticed that these are opposites. A neat trick, to have an operator which means two contradictory things, don’t you think?

This dichotomy motivates yet another classification scheme for conversions (†).  We can divide conversions into representation-preserving conversions (B to D) and representation-changing conversions (T to U). (‡) We can think of representation-preserving conversions on reference types as those conversions which preserve the identity of the object. When you cast a B to a D, you’re not doing anything to the existing object; you’re merely verifying that it is actually the type you say it is, and moving on. The identity of the object and the bits which represent the reference stay the same. But when you cast an int to a double, the resulting bits are very different.
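
A quick way to observe the difference (a sketch; the byte dumps assume a little-endian machine):

object obj = "hello";
string str = (string) obj;                    // Identity preserved:
Console.WriteLine(ReferenceEquals(obj, str)); // True -- same reference, same bits.
int i = 1;
double d = (double) i;                        // Representation changed:
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(i))); // 01-00-00-00
Console.WriteLine(BitConverter.ToString(BitConverter.GetBytes(d))); // 00-00-00-00-00-00-F0-3F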

All the built-in reference conversions are identity-preserving (£). Obviously trivial “conversions” such as converting from int to int are also representation-preserving conversions. All user-defined conversions (§) and non-trivial value type conversions (such as converting from int to double) are representation-changing conversions. Boxing and unboxing conversions are all representation-changing conversions.

The representation-preserving conversions that are known to never fail often result in no codegen at all (₪). If a representation-preserving conversion could fail then a castclass instruction is emitted, which does a runtime check and throws if the check fails.
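
For example (a sketch; the exact codegen is of course up to the compiler and jitter):

string s = "hello";
object o = s;          // Implicit reference conversion; can never fail, so no check is emitted.
string t = (string) o; // Could fail, so a castclass check is emitted; throws if o is not a string.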

But each representation-changing conversion is handled in its own special way. User-defined conversions are resolved using a special version of the overload resolution algorithm, and generated as a call to the appropriate static method. Boxing and unboxing conversions are generated as box and unbox instructions. All the other built-in conversions (int to double, and so on) are generated as custom sequences of instructions that do the right conversion.
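
Roughly, and with Foo standing in as a hypothetical type that defines an explicit conversion from int:

int i = 123;
object o = i;          // box instruction.
int j = (int) o;       // unbox instruction.
double d = (double) i; // custom instruction sequence (conv.r8).
Foo f = (Foo) i;       // call to the static method Foo.op_Explicit(int).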

So now that you know that, consider what the compiler would have to do to make this work the way some people expect:

int kkk = (int) ooo;

All that the compiler knows is that ooo is of type object. It could be anything. Suppose it is a boxed int – then the compiler should generate an unboxing instruction. Suppose it is a boxed short. Then the compiler should unbox the short and then generate the custom sequence of instructions that convert a short to an int. Suppose it is a boxed double – same thing, but different instructions. And so on, for all the built-in conversions that go to integer.

This would be a huge amount of code to generate, and it would be very slow. The code is of course so large that you would want to put it in its own method and just generate a call to it. Rather than do that by default, and always generate code that is slow, large and fragile, instead we’ve decided that unboxing can only unbox to the exact type. If you want to call the slow method that does all that goo, it’s available – you can always call Convert.ToInt32, which does all that analysis at runtime for you. We give you the choice between “fast and precise” or “slow and lax”, and the sensible default is the former. If you want the latter then call the method.
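
To get a feel for how much work that method is signing up for, here is a greatly simplified sketch of the sort of runtime dispatch it must perform; the real Convert.ToInt32 defers to IConvertible and has its own rounding and null-handling rules:

static int ToInt32Sketch(object o) // Hypothetical; not the real implementation.
{
    if (o is int) return (int) o;                      // Just unbox.
    if (o is short) return (int) (short) o;            // Unbox, then widen.
    if (o is double) return checked((int) (double) o); // Unbox, then narrow with a range check.
    // ... and so on, for every type convertible to int ...
    throw new InvalidCastException();
}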

That’s just the built-in conversions. Let’s continue imagining what would have to happen if we wanted all possible conversions to int to just work out correctly at runtime, instead of just bailing out early if the boxed thing is not an int.

Suppose the object is a Foo where there is a user-defined conversion from Foo (or one of its base classes) to int (or a type that is explicitly convertible to int, like, say, Nullable<int>). Then the compiler would need to generate a call to that conversion method, just as it would if the type had been known at compile time, and then possibly also generate the conversion from the return type of the method to int.

Remember, there could be arbitrarily many such conversion methods on arbitrarily many types. The type Foo and its conversion method might not even be defined in the assembly currently being compiled or any assembly referenced. Therefore the compiler would have to generate code to interrogate Foo at runtime, do the overload resolution analysis, and then dynamically spit the code to do the call.

Which is exactly what the compiler does in C# 4.0 if the argument to the cast is of type “dynamic” instead of object. The compiler actually generates code which starts a mini version of the compiler up again at runtime, does all that analysis, and spits fresh code. This is not fast, but it is accurate, if that’s what you really need. (And the spit code is then cached so that the next time this call site is hit, it is much faster.)
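
So in C# 4.0 the original example can be made to work by routing the cast through dynamic:

short sss = 123;
object ooo = sss;
int kkk = (int) (dynamic) ooo; // The runtime binder finds the short-to-int conversion; no exception.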

I don’t think people really expect the compiler to start up again at runtime every time they cast an object to int; I think they just haven’t thought through carefully exactly how much analysis solving the problem would take. Rather a lot, it turns out.


(*) Or Nullable<T>.

(¤) There are others that are not germane to this discussion. For example, a third usage is “Everyone knows that this D is also of base type B; I want the compiler to treat this expression of type D as a B for overload resolution purposes.” That would clearly be an identity-preserving conversion.

(†) There are many ways to classify conversions; we already divide conversions into implicit/explicit, built-in/user-defined, and so on. For the purposes of this discussion we’ll gloss over the details of those other classifications.

(‡) I’m glossing over here that certain conversions that the C# compiler thinks of as representation-changing are actually seen by the CLR verifier as representation-preserving. For example, the conversion from int to uint is seen by the CLR as representation-preserving because the 32 bits of a signed integer can be reinterpreted as an unsigned integer without changing the bits. These cases can be subtle and complex, and often have an impact on covariance-related issues; see next footnote.
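
For instance, this reinterpretation keeps the bits intact:

int negative = -1;
uint reinterpreted = unchecked((uint) negative); // Same 32 bits (0xFFFFFFFF), now read as 4294967295.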

I’m also ignoring conversions involving generic type parameters which are not known at compile time to be reference or value types. There are special rules for classifying those which would be major digressions to get into.

(£) This is why covariant and contravariant conversions of interface and delegate types require that all varying type arguments be of reference types. To ensure that a variant reference conversion is always identity-preserving, all of the conversions involving type arguments must also be identity-preserving. The easiest way to ensure that all the non-trivial conversions on type arguments are identity-preserving is to restrict them to be reference conversions.
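
For example, with the variant IEnumerable<out T> of C# 4.0:

IEnumerable<string> strings = new List<string>();
IEnumerable<object> objects = strings; // Legal: string-to-object is identity-preserving.
// IEnumerable<object> boxes = new List<int>(); // Illegal: int-to-object is a boxing conversion.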

(§) The rules of C# prohibit all user-defined conversions that could possibly be identity-preserving coercions. More generally, all user-defined conversions that could possibly be any “standard” conversion are illegal.

(₪) Again, I’m ignoring irksome generic issues here. There are situations where humans can prove mathematically that two generic type parameters must be identical at runtime, but the verifier is not smart enough to make those same deductions and requires the compiler to emit type checks.

Comments (26)

  1. Goran says:

    Just what I was wondering about last week. As always – clear, precise and informative post. Thank you!

  2. holatom says:

    Very nice article. I have been waiting for someone to "deal" with this issue for a very long time.

    Additionally, the only difficulty with this is in generic classes – the best way to address that problem is a technique like this:

       // Requires System and System.Linq.Expressions.
       public static class DynamicConverter<TFrom, TTo>
       {
           private static Func<TFrom, TTo> converter =
               CreateExpression<TFrom, TTo>(body => Expression.Convert(body, typeof(TTo)));

           public static TTo Convert(TFrom valueToConvert)
           {
               return converter(valueToConvert);
           }

           public static Func<TFrom, TTo> Converter
           {
               get { return converter; }
           }

           private static Func<TArg1, TResult> CreateExpression<TArg1, TResult>(
               Func<Expression, UnaryExpression> body)
           {
               ParameterExpression inp = Expression.Parameter(typeof(TArg1), "inp");
               try
               {
                   return Expression.Lambda<Func<TArg1, TResult>>(body(inp), inp).Compile();
               }
               catch (Exception ex)
               {
                   string msg = ex.Message; // avoid capture of ex itself
                   return delegate { throw new InvalidOperationException(msg); };
               }
           }
       }

       public static class DynamicConverter
       {
           public static TTo Convert<TFrom, TTo>(TFrom valueToConvert)
           {
               return DynamicConverter<TFrom, TTo>.Convert(valueToConvert);
           }
       }
    then you can do something like:

    var converted = DynamicConverter.Convert<sometype, T>(source);

    in your generic class if you know that this conversion from sometype to T exists.

  3. Matt says:

    You’ve run out of footnote characters; here are a few more: ‖, ¶. 🙂

    Lovely reasoning. I’ve often been annoyed by the unboxing restriction, so it’s nice to see the rationale for why it works the way it does.

  4. configurator says:

    Great. Now can you please explain to the people I used to work with that unboxing an int by using Convert.ToInt32(obj) is bad for their health?

  5. John Rayner says:

    Great post, Eric, as always.  Very informative and thought-provoking.  And I think this one must get the record for the most footnotes ever used in a blog post!   🙂

  6. Kevin Westhead says:

    Interesting post. Do you know what the reasoning was behind the unbox instruction throwing InvalidCastException rather than having another exception type specifically for unboxing failures?

  7. > Do you know what the reasoning was behind the unbox instruction throwing InvalidCastException rather than having another exception type specifically for unboxing failures?

    Why not? Semantically, unboxing is a downcast ("this is B which I know is a D – give me that D" – "this is Object which I know is an Int32 – give me that Int32"). It’s not identity-preserving simply because value types do not have inherent identity, but otherwise I consider it the same thing, so it makes sense to me that the same exception is used to indicate failure in both cases.

    Of course, it is also possible to set the issue of identity preservation aside entirely, and just say that _all_ conversions deal with representations, and Base->Derived cast is also a conversion that translates a value of type "reference to Base" to _another_ value of type "reference to Derived" (the fact that the bit pattern may remain the same as a result is irrelevant). If you look at it that way, identity does not even enter into the question, because the value converted – which is the reference, not the object – does not have any identity. Also, from this POV, it makes more sense to have a single cast/convert operator.

  8. Rob says:

    I had a need for calling the conversion operators on a boxed value back in .NET 2, so I wrote a basic dynamic cast method similar to what holatom is suggesting.  It’s here:

    This was a very targeted scenario, though, and I don’t think I’ve needed it since.

  9. commongenius says:

    VB’s CType operator caters to those who would rather not think about the different kinds of casting by generating the smallest amount of conversion code possible given the type information known at compile time. For example, CType(1, Integer) will actually not generate a cast at all, since the compiler knows it is unnecessary. CType(myInt, Long) will generate a conversion operation, the same as (long)myInt; while CType(myStream, MemoryStream) will generate a preserving conversion (“castclass” IL instruction), the same as (MemoryStream)myStream. Additionally, CType(myBoxedInt, Long) will generate a call to a VB helper function which will eventually cast the boxed int to IConvertible, and use that interface to convert to an Int64; this essentially performs the same function as (long)(int)myBoxedInt, although there is a performance cost. But CType will still give you a compile-time error if you try a conversion which is not possible under any circumstances (say, CType(myBoxedInt, …)).

    I am not generally in favor of CType; I prefer explicitly stating the kind of conversion that I know is necessary (which is probably why, or indicative of why, I prefer C# over VB). But the existence of CType is interesting to me because it dramatically demonstrates the difference in philosophies between the languages. C# forces you to prove that you know what you are doing (and in the process often forces you to think through what you are doing, and maybe do it better). VB allows you to say, “I don’t care how you do it, don’t bother me with the details, just get it done.” Of course, if it CAN’T be done, you still get an exception, which you may have been able to find at compile time if you were forced to think about it.

    Indeed, this does highlight an important difference in design philosophies. I like to say that VB is a “do what I mean” language, C# is a “do what I say” language. — Eric


  10. > VB allows you to say, "I don’t care how you do it, don’t bother me with the details, just get it done."

    From what Eric says, it seems that (T)(dynamic)x would do just that in C# 4.0.

  11. Thank you for submitting this cool story – Trackback from DotNetShoutout

  12. As I understand it, the issue is that the actual type of the object is unknown until runtime, so the compiler can’t possibly know how to convert an unknown type to the type specified in the program.

    Would a virtual method defined on object, say T Cast<T>(), solve the problem? That way, the compiler can issue an unbox, emit the virtual call on the unknown type, and let it deal with the problem if it wants to?

    Sure, if we had generics in .NET v1 that would have worked. Basically you would be putting the onus on the developer of the type to provide conversions to arbitrary types. I like that in theory; I like thinking of a type as “a set of values associated with a set of conversion rules”. Such a system encapsulates that concept nicely. But that ship has sailed; we did not have generics in v1 and we’re not going to add new virtual methods to object now. — Eric

  13. commongenius says:

    pminaev said:

    "> VB allows you to say, "I don’t care how you do it, don’t bother me with the details, just get it done."

    From what Eric says, it seems that (T)(dynamic)x would do just that in C# 4.0."

    Not exactly. VB uses CType as a universal conversion operator, which will generate the most efficient form of conversion possible wherever it is used. You don’t have to think about which type of conversion you are doing in different circumstances, because you are using the same operator. In the worst case, where no type information is known (i.e. casting from Object), the least efficient form of conversion is used (which basically tries each kind of conversion in turn). Casting to dynamic in C# 4.0 essentially removes all type information, forcing you into the least efficient conversion, even if a more efficient conversion could have been performed had the type information been preserved (such as in the (int)(long)myBoxedLong case). The dynamic version is slightly better though, since it caches the results of the conversion overload resolution.

  14. > Casting to dynamic in C# 4.0 essentially removes all type information, forcing you into the least efficient conversion, even if a more efficient conversion could have been performed had the type information been preserved

    I don’t see any reason why the compiler wouldn’t be able to optimize the case of casting a value of type definitely known at compile-time to dynamic, as in my example – at least in theory. I doubt that the actual implementation in .NET 4 will do that – it seems to require too much effort for very dubious value – but it is certainly possible to optimize it in precisely the same way as CType does (since, after all, it has all the same inputs!).

    Indeed. Our philosophy for “dynamic” is “you said dynamic, so you meant dynamic, so you’ll get dynamic.” We have certainly considered doing compiler work to detect situations where we know at compile time that a dynamic call cannot possibly succeed, or detect situations where the compiler can deduce enough type information about the dynamic thing to skip making a dynamic call. As you correctly call out, this is an immense amount of work for bizarre corner cases that directly contradict the stated intention of the programmer — if the programmer says they want dynamic dispatch that might fail at runtime, that’s what they’ll get. — Eric


  15. > As you correctly call out, this is an immense amount of work for bizarre corner cases that directly contradict the stated intention of the programmer — if the programmer says they want dynamic dispatch that might fail at runtime, that’s what they’ll get.

    Will that be a hard requirement in the language spec, however? I.e. will the conforming compiler be required to defer the error until execution even if it can see that it is going to fail at compile time already? Or is it implementation-defined?

    I think it doesn’t touch the original case we discussed either way, though, as it wasn’t about an error case – it was merely about optimization for the successful case. Something like (int)(dynamic)1.0 cannot fail, and if the compiler is smart enough to handle that at compile-time, the "as if" rule should kick in.

  16. Anthony D. Green says:

    "VB allows you to say, ‘I don’t care how you do it, don’t bother me with the details, just get it done.’" (commongenius).

    Indeed, we call that declarative programming. This is a quality of markup languages, SQL, and regular expressions, and it is highly desirable in many cases.

    This is not to say that it’s always desirable; indeed, I appreciate the recognition of a valid though different philosophy. I’ve recently come to terms with CType – until recently I was more fond of DirectCast because of its explicitness. I also sort of liked that the VB split between CType and DirectCast separated out the two functions of the C# cast syntax – i.e. conversion (representation-changing) versus casting (identity-preserving). I’ve since decided that I was just being needlessly obsessive and that deferring to the tool to choose the most appropriate (hopefully optimized) method was more concise (and consistent) maintenance-wise.

    Ultimately it’s a choice of who gets the burden of optimizing the implementation – the SQL Server team has to be diligent when emitting query plans to use all available information to produce the most efficient plan, whereas with the more explicit, imperative style the burden is shifted to the user. I prefer the former, but I always appreciate the option to go explicit as required; I think it’s most important for the programmer to be aware that in these cases they are delegating responsibility to the tool – that it’s not just magic.

    "VB is a ‘do what I mean’ language, C# is a ‘do what I say’ language " (Eric).

    I like this summarization. It rings well with my discoveries about some of the finer rules of Option Strict Off in VB (which is an advanced language feature, by the way, not a novice one – which is why it should be off by default). A lot of people characterize it as a feature that lets you just willy-nilly do anything and everything, with the compiler making things happen by parsing horoscopes. This is not true. When you look at the rules, it won’t let you do anything that you couldn’t say explicitly (and it can still give compile-time warnings for things it knows you definitely can’t do), and even being explicit about the cast/conversion, it’s still just as possible to ask for something that will fail. (I would argue that having to type CType doesn’t actually make me more likely to realize a conversion will fail at runtime, but then again I don’t do a lot of conversions anyway.)

    It’s a philosophical difference: whether I have to say Dim i As Integer = CType(o, Integer), or whether it’s sensible for the tool to infer from my assigning o to i that I would naturally want – need – mean – to convert o to an Integer first; whether I should have to say something that I’d have to say anyway.

    Good read.

  17. You’ve been kicked (a good thing) – Trackback from

  18. Wardy says:

    I think that all went straight over my head … damn, I feel stupid now; time to get those MCTS books out.

  19. Tom says:

    I’m a new fan of your series.  Keep these fantastic articles coming!

    One correction:  Nullable<ValueType> is not implicitly convertible to ValueType. It’s the other way around.

  20. Tom says:

    Sorry for the repost, wanted to provide a sample.  If you can merge these two posts, please do.

    int i = 0;

    int? j = null;

    int i = j;    // won’t compile

    int i = (int)j;    // throws InvalidOperationException

    int i = j ?? 0;    // works

  21. Eric Ouellet says:

    Hello Eric,

    I read your nice article.

    I don’t know if it is because I missed something important, but I feel that C# could support something like this:

        // ******************************************************************
        public class Something { }
        // ******************************************************************
        public class SomethingWrapper
        {
            private Something _something;

            public static implicit operator Something(SomethingWrapper wrapper)
            {
                return wrapper.GetSomething();
            }

            public Something GetSomething()
            {
                return _something;
            }

            public SomethingWrapper(Something something)
            {
                _something = something;
            }

            // More code here…
        }
        // ******************************************************************
        private void button1_Click(object sender, RoutedEventArgs e)
        {
            SomethingWrapper sw = new SomethingWrapper(new Something());
            Object o = sw;
            Something s = (Something)o; // InvalidCastException.
        }

    If there exists only one way the compiler could get from “Object” (whose real runtime type is SomethingWrapper) to “Something”,

    then why doesn’t the compiler create the code and just do it?

    Why would it need the dynamic type for that?

    Why not do it on object?

    In order to make your scenario work, the generated code has to start the C# compiler, run overload resolution to determine that there is a conversion from the runtime type of the object, somehow call the conversion, and return the result. Remember, we have to run this code on EVERY cast. That would make C# a very slow language indeed. Now, if you want to start up the compiler again at runtime, in C# 4 you can do that if you choose to take that performance hit. Make the argument dynamic, and we’ll do all the analysis at runtime. — Eric

    If it were supported, it would open the door to a fantastic world of generic wrapper classes.

    SomethingWrapper could be generic so that it works on anything, not only Something.

    Am I wrong? Did I miss something somewhere?


  22. Eric Ouellet says:

    Thanks Eric for your feedback, it was very quick !

    I absolutely do not want to use "dynamic"-typed objects. They’re untyped (no IntelliSense and much more, you know all that…). By definition it is bad to use them!

    Although the compiler is able to do a portion of the discovery on the fly right now (it could change), I do not want to touch dynamic objects with a 10-foot pole.

    Related to my previous sample…

    I would probably be able to do:

    Something s = (dynamic)o; // Pass through a dynamic object to do a dynamic cast to a typed type?

    Will the language offer a new keyword in the style of "dynamic_cast"… Will I be able to write:

    C# 3.5: Something s = (Something)o; // InvalidCastException.

    C# 4:   Something s = dynamic_cast<Something>(o); // Find the path to the object; throw an exception if more than one path is available (no dynamic object).

    That dynamic_cast would be very nice… I hope Anders thought about it for C# 4!!!

  23. Eric Ouellet says:

    In fact, I do not understand why the cast could not first convert its source to its real underlying runtime type before trying the requested cast.

    Why can’t I do that, and why would it cost a lot to do it?

        public class EnumValueWrapper<T>
        {
            // **************************************************************
            private readonly T _enumValue;

            // **************************************************************
            public EnumValueWrapper(T enumValue)
            {
                _enumValue = enumValue;
            }

            // **************************************************************
            public static implicit operator T(EnumValueWrapper<T> enumValue)
            {
                return enumValue.GetEnumValue();
            }

            // **************************************************************
            public T GetEnumValue()
            {
                return _enumValue;
            }
        }

        … // Elided for clarity

    public enum FirstLast { first, last };

    FirstLast firstLast;

    EnumValueWrapper<FirstLast> firstLatsWrapper = new EnumValueWrapper<FirstLast>(firstLast);

    Object objFirstLast = firstLatsWrapper;

    firstLast = (FirstLast)objFirstLast; // Exception

    The compiler already has to do a check at runtime to ensure inheritance compatibility, so there is already a kind of performance penalty.

    It could pre-generate code like this (pseudo-IL):

    firstLast = (FirstLast)objFirstLast;      ==>     firstLast = (FirstLast)(objFirstLast.GetType())objFirstLast;

    Everything would work perfectly and the cost would be minimal (I think). Do you agree?

  24. Timwi says:

    Hi Eric, I believe you made a mistake in this post (or I misunderstood it).

    “Suppose the object is a Foo where there is a user-defined conversion from Foo (or one of its base classes) to int (or a type that int is implicitly convertible to, like, say, Nullable<int>).”

    If a user-defined conversion from Foo to Nullable<int> exists, I don’t see what relevance it has that int is implicitly convertible to Nullable<int> if you’re trying to cast to int. It seems only relevant to me that Nullable<int> is convertible to int, and only *explicitly*, so it seems that you might have meant “… or a type that is explicitly convertible to int”, in which case Nullable<int> would still be a candidate, but that would mean you made two mistakes in one sentence, which seems unlikely ;-).

    If I am wrong, would you like to clarify where my error lies?

    It is a confusing paragraph, sorry. Let me clarify. Suppose you have a cast expression that casts a Foo to int. Suppose further there is an implicit or explicit user-defined conversion from Foo to Nullable<int>.  When there is a cast operator in the code the compiler is permitted to put up to one *built-in* conversion, either explicit or implicit, on either side of an implicit or explicit user-defined conversion.  That is "(int)foo" is allowed to be bound as "(int)(int?)foo". There's an explicit or implicit conversion from foo to int? and an explicit conversion from int? to int. You are right that it would have been more clear to phrase it the other way. I'll change it. – Eric

  25. JMCF125 says:

    Hello Eric,

    This is the first post of yours that I’ve seen, and I can say it is a really good one.

    Because of this post, I’ll now check the blog more often.

    It also answered some of my questions.

    Thank you

  26. Sylvain says:


    What do you think about using one of the following simple functions?

    static T Convert<T>(object value)
    {
        return (T)System.Convert.ChangeType(value, typeof(T));
    }

    static void Convert<T>(object value, out T t)
    {
        t = (T)System.Convert.ChangeType(value, typeof(T));
    }


    This does both the unboxing and the conversion at once. I don't know the performance of this, but it is handy. I especially like the second one since I don't even have to specify T:

    int kkk;

    Convert(ooo, out kkk); // Does the trick