Over the past couple of months, I've written up a bunch of notes on the CLR type system. You'll notice that they are littered with “This is how Reflection deals with these types”, as they are meant to be the start of a document that illustrates the interactions between Reflection and the type system. As always, it's just a brain dump, and you can expect some errors, along with gaping holes in reasoning and information. Feel free to leave a comment if there is a particular part of these notes you'd like me to dig deeper into.
The underlying type system
Following is a description of the sets of types that are treated specially either by the virtual machine or by the internal system. It’s important to have a good understanding of the characteristics of these sets, as they are the key building blocks reflection builds on.
Please refer to the ECMA spec (partition I) for a more formal description of those types. In the next sections we want to outline features that are more interesting from a reflection point of view, so definitions may be slightly different (and certainly less formal) than those presented in the ECMA specification. The idea is to provide a general set of rules that can help people using reflection and/or reflection emit to do type analysis.
Reference types are the common instances that are naturally allocated on the gc heap (e.g. Object, String). The object pointer points to type information for the instance (a MethodTable). The instance is fully described by the MethodTable.
We use Reference Type here to identify instances that are unambiguously described by their MethodTable. That is, the instance pointer points to the exact type information.
In that respect, obtaining the TypeHandle from the instance is “conceptually” as simple as dereferencing the instance pointer, and it provides the full type information for that instance (Type.GetTypeHandle(Object)).
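To make the TypeHandle point concrete, here is a minimal sketch. The class and method names (TypeHandleDemo, HandleRoundTrips) are made up for illustration; only Type.GetTypeHandle, Type.GetTypeFromHandle and Type.TypeHandle are real APIs.

```csharp
using System;

public static class TypeHandleDemo
{
    // Returns true when the handle read from the instance maps back to the
    // same System.Type that GetType() reports for that instance.
    public static bool HandleRoundTrips(object instance)
    {
        RuntimeTypeHandle handle = Type.GetTypeHandle(instance);
        return Type.GetTypeFromHandle(handle) == instance.GetType()
            && handle.Equals(instance.GetType().TypeHandle);
    }

    public static void Main()
    {
        Console.WriteLine(HandleRoundTrips("a string"));   // True
        Console.WriteLine(HandleRoundTrips(new object())); // True
    }
}
```

The instance alone is enough to get back to the exact type, which is the property the rest of these notes lean on.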
ValueTypes are types that derive from System.ValueType. They cannot be derived from, so they are implicitly sealed types. ValueTypes have stack semantics; that is, a ValueType local or argument is not allocated on the gc heap but rather on the stack. ValueTypes can be moved onto the gc heap via boxing (‘box’ IL opcode). A copy of the instance is made on the gc heap as a result of boxing.
When on the stack, ValueTypes are not self-describing: they do not carry any type information. Usually an object* points to the MethodTable of that object, the type of that instance. That makes it possible to retrieve type information simply by having a reference to an instance. That is not true for ValueTypes when allocated on the stack. A reference to a value type is a pointer to the beginning of the data for that value type.
In the unboxed form, they are tracked by flow analysis (through the declared type of the local or argument that holds them). Type safety is enforced by that flow analysis: the value always sits in a location, either a local (.locals) or an argument, whose type is given by the locals or method signature. In their boxed form, they carry their MethodTable and thus are self-describing.
A ValueType derivative is allowed to declare methods of a static or instance nature. A ValueType cannot introduce new virtual methods, although it can override the ones it inherits from Object and ValueType (ToString, Equals, GetHashCode). Virtual dispatch goes through the MethodTable that the “this” pointer leads to. Because the runtime identifies an unboxed ValueType through flow analysis, the “this” pointer at the point of invocation refers to raw data with no MethodTable in front of it, so the target method has to be known statically.
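The boxing behavior is easy to check from C#. The following is a small sketch; Point and BoxingDemo are hypothetical names I picked for the example.

```csharp
using System;

public struct Point
{
    public int X;
}

public static class BoxingDemo
{
    // Boxes a Point, mutates the original, and returns the X stored in the box.
    public static int BoxedCopyIsIndependent()
    {
        Point p = new Point { X = 1 };
        object boxed = p;        // 'box': a copy of p is made on the gc heap
        p.X = 2;                 // changes only the unboxed copy
        return ((Point)boxed).X; // the boxed copy still holds 1
    }

    public static void Main()
    {
        Console.WriteLine(typeof(Point).BaseType == typeof(ValueType)); // True
        Console.WriteLine(typeof(Point).IsSealed);                      // True
        Console.WriteLine(BoxedCopyIsIndependent());                    // 1

        // Once boxed, the instance is self-describing again.
        object boxed = new Point { X = 7 };
        Console.WriteLine(boxed.GetType() == typeof(Point));            // True
    }
}
```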
Primitive (Scalar types)
Primitives are built-in types that are treated specially by the system. They have a special encoding in metadata and they have explicit opcodes that can operate on them (e.g. the add opcode for numeric values). There is a set of implicit conversion rules (int to long) applied to them. Reflection implicitly provides conversion of primitives during invocation whenever there is no loss of information. In every other respect primitives have exactly the same semantics as ValueTypes, since they are ValueTypes.
C# “int” and System.Int32 are essentially the same thing; the C# compiler offers the keyword as syntactic sugar to minimize syntax overhead. In general, these primitive types are derived from ValueType and perform a set of common operations (CompareTo, Equals, GetHashCode, ToString and various conversion methods).
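Here is a sketch of the implicit conversion Reflection applies during invocation. PrimitiveDemo, TakesLong and TakesInt are names invented for the example; the behavior shown is what I expect from the default binder (widening allowed, narrowing rejected).

```csharp
using System;
using System.Reflection;

public static class PrimitiveDemo
{
    public static long TakesLong(long value) { return value; }
    public static int TakesInt(int value) { return value; }

    // Passes a boxed Int32 to a method declared with an Int64 parameter.
    public static object InvokeWithWidening()
    {
        MethodInfo method = typeof(PrimitiveDemo).GetMethod("TakesLong");
        return method.Invoke(null, new object[] { 42 });
    }

    // Passes a boxed Int64 to a method declared with an Int32 parameter.
    // The conversion could lose information, so the invocation is rejected.
    public static bool NarrowingIsRejected()
    {
        MethodInfo method = typeof(PrimitiveDemo).GetMethod("TakesInt");
        try
        {
            method.Invoke(null, new object[] { 42L });
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(typeof(int) == typeof(System.Int32)); // True
        Console.WriteLine(typeof(int).IsPrimitive);              // True
        Console.WriteLine(typeof(int).IsValueType);              // True
        Console.WriteLine(InvokeWithWidening().GetType());       // System.Int64
        Console.WriteLine(NarrowingIsRejected());                // True
    }
}
```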
Enums are ValueTypes as well. However, they’re special because an Enum can only wrap a limited set of primitive types. They have somewhat peculiar assignability rules with respect to their underlying types, and in that respect it can be important to distinguish them from other ValueTypes or primitives.
For instance, the behavior of the castclass (or isinst) and unbox instructions over a boxed enum instance is somewhat inconsistent. Consider the following:
Color c = Color.Black;
Object o = c;
An IL sequence using castclass will fail to assign o to an int local i:
castclass int32 // this will throw InvalidCastException
However, an unbox instruction will work just fine.
In other words, there are valid and verifiable IL sequences that allow a boxed enum instance to be assigned to its underlying type or to a different compatible enum type, whereas other sequences will throw. Considering that reflection always deals with boxed values, there is an asymmetry between operations that check type assignability (e.g. binding) and operations that check instance assignability (e.g. invocation).
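The same asymmetry can be observed from C#, where "is" compiles to isinst and a cast from Object to a value type compiles to an unbox. This is only a sketch: Color, Shade and EnumUnboxDemo are hypothetical names, and both enums are assumed to use the default int underlying type.

```csharp
using System;

public enum Color { Black, White }
public enum Shade { Dark, Light }

public static class EnumUnboxDemo
{
    // 'is' compiles to isinst: a boxed Color is not an instance of Int32.
    public static bool IsInt(object o) { return o is int; }

    // The cast compiles to an unbox: a boxed Color unboxes to its underlying type.
    public static int UnboxAsInt(object o) { return (int)o; }

    // Unboxing to a different enum with the same underlying type also succeeds.
    public static Shade UnboxAsShade(object o) { return (Shade)o; }

    public static void Main()
    {
        Color c = Color.White;
        object o = c;

        Console.WriteLine(IsInt(o));        // False
        Console.WriteLine(UnboxAsInt(o));   // 1
        Console.WriteLine(UnboxAsShade(o)); // Light

        // The type-level check agrees with isinst, not with unbox.
        Console.WriteLine(typeof(int).IsAssignableFrom(typeof(Color)));      // False
        Console.WriteLine(Enum.GetUnderlyingType(typeof(Color)) == typeof(int)); // True
    }
}
```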
Arrays are special reference types. Object layout for arrays is different from normal reference types: an instance of an array may point to a shared MethodTable. Instances of Object[] and String[] all point to the same MethodTable (Object[]). This is true for all arrays of reference types, as they share the same gc layout. For value types the array instance carries the specific MethodTable (e.g. int[], EnumColor[], MyStruct[]).
Arrays are also special in their assignability rules (array covariance). A String[] can be assigned to an Object[] even though Object[] is not in the hierarchy of String[].
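A short sketch of array covariance as seen through Reflection and at run time. ArrayCovarianceDemo and its method are names I made up for the example.

```csharp
using System;

public static class ArrayCovarianceDemo
{
    // Stores a boxed int into a String[] viewed as an Object[].
    // The assignment compiles, but the element store is checked at run time.
    public static bool StoreIsCheckedAtRuntime()
    {
        object[] objects = new string[] { "a" }; // allowed by array covariance
        try
        {
            objects[0] = 1;
            return false;
        }
        catch (ArrayTypeMismatchException)
        {
            return true;
        }
    }

    public static void Main()
    {
        // Object[] is assignable from String[] ...
        Console.WriteLine(typeof(object[]).IsAssignableFrom(typeof(string[]))); // True
        // ... even though the base type of String[] is System.Array, not Object[].
        Console.WriteLine(typeof(string[]).BaseType == typeof(Array));          // True
        // Covariance does not extend to arrays of value types.
        Console.WriteLine(typeof(object[]).IsAssignableFrom(typeof(int[])));    // False
        Console.WriteLine(StoreIsCheckedAtRuntime());                           // True
    }
}
```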
ByRef types are managed pointers. ByRefs don’t box, and there are very few IL instructions that can be performed on ByRef types (e.g. ldind.ref, stind.ref). Because of that restriction there is never an instance of a ByRef type as far as reflection is concerned. However, ByRef types are real, concrete types in the type system. Inspecting a method that takes a ByRef arg will reveal a unique type that is in no relationship with the type it represents (i.e. int& and int are in no relationship).
Reflection simulates ByRef, and it’s the user’s responsibility to fetch the updated value out of the argument array used in reflection invocation. Verifiable IL ensures that ByRef types are only used in argument position. Usage of ByRef in any other location (return value, fields) results in unverifiable code.
Interesting features of ByRef
ChangeMyString(ref string s)
How does this surface via Reflection? To late-bound invoke a method that takes a reference, the caller passes the value inside the argument array (new Object[] { myref }). Reflection hands the method a reference to the slot in that array, so the method invoked late-bound modifies the array element instead of the actual ByRef variable that we wanted to modify.
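Here is a minimal sketch of that behavior. ChangeMyString is the method named above with a body I made up; ByRefDemo and InvokeLateBound are hypothetical helper names.

```csharp
using System;
using System.Reflection;

public static class ByRefDemo
{
    public static void ChangeMyString(ref string s) { s = "changed"; }

    // Invokes ChangeMyString late-bound and returns the argument array afterwards.
    public static object[] InvokeLateBound(string value)
    {
        MethodInfo method = typeof(ByRefDemo).GetMethod("ChangeMyString");
        object[] args = new object[] { value };
        method.Invoke(null, args);
        return args;
    }

    public static void Main()
    {
        MethodInfo method = typeof(ByRefDemo).GetMethod("ChangeMyString");
        Type parameterType = method.GetParameters()[0].ParameterType;

        Console.WriteLine(parameterType.IsByRef);                            // True
        Console.WriteLine(parameterType == typeof(string));                  // False
        Console.WriteLine(parameterType.GetElementType() == typeof(string)); // True
        Console.WriteLine(parameterType == typeof(string).MakeByRefType());  // True

        string original = "original";
        object[] args = InvokeLateBound(original);
        Console.WriteLine(original); // original (the caller's variable is untouched)
        Console.WriteLine(args[0]);  // changed  (the update lives in the array)
    }
}
```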
TypedReferences are a special form of ByRef. They are a struct composed of two fields: a ByRef value to a location and a type compatible with the ByRef location. A TypedRef can specify a contract on a local variable via the locals signature, or can be used for method parameters. TypedRefs also share the limitations of ByRef in verifiable code.
TypedRef could be used by dynamic and latebound languages as a way to tag a location with information that is different from the runtime type of an object. That type information could be used by a language runtime to direct binding. However given the verification limitation we are not aware of any language that uses TypedRef. TypedRef are used in vararg functions and are returned as a result of enumerating over the vararg argument list. The type the TypedRef points to is the statically declared type (as defined by the compiler in the metadata vararg signature).
There exist C# keywords, and methods that hang off TypedReference, that specifically deal with creating and manipulating TypedReferences:
__makeref(variable) // Constructs a TypedReference to the variable
__reftype(typedReference) // Returns the Type that the TypedReference struct holds
__refvalue(typedReference, T) // Reads or assigns the value at the location the TypedReference holds, typed as T
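A small sketch of the three keywords in use. They are undocumented C# keywords, so treat this as an illustration of what I expect from the Microsoft compiler rather than a guaranteed contract; TypedRefDemo and its methods are hypothetical names.

```csharp
using System;

public static class TypedRefDemo
{
    // Writes through a TypedReference and returns the variable it refers to.
    public static int WriteThroughTypedRef()
    {
        int x = 1;
        TypedReference reference = __makeref(x);
        __refvalue(reference, int) = 42; // assigns to the location, i.e. to x
        return x;
    }

    // Returns the type stored in the TypedReference. This is the declared type
    // of the location (Object), not the runtime type of the value (String).
    public static Type DeclaredTypeOfLocation()
    {
        object o = "text";
        TypedReference reference = __makeref(o);
        return __reftype(reference);
    }

    public static void Main()
    {
        Console.WriteLine(WriteThroughTypedRef());   // 42
        Console.WriteLine(DeclaredTypeOfLocation()); // System.Object
    }
}
```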
Pointer types represent an unmanaged pointer. They are used in unsafe code, and the usage of pointers requires skipping verification.
Though pointers can box, their type identity is lost when boxing. This is particularly difficult for Reflection, which always deals with boxed entities. There is no way to make reflection code that uses pointers type safe. Of course code using pointers is never verifiable by definition, but the system will generally check type safety and do proper binding in the presence of pointers. Reflection is less accurate in that respect.
COMObjects are types that are exposed to the system via COM interop.
A COMObject may be imported via tlbimp, in which case it has some static type information that can be used to reason over it. However, in the presence of a COMObject instance, assignability is somewhat of a fuzzy concept.
Because of the semantics of COM, assignability is the result of an IUnknown::QueryInterface() call on that instance and thus may, in principle, give varying results for different instances. When no static type information is available, every COM instance will show up as an instance of __COMObject, and a few APIs will give results that differ from instance to instance (assignability, guid). In that sense type identity is not enough to guarantee assignability. The presence of __COMObject requires logic outside of the type system.
Interfaces are very much like Reference Types. However they are special in that they do not specify a concrete type but rather a contract a type must adhere to. Walking an interface hierarchy will not lead to Object even though Interface types are assignable to Object. Deriving from an interface does not require exposing publicly any of the interface’s methods, however the interface methods must be implemented at least privately by a concrete type. An InterfaceMapping struct is exposed via Reflection to identify the implementation of an interface for a given type.
An interface has a MethodTable (TypeHandle) but it never has an instantiation. Instances never point to an interface MethodTable.
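The interface rules above can be checked with a few Reflection calls. This is a sketch with invented names (IGreeter, QuietGreeter, InterfaceDemo); Type.GetInterfaceMap and InterfaceMapping are the real APIs.

```csharp
using System;
using System.Reflection;

public interface IGreeter
{
    string Greet();
}

public class QuietGreeter : IGreeter
{
    // Explicit implementation: not part of the public surface of QuietGreeter.
    string IGreeter.Greet() { return "hello"; }
}

public static class InterfaceDemo
{
    // Returns the method on QuietGreeter that implements IGreeter.Greet.
    public static MethodInfo FindImplementation()
    {
        InterfaceMapping map = typeof(QuietGreeter).GetInterfaceMap(typeof(IGreeter));
        // IGreeter declares a single method, so index 0 is Greet.
        return map.TargetMethods[0];
    }

    public static void Main()
    {
        // Walking up from an interface does not lead to Object ...
        Console.WriteLine(typeof(IGreeter).BaseType == null);                   // True
        // ... yet an interface type is assignable to Object.
        Console.WriteLine(typeof(object).IsAssignableFrom(typeof(IGreeter)));   // True

        MethodInfo implementation = FindImplementation();
        Console.WriteLine(implementation.DeclaringType == typeof(QuietGreeter)); // True
        Console.WriteLine(implementation.IsPrivate);                             // True

        // An instance seen through the interface still reports its concrete type.
        IGreeter greeter = new QuietGreeter();
        Console.WriteLine(greeter.GetType() == typeof(QuietGreeter));            // True
    }
}
```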
A TransparentProxy is a type that acts as a proxy for another type. The TransparentProxy type is never revealed to the user. It’s an internal type. Every transparent proxy points to the same MethodTable, the unique representation of the TransparentProxy type. So without extra operations transparent proxies do not carry any type identity. In that respect dealing with a transparent proxy introduces a perf penalty to type identity in the system. When an instance is asked for its type, a check for a transparent proxy needs to precede any other check because, if the object is indeed a transparent proxy, a more complicated dereference needs to happen in order to fetch the “real” type of that object.
There are explicit checks in the runtime to make sure you never get back a TransparentProxy. Inspecting mscorlib metadata will show a __TransparentProxy type, however there is no representation of that type as System.Type. TransparentProxy does not exist in the type system.
Perhaps another time…
Normal type analysis over delegate types will work as expected. They are true types in the type system.
However, they are special in the way they wrap a function pointer. They were built with the idea of being the verifiable function pointer in the system. There are very special verification rules around delegate construction. The rules for verification around delegates are extremely tight and bound to a well-known IL sequence.
There are also very special rules in how you define your delegate type inheritance. System.Delegate cannot be derived from directly. Every delegate type must derive from System.MulticastDelegate. The presence of both types is more an artifact of the historical evolution of the system than a real need (the CLR used to make a distinction between System.Delegate, the single-cast delegate, and System.MulticastDelegate. That distinction, however, is long gone).
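The inheritance rule is visible through normal type analysis. A minimal sketch follows; DelegateDemo and Twice are names I made up.

```csharp
using System;
using System.Reflection;

public static class DelegateDemo
{
    public static int Twice(int value) { return value * 2; }

    // Builds a delegate over Twice late-bound, from its MethodInfo.
    public static Func<int, int> CreateLateBound()
    {
        MethodInfo method = typeof(DelegateDemo).GetMethod("Twice");
        return (Func<int, int>)Delegate.CreateDelegate(typeof(Func<int, int>), method);
    }

    public static void Main()
    {
        // Every delegate type derives from MulticastDelegate, which derives from Delegate.
        Console.WriteLine(typeof(Func<int, int>).BaseType == typeof(MulticastDelegate)); // True
        Console.WriteLine(typeof(MulticastDelegate).BaseType == typeof(Delegate));       // True

        Func<int, int> twice = CreateLateBound();
        Console.WriteLine(twice(21));                        // 42
        Console.WriteLine(twice.Method.Name);                // Twice
        Console.WriteLine(twice.Target == null);             // True (static method)

        // Any delegate can be combined, which is why the single-cast distinction is gone.
        Func<int, int> combined = twice + twice;
        Console.WriteLine(combined.GetInvocationList().Length); // 2
    }
}
```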