Value Types


The CLR’s type system includes primitive types like signed and unsigned integers of various sizes, booleans and floating point types. It also includes partial support for types like pointers and function pointers. And it contains some rather exotic beasts, like ArgIterators and TypedByRefs. (These are exotic because their lifetimes are restricted to a scope on the stack, so they can never be boxed, embedded in a class, or otherwise appear in the GC heap.) Lastly, but most importantly, the type system includes interfaces, classes and value types.


In fact, if you look at our primitive types the right way, they’re really just some value types that are so popular and intrinsic that we gave them special encoding in our type signatures and instructions.


The CLR also supports a flexible / weak kind of enumeration. Our enums are really just a specialization of normal value types which conform to some extra conventions. From the CLR’s perspective, enums are type-distinct aliases that otherwise reduce to their underlying primitive type. This is probably not the way anyone else thinks of them, so I’ll explain in more detail later.


Anyway, as we’ve seen, our type system has value types all over the place – as structs, enums, and primitive scalars. And there are some rather interesting aspects to their design and implementation.


The principal goal of value types was to improve performance over what could be achieved with classes. There are some aspects of classes which have unavoidable performance implications:



  1. All instances of classes live in the GC heap. Our GC allocator and our generation 0 collections are extremely fast. Yet GC allocation and collection can never be as fast as stack allocation of locals, where the compiler can establish or reclaim an entire frame of value types and primitives with a single adjustment to the stack pointer.

  2. All instances of classes are self-describing. In today’s implementation, we use a pointer-sized data slot on every instance to tag that instance’s type. This single slot enables us to perform dynamic casting, virtual dispatch, embedded GC pointer reporting and a host of other useful operations. But sometimes you just cannot afford to burn that data slot, or to initialize it during construction. If you have an array of 10,000 value types, you really don’t want to place that tag 10,000 times through memory – especially if dirtying the CPU’s cache in this way isn’t going to improve the application’s subsequent accesses.

  3. Instances of classes can never be embedded in other instances. All logical embedding is actually achieved by reference. This is the case because our object-oriented model allows “is-a” substitutability. It’s hard to achieve efficient execution if subtypes can be embedded into an instance, forcing all offsets to be indirected. Of course, the CLR is a virtualized execution environment so I suspect we could actually give the illusion of class embedding. However, many unmanaged structures in Win32 are composed of structs embedded in structs. The illusion of embedding would never achieve the performance of true embedding when blittable types are passed across the managed / unmanaged boundary. The performance impact of marshaling would certainly weaken our illusion.
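
The embedding and tagging points above can be observed from C#. This is a minimal sketch, not code from the CLR: Point, Rect and PointClass are hypothetical types invented for illustration, and Marshal.SizeOf reports the marshaled (unmanaged) size, which for these blittable structs is just the sum of the embedded fields.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical types for illustration only.
[StructLayout(LayoutKind.Sequential)]
public struct Point { public int X; public int Y; }

[StructLayout(LayoutKind.Sequential)]
public struct Rect { public Point TopLeft; public Point BottomRight; }

public sealed class PointClass { public int X; public int Y; }

public static class EmbeddingDemo
{
    // Two Points embedded inline: four ints, no per-instance type tag.
    public static int RectSize() { return Marshal.SizeOf(typeof(Rect)); }

    // Elements of a struct array exist as soon as the array does.
    public static int SumStructArray(int count)
    {
        Point[] points = new Point[count];   // a single allocation
        for (int i = 0; i < points.Length; i++) points[i].X = i;
        int sum = 0;
        foreach (Point p in points) sum += p.X;
        return sum;
    }

    // Elements of a class array are references; each instance would be
    // a separate, self-describing heap object.
    public static bool ClassArrayStartsNull(int count)
    {
        PointClass[] points = new PointClass[count];
        return points[0] == null;
    }

    public static void Main()
    {
        Console.WriteLine(RectSize());              // 16
        Console.WriteLine(SumStructArray(4));       // 6
        Console.WriteLine(ClassArrayStartsNull(4)); // True
    }
}
```

The struct array needs no per-element construction, while the class array holds only null references until each element is allocated individually.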


If you look at the class hierarchy, you find that all value types derive from System.Object. Whether this is indeed true is a matter of opinion. Certainly value types have a layout that is not an extension of the parent Object’s layout. For example, they lack the self-describing tag. It’s more accurate to say that value types, when boxed, derive from System.Object. Here’s the relevant part of the class hierarchy:


              System.Object
                /       \
               /         \
    most classes     System.ValueType
                         /       \
                        /         \
           most value types    System.Enum
                                    \
                                     \
                                   all enums


Why do I use the term “most classes” in this hierarchy? Because there are several classes that don’t appear in that section of the hierarchy. System.Object is the obvious one. And, paradoxically, System.ValueType is actually a class, rather than a value type. Along the same lines, System.Enum, despite being a subtype of System.ValueType, is neither a value type nor an enum. Rather it’s a base class under which all enums are parented.
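
Reflection makes this paradox visible. The following sketch only queries standard System.Type properties; DayOfWeek is used as a convenient framework enum, and the class name HierarchyDemo is made up for the example.

```csharp
using System;

public static class HierarchyDemo
{
    public static void Main()
    {
        // System.ValueType and System.Enum are classes, not value types.
        Console.WriteLine(typeof(ValueType).IsValueType);                  // False
        Console.WriteLine(typeof(ValueType).IsClass);                      // True
        Console.WriteLine(typeof(Enum).IsValueType);                       // False
        Console.WriteLine(typeof(Enum).IsEnum);                            // False
        Console.WriteLine(typeof(Enum).BaseType == typeof(ValueType));     // True

        // Concrete value types and enums are parented under those classes.
        Console.WriteLine(typeof(int).BaseType == typeof(ValueType));      // True
        Console.WriteLine(typeof(DayOfWeek).BaseType == typeof(Enum));     // True
        Console.WriteLine(typeof(DayOfWeek).IsValueType);                  // True
    }
}
```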


Incidentally, something similar is going on with System.Array and all the array types. In terms of layout, System.Array really isn’t an array. But it does serve as the base class under which all kinds of arrays (single-dimension, multi-dimension, zero-lower-bounds and non-zero-lower-bounds) are parented.
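
As a small illustration of arrays being parented under System.Array, the sketch below uses Array.CreateInstance to build a one-dimensional array with a lower bound of 1. ArrayDemo and MakeOneBased are hypothetical names; the behavior shown is what the desktop CLR does, where such an array is a different type from int[].

```csharp
using System;

public static class ArrayDemo
{
    // Creates a one-dimensional int array whose first valid index is 1.
    public static Array MakeOneBased(int length)
    {
        return Array.CreateInstance(typeof(int), new[] { length }, new[] { 1 });
    }

    public static void Main()
    {
        Console.WriteLine(typeof(int[]).BaseType == typeof(Array));   // True
        Console.WriteLine(typeof(int[,]).BaseType == typeof(Array));  // True

        Array oneBased = MakeOneBased(3);
        Console.WriteLine(oneBased.GetLowerBound(0));  // 1
        Console.WriteLine(oneBased is Array);          // True
        Console.WriteLine(oneBased is int[]);          // False
    }
}
```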


Now is probably a good time to address one of the glaring differences between the ECMA spec and our implementation. According to the ECMA spec, it should be possible to specify either a boxed or an unboxed value type. This is indicated by using either ELEMENT_TYPE_VALUETYPE <token> or ELEMENT_TYPE_CLASS <token>. By making this distinction, you could have method arguments or array elements or fields that are of type “boxed myStruct”. The CLR actually implemented a little of this, and then cut the feature because of schedule risk. Presumably we’ll implement it properly some day, to achieve ECMA conformance. Until then, we will refuse to load applications that attempt to specify well-typed boxed value types.


I mentioned earlier that the CLR thinks of enums rather differently than the average developer. Inside the CLR, an enum is a type-distinct alias. We generally treat the enum as an alias for the underlying integral type that is the type of the enum’s value__ field. This alias is type-distinct because it can be used for overloading purposes. A class can have three methods that are distinguished only by the fact that one takes MyEnum, another YourEnum, and the third the underlying integral type as an argument.


Beyond that, the CLR should not attach any significance to the enum. In particular, we do no validation that the values of the enum ever match any of the declared enumerands.
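
Both properties, overloading on the alias and the absence of validation, can be sketched in C#. MyEnum and YourEnum follow the names used in the text; the enumerands, the method M and the class EnumOverloadDemo are assumptions made for the example.

```csharp
using System;

public enum MyEnum { A = 0, B = 1 }
public enum YourEnum { X = 0, Y = 1 }

public static class EnumOverloadDemo
{
    // Three overloads that differ only in enum type vs. underlying type.
    public static string M(MyEnum value) { return "MyEnum"; }
    public static string M(YourEnum value) { return "YourEnum"; }
    public static string M(int value) { return "int"; }

    public static void Main()
    {
        Console.WriteLine(M(MyEnum.B));    // MyEnum
        Console.WriteLine(M(YourEnum.Y));  // YourEnum
        Console.WriteLine(M(1));           // int

        // No declared enumerand has the value 42, yet the cast succeeds.
        MyEnum bogus = (MyEnum)42;
        Console.WriteLine((int)bogus);                             // 42
        Console.WriteLine(Enum.IsDefined(typeof(MyEnum), bogus));  // False
    }
}
```

Code that needs a guarantee about enumerand membership has to check explicitly, for example with Enum.IsDefined as above.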


I say the CLR “should not” attach any significance, but the model shows some rough edges if you look closely. When an enum is unboxed and is in its value type form, we only have static type information to guide us. We tend to discard this static typing information and reduce the type to its underlying integral type. You can actually assign a value of MyEnum to a variable of type YourEnum, as far as the JIT and verifier are concerned. But as soon as an enum is boxed, it becomes self-describing. At that point, cast operations and covariant array typechecks tend to be picky about whether you’ve got a boxed MyEnum or a boxed YourEnum. As one of the architects of the C# compiler remarked, “Enums are treated exactly like their underlying types, except when they aren’t.” This is unfortunate and ideally we should clean this up some day.
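
A small sketch of the self-describing side of this, seen from C#. The C# compiler is stricter than the verifier and asks for an explicit conversion between the two enums, but that conversion does not change the bits. Once boxed, the instance carries its exact type, which type tests can observe. The enumerands and BoxedEnumDemo are hypothetical.

```csharp
using System;

public enum MyEnum { A = 0 }
public enum YourEnum { X = 0 }

public static class BoxedEnumDemo
{
    public static void Main()
    {
        MyEnum mine = MyEnum.A;

        // Unboxed form: only static types are involved.
        YourEnum yours = (YourEnum)mine;
        Console.WriteLine((int)yours);                         // 0

        // Boxed form: the object knows it is a MyEnum.
        object boxed = mine;
        Console.WriteLine(boxed.GetType() == typeof(MyEnum));  // True
        Console.WriteLine(boxed is MyEnum);                    // True
        Console.WriteLine(boxed is YourEnum);                  // False
    }
}
```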


While we’re on the subject of using enums to create distinct overloads, it makes sense to mention custom signature modifiers. These modifiers provide an extensibility point in the type system which allows sophisticated IL generators to attach significance to types. For example, I believe Managed C++ expresses its notion of ‘const’ through a custom signature modifier that it can attach to method arguments. Custom signature modifiers come in two forms. In the first form, they simply create enough of a difference between otherwise identical signatures to support overloading. In their second form, they also express some semantics. If another IL generator doesn’t understand those semantics, it should not consume that member.
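
C# cannot attach arbitrary modifiers to method arguments, so the sketch below uses a different, well-known case as a stand-in: the C# compiler encodes ‘volatile’ on a field as a required custom modifier (the second form described above), which reflection can read back. This is an illustration of the mechanism only, not of the Managed C++ ‘const’ encoding; Holder and ModifierDemo are hypothetical names.

```csharp
using System;
using System.Reflection;

public class Holder
{
    public volatile int Flag;  // emitted with a required modifier
    public int Plain;          // no modifiers
}

public static class ModifierDemo
{
    // Returns the required custom modifiers on a public field of Holder.
    public static Type[] RequiredModifiers(string fieldName)
    {
        FieldInfo field = typeof(Holder).GetField(fieldName);
        return field.GetRequiredCustomModifiers();
    }

    public static void Main()
    {
        foreach (Type t in RequiredModifiers("Flag"))
            Console.WriteLine(t.FullName);  // System.Runtime.CompilerServices.IsVolatile
        Console.WriteLine(RequiredModifiers("Plain").Length);  // 0
    }
}
```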


So an IL generator could attach custom signature modifiers to arguments of an integral type, and achieve the same sort of type-distinct aliasing that enums provide.


Today, custom signature modifiers have one disappointing gap. If you have a method that takes no arguments and returns void, there isn’t a type in the signature that you can modify to make it distinct. I don’t think we’ve come up with a good way to address this yet. (Perhaps we could support a custom signature modifier on the calling convention?)


Back to value types. Instance methods, whether virtual or non-virtual, have an implicit ‘this’ argument. This argument is not expressed in the signature. Therefore it’s not immediately obvious that a method like “void m(int)” actually has a different true signature depending on whether the method appears on a class or on a value type. If we add back the implicit ‘this’ for illustration purposes, the true signatures are really:


    void m( [    MyClass  this], int arg)
    void m( [ref MyStruct this], int arg)


It’s not surprising that ‘this’ is MyClass in one case and MyStruct in the other case. What may be a little surprising is that ‘this’ is actually a byref in the value type case. This is necessary if we are to support mutator methods on a value type. Otherwise any changes to ‘this’ would be through a temporary which would subsequently be discarded.
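
The need for a byref ‘this’ can be sketched by writing the hidden argument out by hand. Counter and its methods are hypothetical; IncrementExplicit mirrors what the runtime does for an instance method on a struct, and IncrementByValue shows what a by-value ‘this’ would amount to.

```csharp
using System;

public struct Counter
{
    public int Count;

    // Instance mutator: 'this' is passed by reference behind the scenes.
    public void Increment() { Count++; }

    // The same operation with the hidden argument made explicit.
    public static void IncrementExplicit(ref Counter self) { self.Count++; }

    // A by-value 'this': the change lands on a temporary copy.
    public static void IncrementByValue(Counter self) { self.Count++; }
}

public static class ByRefThisDemo
{
    public static void Main()
    {
        Counter c = new Counter();
        c.Increment();                     // Count becomes 1
        Counter.IncrementExplicit(ref c);  // Count becomes 2
        Counter.IncrementByValue(c);       // copy is modified, c is not
        Console.WriteLine(c.Count);        // 2
    }
}
```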


Now we get to the interesting part. Object has a number of virtual methods like Equals and GetHashCode. We now know that these methods have implicit ‘this’ arguments of type Object. It’s easy to see how System.ValueType and System.Enum can override these methods, since we’ve learned that these types are actually classes rather than value types or enums.


But what happens when MyStruct overrides GetHashCode? Somehow, the implicit ‘this’ argument needs to be ‘ref MyStruct’ when the dispatch arrives at MyStruct’s implementation. But the callsite clearly cannot be responsible for this, since the callsite calls polymorphically on boxed value types and other class instances. It should be clear that a similar situation can occur with any interface methods that are implemented by a value type.


Something must be converting the boxed value type into a byref to the unboxed value type. This ‘something’ is an unboxing stub which is transparently inserted into the call path. If an implementation uses vtables to dispatch virtual methods, one obvious way to insert an unboxing stub into the call path is to patch the vtable slot with the stub address. On X86, the unboxing stub could be very efficient:


    add ecx, 4      ; bias ‘this’ past the self-describing tag
    jmp <target>    ; now we’re ready for the ‘ref struct’ method


Indeed, even the JMP could be removed by placing the unboxing stub right before the method body (effectively creating dual entrypoints for the method).


At polymorphic callsites, the best we can do is vector through a lightweight unboxing stub. But in many cases the callsite knows the exact type of the value type. That’s because it’s operating on a well-typed local, argument, or field reference. Remember that value types cannot be sub-typed, so substitutability of the underlying value type is not a concern.


This implies that the IL generator has two code generation strategies available to it, when dispatching an interface method or Object virtual method on a well-typed value type instance. It can box it and make the call as in the polymorphic case. Or it can try to find a method on the value type that corresponds to this contract and takes a byref to the value type, and then call this method directly.


Which technique should the IL generator favor? Well, if the method is a mutator there may be a loss of side effects if the value type is boxed and then discarded; the IL generator may need to back-propagate the changes if it goes the boxing route. Also, boxing is an efficient operation, but it necessarily involves allocating an object in the GC heap. So the boxing approach can never be as fast as the ‘byref value type’ approach.
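
The loss of side effects on the boxing route is easy to reproduce in C#. In this sketch, IIncrementable and Tally are hypothetical; converting the struct to its interface type boxes a copy, so the mutator runs against the heap copy rather than the local.

```csharp
using System;

public interface IIncrementable { void Increment(); }

public struct Tally : IIncrementable
{
    public int Count;
    public void Increment() { Count++; }
}

public static class DispatchDemo
{
    public static void Main()
    {
        Tally t = new Tally();

        // Direct call on the well-typed local: 'this' is a byref to t.
        t.Increment();
        Console.WriteLine(t.Count);  // 1

        // Boxing route: the conversion copies t into a heap object,
        // and the mutation is applied to that copy.
        IIncrementable boxed = t;
        boxed.Increment();
        Console.WriteLine(t.Count);  // 1

        // Back-propagating the change means unboxing the copy explicitly.
        t = (Tally)boxed;
        Console.WriteLine(t.Count);  // 2
    }
}
```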


So why wouldn’t an IL generator always favor the ‘byref value type’ approach? One disadvantage is that finding the correct method to call can be challenging. In an earlier blog (Interface layout), I revealed some of this subtlety. The compiler would have to consider MethodImpls, whether the interface is redundantly mentioned in the ‘implements’ clause, and several other points in order to predict what the class loader will do.


But let’s say our IL generator is sophisticated enough to do this. It still might prefer the boxing approach, so it can be resilient to versioning changes. If the value type is defined in a different assembly than the callsite, the value type’s implementation can evolve independently. The value type has made a contract that it will implement an interface, but it has not guaranteed which method will be used to satisfy that interface contract. Theoretically, it could use a MethodImpl to satisfy ‘I.xyz’ using a class method called ‘abc’ in one version and a method called ‘jkl’ in some future version. In practice, this is unlikely and some sophisticated compilers predict the method body to call and then hope that subsequent versions won’t invalidate the resulting program.


Given that a class or value type can re-implement a contract in subsequent versions, consider the following scenario:


    class Object { public virtual int GetHashCode() {…} … }
    class ValueType : Object { public override int GetHashCode() {…} … }
    struct MyVT : ValueType { public override int GetHashCode() {…} … }


As we know, MyVT.GetHashCode() has a different actual signature, taking a ‘ref MyVT’ as the implicit ‘this’. Let’s say an IL generator takes the efficient but risky route of generating a call on a local directly to MyVT.GetHashCode. If a future version of MyVT decides it is satisfied with its parent’s implementation, it might remove this override. If value types weren’t involved, this would be an entirely safe change. We already saw in one of my earlier blogs (Virtual and non-virtual) that the CLR will bind calls up the hierarchy. But for value types, the signature is changing underneath us.
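
Reflection shows where the method body lives in each version of such a type. In this sketch, WithOverride and WithoutOverride are hypothetical stand-ins for the two versions of MyVT; the owner reported for the second one is System.ValueType, whose implementation takes an ‘Object this’ rather than a ‘ref MyVT’.

```csharp
using System;

// Stand-in for the version of MyVT that overrides GetHashCode.
public struct WithOverride
{
    public int Value;
    public override int GetHashCode() { return Value; }
}

// Stand-in for the later version that removed the override.
public struct WithoutOverride
{
    public int Value;
}

public static class BindingDemo
{
    // Reports which type actually declares the GetHashCode body for t.
    public static Type HashCodeOwner(Type t)
    {
        return t.GetMethod("GetHashCode", Type.EmptyTypes).DeclaringType;
    }

    public static void Main()
    {
        Console.WriteLine(HashCodeOwner(typeof(WithOverride)));     // WithOverride
        Console.WriteLine(HashCodeOwner(typeof(WithoutOverride)));  // System.ValueType
    }
}
```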


Today, we consider this scenario to be illegal. The callsite will fail to bind to a method and the program is rejected as invalid. Theoretically, the CLR could make this scenario work. Just as we insert unboxing stubs to match an ‘Object this’ callsite to a ‘ref MyVT this’ method body, we could also create and insert reboxing stubs to match a ‘ref MyVT’ callsite to an ‘Object this’ method body.


This would be symmetrical. And it’s the sort of magic that you would naturally expect a virtual execution environment like the CLR to do. As with so many things, we haven’t got around to even seriously considering it yet.

Comments (20)

  1. Mike says:

    Chris, you need a girlfriend! =)

  2. Recently I looked into the implementation of Equals for Value Types and I was surprised by what I found. Object.Equals actually does a memcmp for value types (why?) and ValueType.Equals overrides this with a more complicated algorithm that either does a memcmp (if the value type has no reference members) or uses reflection to call Equals on each member.

    I don’t like both of these. Object.Equals shouldn’t deal with value types (is this perhaps a historical left over, from a time where System.ValueType didn’t yet exist?)
    I don’t like ValueType.Equals because using a memcmp for value types that don’t contain references isn’t all that sensible, because the individual field types could have overridden Equals to do something that isn’t equivalent to a memcmp. Also, changing the semantics of Equals because of the presence or absence of a reference field isn’t very intuitive either.

  3. Chris Brumme says:

    Jeroen,

    Object.Equals considers whether ‘this’ is a value type, because a language like C++ allows scoped calls to virtual methods. In other words, you can call Object.Equals on ValueType instances. Last week I was talking to a dev who will be rewriting some of this code for better performance, but he intends to mostly retain the current semantics. In part, this is because it would be a breaking change to switch these methods over to a significantly different plan. And in part, this is because the current semantics aren’t terribly broken. If you are satisfied with the default implementations, you can use them. If you are dissatisfied with them, you can add a more appropriate override on any value types you author.

    I think the original plan was to have something that was at least reasonable and that could perform well. Our current implementation does not perform well, but we should be able to replace it with a better memcmp style of comparison that is very efficient.

    Of course, the actual details won’t be clear until our next release.

  4. Chris Brumme says:

    Mike,

    I’m not sure what Kathryn’s cryptic comment is saying. But I would guess that — as my wife of 18 years — she is unwilling to share me with a girlfriend.

    She’s been out of town for a week or so. In fact, I wrote the ValueType blog while sitting at the airport waiting for her. Now that I can spend time with her again, I expect I’ll be blogging less.

  5. LOL, nice one Chris 😉 Let me echo what others have said here and thanks for providing such a technical overview of CLR subjects. I guess you don’t have the cycles for a book but this kind of info would be a great backdrop to Jeffs and Don’s books.

  6. Keith Brown says:

    Interesting that the instance header is 4 bytes long; simply a pointer. I’ve been led to believe that it was 8 bytes long, consisting of a type handle, sync block index, and some other bit flags. Was I wrong? If it’s just a pointer, how does a sync block get associated with the instance?

  7. Pinku Surana says:

    Great explanation of ValueTypes. You state: "boxing is an efficient operation, but it necessarily involves allocating an object in the GC heap." I have found boxing to be rather inefficient, actually. For example, storing ints in a generic collection is much slower than an int-specific collection, even after subtracting the cost of type-casting.

    Would it be possible for the VM to handle the box instruction specially to improve allocation performance? Two ideas come to mind. (1) If boxed values do not escape the method, stackalloc the box. If possible, reuse the box. (2) The GC could allocate a special page for boxed Int32s (and others), where the entire page is marked Int32 so you don’t need a type tag over each box. The JIT knows that pointers to that page are really to boxed Int32s and "does the right thing". GC becomes more complex, too. I believe SMLnj does something similiar to allocate List structures (cons cells).

    Right now I warn people to be careful how they use ValueTypes. Since they will usually use structs for perf reasons, they should check that no box instruction is inserted into their inner loops. Otherwise, perf gets hit hard due to allocation.

  8. Chris Brumme says:

    Keith,

    Of course you are correct. The self-describing part is pointer-sized. And before that (at a negative offset from where we point at the object, for better cache line filling) is the obj header. This is really an aggressively unioned set of bit fields. One interpretation of those bits is a sync block index, but we also might place an AppDomain index, a hash code, a lock, information about specific types like Strings, finalization status and other information in there. Of course, it can’t all fit, so eventually we may have to overflow to some other structure like a sync table entry or even a sync block from that header.

    It’s a good job that one of us is paying attention.

  9. Chris Brumme says:

    Pinku,

    There’s clearly more that we can and should do to improve the performance of boxing.

    I’m not sure how often your first suggestion would pay off. For the particular languages I’m thinking of, boxing is a prelude to insertion into a collection, or making an indirected call (an interface or virtual call), or a late-bound call via reflection. In most of those cases, it’s either difficult to tell it won’t escape the method or it’s easy to show that it will escape.

    For the special case of calling virtual methods on Object, overrides on the valuetype can often be used to avoid this spurious box. I’m also hopeful that many of the scenarios where we box for collections can be avoided by using well-typed generic collections in a future release.

    The idea of using a special allocator has come up before. It’s a good idea that we may eventually pursue here, though I doubt that we’ll get to it for some time.

    Pinku, were you involved in Project 7? Are you using a dynamic language (where boxing performance is clearly an entirely different level of pain)? If so, I would love to get a list of what you currently consider the most painful limitations with the CLR in supporting your language. I’m interested in more than just performance here. For example, I know our tail call guarantees are probably too weak.

  10. This may seem like a silly question but just what type does an Enum take, value or reference, and can an Enum be declared as a boxed (reference) value?

  11. Chris Brumme says:

    Pinku sent me the following email:

    Yes, I was involved in Project 7 with a Scheme compiler. A quickie response is that a dynamic language needs type-checking and box/unboxing to be as fast as possible.
    Every possible trick is employed to make those two things fast. Scheme also needs tail-call to be as close as possible to a jmp. Unfortunately, these three things are not particularly fast on .NET, though there are well-known solutions to all.

    Let me think a bit more about issues beyond performance or I’ll end up writing a long, rambling email. 😉

    Gee, he makes it sound like long rambling stuff is a bad idea!

  12. Chris Brumme says:
  13. Chris Brumme says:

    Andrew,

    If you are asking about System.Enum, it is a reference type. If you are asking about specific enums, they derive from System.Enum and are value types. As such, they can be boxed. As I noted in the blog, ECMA specifies support for well-typed boxed types in signatures. The CLR has not implemented that support.

  14. Pinku Surana says:

    I wrote a bunch of stuff, then deleted it when I realized I had just described a fancy Scheme VM. That reflects my bias that nearly anything can be implemented given closures and dynamic code generation.

    I’ve been sick and am writing this in a NyQuil induced haze, so I hope this makes some sense.


    Closures: A closure is a tuple pointing to an environment and a function. A delegate is a closure with no environment; therefore, a closure could be a subclass of delegate with an extra slot for the environment.

    A closure constructor might be:
    Closure..ctor (BarClass target, Environment1 env, int ptrToBar)
    retType Bar (BarClass this, Environment1 env, args …)

    Some P7 compiler guys wrote their own closure class to do precisely this, but the code was unverifiable and does not benefit from any JIT optimizations. Language VMs implement closures efficiently by employing lots of tricks, many of which could benefit the CLR’s implementation of delegates. It is unfortunate that the delegate hierarchy is so rigid. Could you have done the MarshalByRef trick instead?


    Dynamic code generation: Often languages must generate code at runtime to implement dynamic behaviour like new dispatch trees, speculative inlining, type-specific code, etc. Reflection.Emit is much too slow to generate code on-the-fly. Also, it should be possible to pitch code when you don’t need it anymore, rather than have it frozen in an assembly once it is baked.

    [Serializable], generics, remoting, etc. are all instances where the CLR team felt they had to do specialized code generation. How would you do it if you were forced to write these in a user-library instead? You’d probably want the same code-generation stuff I want (maybe not the code pitching, but trust me – it’s really useful). My sense is that code-generation should be a cooperative task between static and JIT compiler writers, communicating with each other through attributes and a low-level Emit library.


    Continuations: It isn’t fair to ask for continuations. Instead, if you are adding Fibers to the next version, how about generalizing the interface so I can add something else via a C DLL? Wishful thinking, probably …


    Here’s a question for you: given your intimate knowledge of the CLR, how would you implement Dylan-like multiple dispatching, where methods can be loaded dynamically from different assemblies (you can’t see everything at compile or load time)?

  15. Frank Hileman says:

    It seems that if the CLR could support two types of object instances, ones in the heap, and ones on the stack, boxing/unboxing could be made faster, at least for local and temporary value types vars. That is, if the IL generator knows that a value type will need to be boxed, it could create a "stack instance" that consists of the bits normally in the value type, preceeded by the header. Then no copying or GC perf overhead. Of course the rest of the CLR would have to be able to handle those special "stack instances", which would have to be marked somehow in the header. This would not help value types in an array, where there is no extra space to stick the header, but perhaps they need less boxing.

    A silly related anecdote: many years ago I worked on the internationalization of a OO C application (custom OO environment) and we needed to support both double-byte and single-byte strings simultaneously, at runtime. To be able to incrementally internationalize the system, we decided that older code would continue to use char* strings, but newer code would use a pseudo-object called the "GString", simply because G was our traditional prefix. We added an extra byte at the beginning of each string, which was used to dynamically look up the apropriate function pointer for each "method" call, resolving to single-byte or double-byte code. When calling into older code, we incremented past that byte, and then passed the pointer. We had lots of laughs about "encapsulation", "private members", etc. with regards to our GString.

  16. Mike Dimmick says:

    I think the point about ‘boxed types on the stack’ is that you’re creating a boxed type in order to pass it to some other method which requires a System.Object. What that method then does with the boxed value is, in the general case, undiscoverable.

    However, if the inliner in the JIT (I assume there is one!) decides to inline all the called methods and can determine that those methods don’t do anything particularly interesting (i.e. only call methods defined on System.Object) the inliner might be able to generate code on the fly that calls the appropriate methods directly, with no boxing.

    Come to think of it, that’s probably already done.

  17. Joe Schwartz says:

    You wrote:

    "Incidentally, something similar is going on with System.Array and all the array types. In terms of layout, System.Array really isn’t an array. But it does serve as the base class under which all kinds of arrays (single-dimension, multi-dimension, zero-lower-bounds and non-zero-lower-bounds) are parented."

    This brings up an interesting question: When I have an int[] object in C#, can I safely assume that it’s always zero-lower-bounds?

    I realize that I can use Array.CreateInstance to create an Array object containing ints with non-zero-lower-bounds. However (and here’s the interesting part), I cannot cast that Array to an int[] — I get an InvalidCastException. From this, I gather that a general Array object can use any lower bounds, but the more specific int[] (or any object[]) must have zero lower bounds. Is that correct?

  18. Chris Brumme says:

    Joe,

    The single-dimension array types in C# (e.g. Object[] and int[]) are indeed constrained to have zero-lower-bounds. The array hierarchy is a bit surprising in how it is structured. I have added this to the growing list of possible blog articles.