The Truth About Value Types

As you know if you’ve read this blog for a while, I’m disturbed by the myth that “value types go on the stack”. Unfortunately, there are plenty of examples in our own documentation and in many books that reinforce this myth, either subtly or overtly. I’m opposed to it because:

  1. It is usually stated incorrectly: the statement should be “value types can be stored on the stack”, instead of the more common “value types are always stored on the stack”.
  2. It is almost always irrelevant. We’ve worked hard to make a managed environment where the distinctions between different kinds of storage are hidden from the user, unlike some languages in which you must know whether a particular piece of storage is on the stack or the heap for correctness reasons.
  3. It is incomplete. What about references? References are neither value types nor instances of reference types, but they are values. They’ve got to be stored somewhere. Do they go on the stack or the heap? Why does no one ever talk about them? Just because they don’t have a type in the C# type system is no reason to ignore them.

The way in the past I’ve usually pushed back on this myth is to say that the real statement should be “in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value.”

The sheer number of weasel words in there is astounding, but they’re all necessary:

  • Versions of C# provided by other vendors may choose other allocation strategies for their temporary variables; there is no language requirement that a data structure called “the stack” be used to store locals of value type.
  • We have many versions of the CLI that run on embedded systems, in web browsers, and so on. Some may run on exotic hardware. I have no idea what the memory allocation strategies of those versions of the CLI are. The hardware might not even have the concept of “the stack” for all I know. Or there could be multiple stacks per thread. Or everything could go on the heap.
  • Lambdas and anonymous methods hoist local variables to become heap-allocated fields; those are not on the stack anymore. (A short sketch of this hoisting follows this list.)
  • Iterator blocks in today’s implementation of C# on the desktop CLR also hoist locals to become heap-allocated fields. They do not have to! We could have chosen to implement iterator blocks as coroutines running on a fiber with a dedicated stack. In that case, the locals of value type could go on the stack of the fiber.
  • People always seem to forget that there is more to memory management than “the stack” and “the heap”. Registers are neither on the stack nor the heap, and it is perfectly legal for a value type to go in a register if there is one of the right size. If it is important to know when something goes on the stack, then why isn’t it important to know when it goes in a register? Conversely, if the register scheduling algorithm of the jit compiler is unimportant for most users to understand, then why isn’t the stack allocation strategy also unimportant?
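
To make the lambda point concrete, here is a minimal sketch of the hoisting (the program is mine; the actual field and closure class the compiler generates have unspeakable names):

    using System;

    class Program
    {
        static Func<int> MakeCounter()
        {
            int counter = 0;           // closed-over local: hoisted to a heap-allocated field
            return () => ++counter;    // the returned delegate keeps that storage alive
        }

        static void Main()
        {
            Func<int> next = MakeCounter();
            Console.WriteLine(next()); // 1
            Console.WriteLine(next()); // 2: 'counter' outlived the activation of MakeCounter
        }
    }
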
Having made these points many times in the last few years, I’ve realized that the fundamental problem is in the mistaken belief that the type system has anything whatsoever to do with the storage allocation strategy. It is simply false that the choice of whether to use the stack or the heap has anything fundamentally to do with the type of the thing being stored. The truth is: the choice of allocation mechanism has to do only with the known required lifetime of the storage.

Once you look at it that way, everything suddenly starts making much more sense. Let’s break it down into some simple declarative sentences.

  • There are three kinds of values: (1) instances of value types, (2) instances of reference types, and (3) references. (Code in C# cannot manipulate instances of reference types directly; it always does so via a reference. In unsafe code, pointer types are treated like value types for the purposes of determining the storage requirements of their values.)
  • There exist “storage locations” which can store values.
  • Every value manipulated by a program is stored in some storage location.
  • Every reference (except the null reference) refers to a storage location.
  • Every storage location has a “lifetime”. That is, a period of time in which the storage location’s contents are valid.
  • The time between the start of execution of a particular method and the method returning normally or throwing an exception is the “activation period” of that method execution.
  • Code in a method can require the use of a storage location. If the required lifetime of the storage location is longer than the activation period of the current method execution then the storage is said to be “long lived”. Otherwise it is “short lived”. (Note that when method M calls method N, the use of the storage locations for the parameters passed to N and the value returned by N is required by M.)
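
To make that vocabulary concrete, here is a small sketch (the method and names are mine, purely illustrative):

    static int[] MakeSquares()
    {
        int count = 10;                  // never used after this method returns:
                                         // its storage can be short-lived
        int[] squares = new int[count];  // the elements are handed back to the caller,
                                         // so their storage must outlive this
                                         // activation period: it is long-lived
        for (int i = 0; i < count; i++)
            squares[i] = i * i;
        return squares;
    }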

Now we come to implementation details. In the Microsoft implementation of C# on the CLR:

  • There are three kinds of storage locations: stack locations, heap locations, and registers.
  • Long-lived storage locations are always heap locations.
  • Short-lived storage locations are always stack locations or registers.
  • There are some situations in which it is difficult for the compiler or runtime to determine whether a particular storage location is short-lived or long-lived. In those cases, the prudent decision is to treat them as long-lived. In particular, the storage locations of instances of reference types are always treated as though they are long-lived, even if they are provably short-lived. Therefore they always go on the heap.

And now things follow very naturally:

  • We see that references and instances of value types are essentially the same thing as far as their storage is concerned; they go on the stack, in registers, or on the heap depending on whether the storage of the value needs to be short-lived or long-lived.
  • It is frequently the case that array elements, fields of reference types, locals in an iterator block and closed-over locals of a lambda or anonymous method must live longer than the activation period of the method that first required the use of their storage. And even in the rare cases where their lifetimes are shorter than that of the activation of the method, it is difficult or impossible to write a compiler that knows that. Therefore we must be conservative: all of these storage locations go on the heap. (A small iterator sketch follows this list.)
  • It is frequently the case that local variables and temporary values can be shown via compile-time analysis to be unused after the activation period ends, and therefore can be treated as short-lived, and therefore can go onto the stack or be put into registers.
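
For instance, here is a sketch of the iterator case (illustrative; the enumerator class the compiler generates is an implementation detail):

    using System.Collections.Generic;

    static IEnumerable<int> CountTo(int limit)
    {
        // 'i' must survive across many calls to MoveNext(), far beyond the
        // activation period of the call that created the enumerator, so it is
        // hoisted into a field of a heap-allocated, compiler-generated object.
        for (int i = 1; i <= limit; i++)
            yield return i;
    }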

Once you abandon entirely the crazy idea that the type of a value has anything whatsoever to do with the storage, it becomes much easier to reason about it. Of course, my point above stands: you don’t need to reason about it unless you are writing unsafe code or doing some sort of heavy interoperating with unmanaged code. Let the compiler and the runtime manage the lifetime of your storage locations; that’s what they’re good at.


Comments (68)

  1. Jon Freeland says:

    Very interesting read… thanks for this.

  2. Aakash Mehendale says:

    A really useful and solid post that I can see being the target of links from StackOverflow for years to come 🙂

    Tiny typo: "its" in the last sentence wants to be "they're", I think?

  3. Shuggy says:

    "There are three kinds of storage locations: stack locations, heap locations, and registers."

    Not trying to be a pedant, but just out of interest, do you consider the compile-time-known strings (or indeed any other such 'baked in' reference types) to be "heap locations"?

    I assume fixed buffers within structs despite looking superficially like an array would be treated as not being reference types but instead simply a pointer to the interior of the struct and thus inherit their storage rules by whatever happens to their parent. (stackalloc buffers follow from your statements on pointers without any special cases)

    Would you consider thread statics to be (opaque) sugar around a stack location (even if just the thread id) and (possibly several) heap location(s)?

  4. Lambda says:

    "It is frequently the case that array elements, fields of reference types, locals in an iterator block and closed-over locals of a lambda or anonymous method must live longer than the activation period… must go to the heap"

    Is the type the major driving factor in deciding what the lifetime of the value would be? How does the CLR decide what the required lifetime is? Based on the above statement, it looks like this has been derived by observing how types are used. Is there a particular logic that the CLR follows to determine the lifetime, beyond just looking at the type?

  5. Lambda says:

    As a followup to my question, can we game the system? Meaning, can I include something in my program to make the CLR think a particular value goes to the heap instead of the stack?

  6. Simon Cooper says:

    I do agree with what you've put, but one issue I've found when trying to explain value & reference types is trying to get across the basic concept of 'value type' to someone with minimal knowledge of the CLR and .NET. Most programmers have some knowledge of what 'the stack' and 'the heap' are, what their role is, and what they do, so although 'value types live on the stack', or 'value types you're manipulating go on the stack' are wrong, they are a variant of 'lies to children' – they explain the basic concept in a simplified way that is easy to understand. Only later do they learn all the caveats to what you've said, once they've understood the basic concept.

    One similar example is in (British school) GCSE chemistry (about 14-15 years old), where you learn that there are two separate types of molecular bond – 'ionic' and 'covalent', with molecules either using one or the other depending on some very simple properties. Only later on at A-level (16-18 years old) do you learn that this is actually wrong – there is a continuum between ionic and covalent bonds (and I'm sure, in university chemistry courses, you learn that that itself is a simplification). The difference between 'value types' and 'reference types' is similar.

    So, although 'value types live on the stack' is wrong, it can be a useful first step to help someone fully understand what a value type is and how it behaves.

  7. LondonBridge says:

    Why is Microsoft Press getting this wrong?

    MCTS 70-536 from Microsoft Press says in the first chapter, second line:

    "Value types are variables that contain their data directly instead of containing a reference to the data stored elsewhere in memory. Instances of value types are stored in an area of memory called the stack, where the runtime can create, read, update, and remove them quickly with minimal overhead"

  8. DaRage says:

    Well, what you're saying is perfectly correct. However, from a developer perspective, the most important difference to know about value types and reference types is that value types get copied on the stack when they're passed as arguments. Knowing that, the developer must avoid creating big value types that are copied inefficiently.

    So from that perspective it's practical for developer to think that value types are stored in the stack.

  9. Preets says:

    That was really interesting and useful. Understanding storage allocation w.r.t. lifetime makes a lot more sense than mapping them to value/reference types, as the latter eventually leads to confusion.

    Just wanted to clarify: an instance of a struct with a reference type as a field will be stored on the stack (in a typical situation, minus the exceptions) and the reference to the reference type will be stored on the stack too. Is that accurate?

    (Because that is what I understood after reading the three rules in your previous blog…/the-stack-is-an-implementation-detail-part-two.aspx)

  10. Winston Smith says:

    "It is frequently the case that array elements, fields of reference types …<snip>… Therefore we must be conservative: all of these storage locations go on the heap."

    Does that mean if I have an array of ints in a local method, the ints in the array go on the heap?

  11. Jon Skeet says:

    @DaRage: Surely what's important is that the value is copied – not where it's copied to and from. If the value were being copied to the heap instead of to the stack, would that make it okay to have huge value types?

    Learning about the copying behaviour of arguments (and simple assignments) is obviously important, but I think the detail of heap/stack allocation is a distraction there.

  12. Eric Lemes says:

    Excellent post, Eric!

    By the way, I found your blog while searching for this subject (your older post).

    I think most of this "local variables are always stored on the stack" conviction comes from the unmanaged world. One time, in an interview, a guy asked me about this, knowing that my primary programming language is C#.

    I really never cared about this until that situation, just because I chose a managed language and I can live with the idea that the CLR is there to choose a better way to JIT my code. When I saw your post about "stack/heap storage is an implementation detail", it sounded like music to me. It's nice to know implementation details and how you can use them to get more performance, but I really don't think you need to close your mind to one idea.

    Recently I started working with C++ and an unmanaged environment. And I'm currently observing some different cultural aspects of the two worlds. Usually the C/C++ developers are more familiar with low-level programming, with drivers and embedded systems and applications closer to the operating system. And they are really worried about how the assembly code is generated and its performance impact. You always fall into discussions like "inline or not inline", "template or not template". No, no inheritance, because vtable function call indirections will cause performance trouble. All of them are good questions, but sometimes I really don't think the benefits pay the costs.

    In other words, I think these philosophical questions about performance vs. portability vs. control are what make these cultural differences, and there is still some resistance to managed environments. The "I always need full control" vs. "I like building blocks" question.

    Anyway, thanks for the precious information and the great content in your blog.


    Eric Lemes

  13. CunningDave says:

    I agree with DaRage – it's not just about doing pointer arithmetic, it's also about not doing anything dumb.  If it really doesn't matter, then why do Value Types exist at all?  Why not just make everything a Class, or ensure that it's a long-lived heap variable, or better yet, let the compiler and runtime do what they're good at?

    An interesting read, nonetheless.

  14. Eber Irigoyen says:

    I appreciate all the technical details and deep insight, I have learned a lot more about their behavior and relationship with the rest of the ecosystem, however, I'm still left with the question "what IS a value type", as in a single statement that begins with "a value type is"…

  15. configurator says:

    Very good post. I'd make a small change though. In the sentence:

    (Note that when method M calls method N, the use of the storage locations for the parameters passed to N and the value returned by N is required by M.)

    I'd change "required by M" to "required by both M and N". It's just as correct, but makes it just a little bit clearer because you don't have to think "Which was was M again?"

  16. DaRage says:

    @Jon Skeet: but it's a fact that copying only happens in the stack and never on the heap (in the case of assignment and arguments). I know you can say it's irrelevant but mentioning the stack and heap helps keep the discussion concrete.

    The article keeps mentioning the Desktop CLR and the Microsoft implementation, but that's where the vast majority of .NET programs are running, i.e. this is the de facto CLR, so why bother?

    To tell you the truth, I would be afraid of discussing the article with a colleague without being labeled an uber geek, because for the most part it doesn't really matter, and saying value types are stored on the stack and reference types on the heap is good enough.

  17. Fujiy says:

    DaRage, I agree with Jon. What the programmer needs to know is that value types are copied by VALUE, no matter whether that happens on the stack.

  18. Mark Knell says:

    Where does a value type get stored if it's large enough to qualify for the Large Object Heap?  (Not that a struct that large would be a good idea, but for the sake of completeness.)

  19. Alexey Bakhirkin says:

    After having posted my previous comment, I've come up with its summary.

    Of course formally C# lacks the notion of stack. But it lacks this notion not in a storage-agnostic, but in storage-indeterministic way. In this condition of indeterminacy, the developers have invented a number of abstractions which help them to write programs capable of handling predictable sizes of input data. And stack is one of these abstractions. So we can say that C# itself in its indeterminacy gave birth to the notion of stack.

    Here we are left with an important philosophical question. If C# itself gave birth to the notion of stack, shouldn't we consider the stack to be an integral part of C#?

  20. Alexey Bakhirkin says:

    Hm… What I called "my previous comment" seems to be absent, so here it is:

    >"We've worked hard to make a managed environment where the distinctions between different kinds of storage are hidden from the user. Unlike some languages, in which you must know whether a particular storage is on the stack or the heap for correctness reasons."

    >"If if is important to know when something goes on the stack, then why isn't it important to know when it goes in a register? Conversely, if the register scheduling algorithm of the jit compiler is unimportant for most users to understand, then why isn't the stack allocation strategy also unimportant?"

    I cannot agree with your above-cited points.

    First (and as an answer to the first cited paragraph), C# is hardly unlike any 'some' language: for a developer it does matter where the values go. Consider an algorithm of at least O(n) space complexity. For C# we can say that an algorithm implementation which utilizes the 'call stack' (i.e. an abstract storage, associated with a chain of nested function calls) for this O(n) is likely to fail (with stack overflow) for relatively small n, while an algorithm implementation which utilizes some explicit storage (like Stack<T>) is likely to fail (with OOM) only for relatively large n. Given that, we can say that a C# developer should not use the call stack for O(n) storage unless he knows what he's doing. Obviously we can't admit that "distinctions between different kinds of storage are hidden from the user".

    In fact I'm not aware of _languages_ that are stack-overflow-free (i.e. are storage-agnostic). But there exist language implementations (like Chicken Scheme) that don't have a notion of stack overflow for all hardware architectures they are available for (i.e. in a program compiled with Chicken Scheme you may recur until you run out of address space for your heap).

    Second (and as an answer to the second cited paragraph) both register and stack allocation strategies are important. The way the program is written affects both of them. In turn register allocation affects application performance while stack allocation limits the size of input data the program can handle. But (I believe that) for a C# developer the latter usually matters much more than the former.

  21. Shuggy says:

    @DaRage Value types are always copied (barring ref semantics, but then you're passing something else instead) no matter where they reside; if they are within an array and you assign the value in index 1 to the value in index 2, you create a copy of what is in 1 and place it in 2. There is no requirement that this goes via the stack; it is an excellent candidate for staying in registers or even being dealt with by some very funky logic in the CPU implementing a rapid memcpy.

    Understanding this helps in understanding some of the reasons that making mutable value types is a very bad idea in most situations. (A small sketch of the pitfall follows.)
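
    For instance, here is a sketch of the classic copy pitfall (the struct and list are mine, purely illustrative):

    using System;
    using System.Collections.Generic;

    struct MutablePoint { public int X; }

    class Demo
    {
        static void Main()
        {
            var points = new List<MutablePoint> { new MutablePoint() };
            MutablePoint p = points[0];     // copies the value out of the list
            p.X = 5;                        // mutates the copy, not the stored element
            Console.WriteLine(points[0].X); // still 0
            // points[0].X = 5;  // does not even compile: the indexer returns a copy
        }
    }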

  22. Jon Skeet says:


    Suppose you have a class with two fields, and both of those fields are of some large value type. What do you think happens when you write field2 = field1; ? It copies the value of field1 into field2… possibly via a stack value (I don't know, and it's frankly irrelevant). Copying happens, and both values are on the heap (probably; see all Eric's caveats about implementation). In what way does your claim that "copying only happens in the stack and never on the heap" hold up?

    You seem to have missed the point of Eric emphasizing the "this is for the desktop CLR" bit – it's trying to make it very clear that this is implementation specific, rather than being *inherently* part of C#. That's an important point to understand, even if you *also* want to know what happens to go on in the desktop CLR.

  23. [ICR] says:

    "In this condition of indeterminacy, the developers have invented a number of abstractions which help them to write programs capable of handling predictable sizes of input data. And stack is one of these abstractions."

    The stack is one of these abstractions the developers *of the implementation* created. There are other alternatives. They may all be stupid, and the stack may be the best one, but there is nothing that necessitates the stack. C# has not given birth to the notion of the stack, it has given birth to the need for some storage mechanism which the implementation takes care of.

    In cases where you really need to squeeze out performance, it may be worthwhile knowing where the implementation stores the data. But these are edge cases, and the story of where it's actually stored is complicated. There is no need to pretend that the storage location is an important distinguishing factor between value and reference types.

    The conceptual difference of values being unchangeable values like '10' and reference types being … well anything else (at least conceptually); the fact value types store the data, while reference types store references to the data; the implications this has for allocation, and how copying a value type is likely a larger task than copying a reference; these things are enough to be teaching people for them to make a good distinction between when to use a value and a reference type, based on conceptual semantics and performance characteristics, without caring whether it's stored on the stack, the heap, or in your mother's porridge.

  24. Stanislav Iaroshenko says:

    >value types are stored on the stack when the value is a local variable or temporary

    Does it require "or a field of a stack-stored variable"?

  25. Alexey Bakhirkin says:


    >"The stack is one of these abstractions the developers *of the implementation* created."

    That is true. But in fact the stack+heap abstraction can be used to reason about all possible kinds of implementations. We can think of stack+heap as a 'storage model' (analogy with memory model intended). With such a model at hand (which defines a likely small stack, a likely large heap, allocation strategies and stuff) developers can reason about their programs' correctness. If in addition the language implementation in use defines how the model abstractions are mapped to hardware and software resources (e.g. address space regions, registers and stuff), then the developers can reason about their programs in concrete numbers (e.g. we need X Mb RAM to handle Y Mb input) and can reason about their programs' performance as well (knowing which model abstractions are implemented in the most efficient way).

    C# didn't have a storage model, so the developers invented an unofficial, folklore one. The desire to have a storage model is natural, so all this stack+heap talk comes not from ignorance but rather from wisdom.

  26. DaRage says:

    @Jon Skeet. I agree with you; maybe in that case the copying doesn't happen on the stack and it "maybe" happens in the heap. I don't know, and as you said, I shouldn't care.

    But still, why is it so "important" to distinguish between the implementation detail and the specification when the implementation is overwhelmingly what matters? This is, to me, more a theory check than a reality check. In that regard, you guys seem to be pissed off at something that's not that "important" after all.

  27. petebu says:

    I disagree with some of what Eric has said, and this discussion is getting pretty muddled, so here is my attempt to clear it up.

    @[ICR] You said "there is nothing that necessitates the stack". But this is false. As long as a language has subroutines it has to have the concept of a call stack. Exotic hardware and such vague ideas are not needed either. The runtime will still have to use a stack.

    No it doesn't. A call stack is an implementation mechanism for lazy compiler writers like me, not a necessity. First off, let's not conflate the call stack with the storage for activation frames; there is no requirement whatsoever that they be the same data structure, and on some hardware, they are different data structures. (I consider it unfortunate that in the x86 architecture they are the same data structure. If return addresses were not stored on the same stack as local variables then all those stack smashing attacks would be for naught.) Let's consider for now just the call stack function of the stack. I assure you that if I really wanted to write a language that has subroutines and did not use a stack *at all*, I could. I could transform the program into Continuation Passing Style. Since a CPS program has *no returns* there is no need for a call stack. You're never coming back. – Eric
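
    For the curious, a tiny continuation-passing sketch (purely pedagogical; no C# compiler actually emits this, and it assumes tail calls so that the stack does not grow):

    using System;

    static class Cps
    {
        // Each method passes its result to a continuation 'k' instead of returning it.
        static void Add(int x, int y, Action<int> k) { k(x + y); }
        static void Square(int x, Action<int> k) { k(x * x); }

        static void Main()
        {
            // Compute (2 + 3) squared and print it; no method ever returns a value.
            Add(2, 3, sum => Square(sum, sq => Console.WriteLine(sq)));
        }
    }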

    How the stack is implemented doesn't really matter. It could be a contiguous region of memory and a stack pointer, or perhaps a linked list of stack frames stored in the heap. The hardware might help to manage the stack or it might not. The important thing is that each stack frame has a lifetime equal to what Eric calls the "activation period" of a method. (This is precisely the reason why it has to be a stack. Method activations have LIFO behaviour).

    Again, the data structure does not *need* to be a stack in order to get this LIFO behaviour. Rather, because method activations are conceptually LIFO, it is *convenient* to use a LIFO data structure like a stack. Again, the use of the stack for storing activation frames is a convenience, not a requirement. But more generally, you're making my point for me: the details of how the temporary store is implemented are implementation-defined. It need not be "the stack" – the one million byte pre-reserved per-thread data structure. The right way to reason about it is to think of the lifetime; don't think of it as "the stack", think of it as "the storage for short-lived data". When you say it that way, the statement becomes trivially false: "value types always go in the storage area for short-lived data". No, obviously they don't, because some of those values are going to be long-lived. – Eric

    Eric mentions registers but unlike the stack they really are an implementation detail. We could very well have a machine without registers (hold on, we have one already, it's called the CLR VM). Yes, registers are sometimes used to hold contents of locals. But as soon as you call another subroutine they have to be saved to the stack anyway.

    You keep on telling me what I *need* to do, but no, I don't need to do that. Again, I could generate the code for a method as a CPS transformation. Since the method call is never going to *return* then clearly I do not need to store the register values before the continuation is invoked because those register values are *never* going to be used again! Why store 'em if you ain't gonna read 'em? That's the awesome thing about CPS: all these problems about keeping track of where you were and what you were doing simply go away. – Eric

    There IS a requirement that locals are stored on a stack. This is implicit in the concept of a subroutine and its "activation period". Closures do capture local variables. Conceptually, they save the current stack frame for the lifetime of the closure. (There are several ways to actually implement this.)

    No, there is not any such requirement. The idea that closures "save the current stack frame" is again an implementation detail that assumes that activations are stack frames by definition. Stack frames are just an implementation detail; there need not be any stack whatsoever, so why should there be "stack frames" if there's no stack in the first place? You seem to have this idea that just because all the implementations of C you've ever seen define a "frame" on the "stack" that that's how it's got to be. I'm here to tell you it isn't so. There are many interesting ways to write a compiler for a programming language and not all of them involve anything that resembles a stack frame for an activation. – Eric

    Eric is right that one shouldn't mix storage locations and types. However, talking about long-lived storage locations vs short-lived ones is terribly unhelpful. As he says, long-lived locations are always heap locations and short-lived locations are stack locations (the CLR doesn't have registers). Why not call them that?

    Because that is confusing an implementation detail (where the operating system puts the storage) with a semantic requirement of the language (that a particular storage have a particular lifetime.) – Eric

    It is true that "references and instances of value types are essentially the same thing". And references and instances of reference type are two different things. Perhaps this should be emphasised more, together with the dereference operator (".").

    Next Eric says "storage locations of instances of reference types are always treated as though they are long-lived". Another way of saying that is "instances of reference types are always stored in heap-locations". Surely that is what people mean when they say "reference types are allocated on the heap"?

    Sure. But they are not *required* to be. Again, this is an *economic* decision rather than a *requirement*. We could do lifetime analysis on instances of reference types and statically determine when they don't survive the method activation, and allocate them on the stack. We don't because the payoff of doing so is insufficiently high. – Eric

    We're now half-way there. Array elements and fields live inside arrays and classes so it is clear then that they are heap-allocated too. Closures are a special case (as I've said, conceptually they save the current stack frame).

    Then Eric says "it is frequently the case that local variables and temporary values can be shown via compile-time analysis to be unused after the activation period ends". Local variables can only be used after the activation period ends if they've been captured by a closure. But closures just save the current stack frame so local variables are always stored on the stack (conceptually). Since local variables only hold references or value types we can say that "local variables of value type are stack allocated".

    Now you're losing me. If a "stack frame", assuming such a thing exists, is "saved" then in what possible sense does it form a "stack"? It's not going to be popped in LIFO order anymore. The data structure is not logically a stack because it is not LIFO, and the implementation doesn't use "the stack", and yet you are saying it is *less* confusing to call that usage "the stack"? Less confusing to who? It's way more confusing to me, and I'm the guy who has to write the compiler! If you're going to talk about what goes on the stack and what goes on the heap, then be accurate about it. Saying "well, when I say "the stack" I don't mean "the stack", I mean the thing where we make a copy of some stack frame onto the heap so that it no longer has stack semantics" is confusing. – Eric

    To sum up, when someone says "value types are stored on the stack and reference types are stored on the heap" they usually mean "local variables of value type contain values directly and are stored on the stack whereas local variables of reference type contain a reference to instances of reference type which are stored on the heap".

    If that's what they mean then they should say that, instead of saying "value types are always stored on the stack". – Eric

    It is wrong to say that we don't need to reason about storage. A programmer has to be able to reason about the lifetime of objects he creates. He has to know that the lifetime of local variables follows that of the activation period of methods. And he has to understand that the lifetime of instances of reference types is indeterminate (hence the need for IDisposable). He has to know that when he needs more control over object lifetime he can use object pools etc.

    Now you are re-stating my point: yes, programmers need to reason about lifetime, not about storage. That's the whole point of this article: don't ask "where is this value stored?" – that's an implementation detail of the runtime – rather, ask "what is the lifetime of this value?" – that's a semantic requirement of the language that you can reason about independently of any knowledge of implementation details. – Eric

  28. Focus says:

    @petebu You are reasoning with a C++ background, IMO, where you need to know more about how things really work at a low level. With C# and the CLR, as you say, a programmer has to be able to reason about the lifetime of objects. That is completely true. But that does not mean that he needs to know how the heap or stack are involved. Like Eric said, that is an implementation detail.

    No one is saying that you shouldn't know when or when not to use IDisposable, when to use value types or reference types, etc. What isn't so clear to me and to others is whether this knowledge necessarily needs to be tied to how the CLR manages the stack, heap, registers, etc. It's basically the same paradigm as general OOP. You need to know how to use an object and what methods are better for different purposes, but you don't actually need to know how the object works internally.

  29. Jon Skeet says:

    @DaRage: In order to reason about what's guaranteed, you need to be able to distinguish between what's specified and what's implemented.

    Suppose the C# team decided to create a reference type for each method, to hold the local variables for that method. The only thing on the stack in each frame might be a reference to an instance of that type. Suddenly *all* local variables are on the heap… and many things you may have previously stated become false, because you mixed up implementation and specification. If you think that's a crazy idea, consider that it's pretty much exactly what happens for iterator blocks.

    If you only ever reason in terms of what's guaranteed by the spec, then your arguments don't become invalid when the implementation changes in a way which is backward-compatible in spec terms.
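
    Roughly the shape of the type Jon describes, for the captured-local case (names invented; the names the compiler really generates are deliberately unspeakable in C#):

    sealed class DisplayClass    // hypothetical; stands in for the generated closure class
    {
        public int counter;      // the hoisted local lives here, on the heap
        public int Lambda() { return ++counter; }  // the body of () => ++counter
    }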

  30. petebu says:

    @Eric. About CPS. Of course you are right that you can have a separate stack for return addresses and local vars. But if you CPS transform the program, all you've done is moved the return address into the same structure (the activation frame) as the local vars. There is still a call stack hidden away there. The return is still there too (a jump to the return address). I still don't see how you can implement subroutines without it.

    But that is beside the point. Forget how it is implemented. A total beginner looking at his first program will see that when he calls a method, all the local vars disappear, and that when the subroutine returns, a) the local vars reappear with the same values as before and b) the computer knew where in the caller code to return to. In other words, the computer saved the local vars and return address. Now he notices that subroutine calls can be nested. So the computer must save a list of return addresses and local vars. Furthermore that list is only accessed in LIFO order. That is, it is a stack. This behaviour exists in all languages with subroutines (with the exception of those where vars have dynamic scope by default eg. Emacs Lisp). That's what I mean by a stack (not necessarily a concrete data structure) – all programmers must have a notion somewhat like this, so it is not just an implementation detail. Perhaps I am wrong on this so it's interesting to hear what others think. One could refrain from calling it a stack and talk about the "LIFO order of method activation". Or perhaps say that local variables have "lexical scope and dynamic extent/lifetime" but I am not sure if that helps (one then has to explain the precise meaning of "dynamic").

    About registers. You said that "since the method call is never going to *return* then clearly I do not need to store the register values before the continuation is invoked because those register values are *never* going to be used again". When you invoke the continuation (that is, you jump to the return address – they are the same thing) you will have to restore the registers in use before you entered the subroutine. But the point I was trying to make is that registers are a bit of a red herring here. Your entire blog post could have been written without mentioning registers. The CLR doesn't have them and there is no need to reach below the CLR to make your argument.

    About short/long-lived references. My main objection is to these two terms. "Short-lived" and "long-lived" relative to what? Is it measured in seconds, CPU cycles, method activation periods? The GC might run during the current method call – then the contents of a long-lived location might be freed before the local short-lived locations. The important thing about the stack location isn't that it is relatively short-lived compared to a heap location but that the lifetime is a) deterministic and b) corresponds to method activations. For heap locations, the important idea is that (in a GCed language) the lifetime is indeterminate. If you don't like calling them stack and heap locations you need a better name that conveys the lifetime. Perhaps "locations with dynamic lifetime" (these exist during the dynamic lifetime of a method activation) and "locations with indeterminate lifetime" (these exist until an indeterminate point in the future when the GC runs)?

    Btw, I agree with your *original* blog post that we should concentrate on what is observable not on how it is implemented. When someone asks about the difference between value types and reference types one of the key differences that doesn't get mentioned much is that a value type represents the set of values that you think it does whereas a reference type represents the set of values that you think it does union the set containing null. Once we forbid nulls and make heavy use of immutable objects then the observable distinction between value types and reference types begins to fade away. This happens in F# for instance (unfortunately nulls can sneak in from C# code).

  31. Nick Aceves says:

    CunningDave: "then why do Value Types exist at all?  Why not just make everything a Class or a ensure that it's a long-lived heap variable, or better yet, let the compiler and runtime do what they're good at?"

    Because Value Types represent something semantically very different than Reference Types. Value Types do not have Identity. Reference Types do.

  32. petebu says:

    @Eric. I've reread your post and realised that you did define "short-lived" and "long-lived" quite precisely. I missed that initially – sorry. The term "dynamic extent" is not new of course. The site at contains some useful definitions. In particular:

    dynamic extent – An object has dynamic extent if its lifetime is bounded by the execution of a function or some other block construct.

    indefinite extent – An object has indefinite extent if its lifetime is independent of the block or function-call structure of the program.

    stack allocation – Stack allocation means run-time allocation and deallocation of storage in last-in/first-out order.

    heap allocation (also known as dynamic allocation) – Heap allocation or dynamic allocation means run-time allocation and deallocation of storage in arbitrary order.

  33. Vince says:

    Is it safe to say that reference types always go on the heap?

  34. Vince: it is safe to say that, in the current Microsoft .NET implementation of the CLR, storage referenced by values of reference types is always allocated on the heap.

  35. Ben Voigt says:

    In case we didn't already have enough confusion between the x86 stack (PUSH and POP instructions and the ESP register), the stack of nested method activations (which as Eric points out doesn't actually have to use the x86 stack), the register file (locations in which become associated and dissociated with formal x86 registers such that even local variables which aren't assigned to registers by the JIT might in fact be held in the register file), and Stack<T> instances, let me remind you about this thing called an "MSIL operand stack" which IS part of the .NET specification.

    While there may be no requirement that a data structure called "the stack" be used to store local variables, in MSIL all method parameters are required to be placed onto "the operand stack", and of course that says absolutely nothing about the machine code finally generated by the JIT, which could even (during inlining) eliminate parameters which do not vary at runtime and *not store the parameters anywhere*.

  36. TheCPUWizard says:

    Eric, It would be interesting to see your response to an "inversion" of the original question…."What would influence your choice of class vs. struct for various items in a high performance / low latency application which used massive amounts of data?"

  37. Stuart says:

    I've been thinking for a long time about what it would take to make value types that I, personally, don't find confusing. For example, int, bool, and enum are value types and I have never had a problem understanding their semantics. I believe that what's necessary for me to not find them confusing is immutability.

    I realize there are good reasons in some performance-critical scenarios to have mutable value types, so of course I wouldn't want to get rid of the current capability to do that. But I think it'd be handy to have some syntactic sugar for immutable value types. I was reminded of this again today when I discovered the limitation that you can't implicitly reference "this" in a query expression in a value type, and realized that, thanks to your blog, I do understand why that's necessary, but that it wouldn't be if you could declare that the struct is immutable.

    My proposal would be to allow "readonly" as a modifier on struct types. A readonly struct would have the following characteristics:

    The "readonly" modifier would be implied on all fields

    "private set" would be required on all automatic properties, and those properties could only be set from the constructor (or, alternatively, automatic properties would be forbidden)

    No other property setters would be permitted

    All fields (including automatic property backing fields) would be required to be of other readonly struct types (with bool, enum, char and all the integer types grandfathered in as being considered readonly, and Nullable<T> considered readonly if T is).

    Every instance method that references "this", explicitly or implicitly, would be translated under the hood by the compiler into a private static method that takes _this as a parameter (a value parameter, not a ref or out parameter). The explicit and implicit references to "this" within the method body would be translated to refer to _this instead. The instance method would simply call the static one.

    Any thoughts? Did I leave any loopholes that would permit mutation, other than reflection?
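
    A sketch of what the proposal might look like (hypothetical syntax; this is not valid C# as the language stands today):

    readonly struct Money
    {
        decimal amount;                   // 'readonly' implied on all fields by the modifier
        public Money(decimal amount) { this.amount = amount; }
        public decimal Amount { get { return amount; } }  // getters only; no setters allowed
    }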

  38. I wish I'd seen this post a few days back…  someone should have sent petebu off to learn about Scheme's call/cc.   There's always a stack, and function lifetimes are LIFO?   Pfft.

  39. Matthew says:

    Jon, coming at this from the other direction — how then would you describe the difference between value and reference types?

  40. Eugene says:

    Types may not mandate specific storage, but they do impose constraints on what storage strategies make sense to implement, because a type defines a set of operations on its values and their semantics. Reference types in C# have 2 properties:

    – Mutable values with changes visible via all variables pointing to the object

    – Reference equality operation (do these 2 variables point to the same object?)

    Without these operations one could consider some copying strategies — e.g. allocate all objects on the stack, then migrate to the heap those objects references to which were stored in longer-lived objects. (Also migrate heap objects to thread-local heaps, node-local heaps in distributed systems, etc.) "Migration" could be done by simple copying. The 2 constraints I mentioned above make such strategies either not efficient (need to update all references to point to the new copy — mini-gc) or not usable in the general case (JVMs do escape analysis and stack-allocate objects which are provably not leaked outside).

    The semantics of value types in C# were carefully crafted to allow straightforward allocation of their values on the stack.

  41. Brian says:

    I had an interview question on this recently, some blah blah about where value types and reference types are stored.

    I thought that the answer "Who cares?" was probably not what they were looking for, so I gave the usual, and apparently wrong, spiel.

    I didn't get the job but I wish I had their email addresses to send them this link.

  42. Anthony P says:

    @Brian, imagine for a second they read this blog and therefore wanted you to say "Who cares? It's an implementation detail!"

  43. Jeff C says:

    Why have value types at all? What's the point?

  44. Gabe says:

    JeffC: Imagine that you have a bitmap manipulation class. It keeps an array of Pixel instances, where each Pixel contains an Opacity instance and 3 Color instances:

    struct Opacity { byte Value; }
    struct Color { byte Value; }
    struct Pixel { Opacity Opacity; Color Red, Green, Blue; }

    Since those are all value types, the image from my 12 megapixel camera will take 12M * 4, or 48MB. Each pixel will be stored adjacent to its neighbor in memory and the values are easy to access.

    If those were reference types, creating the array of 12M of them would allocate 48MB of memory just for references to Pixels (96MB on a 64-bit machine). Then you would have to loop through all 12M of the array elements to create 12M Pixel instances, each with at least 20 bytes (4 references plus overhead, on a 32-bit machine), for an additional 240MB. Of course for each pixel, you need to create 3 Colors and an Opacity, each being at least 8 bytes, adding 384MB more. So now your pixel array takes up 670MB instead of 48MB, the pixels are nowhere near each other in RAM (causing lots of page faults and cache misses), and you have to follow two references to get each value.

    Now do you see why we have value types?

  45. Praveen says:

    Very good post.

    And thanks for raising awareness about it.

  46. Marius Horak says:

    I just wonder how important this (stack or no stack) could be for normal development?

  47. thunker says:

    what's with everyone's urge to be a 'lint' these days?  hey folks, 'stead of all the nit-picky comments why don't you go write your own abso-perfecto blog that no one will read?  or spend your days sending mail letters to the editors of your favourite newspapers correcting all the typos you find in the comic strips (or worse yet, in the obituaries)?  Jeez, get a life – read, absorb and move on, not ruminate, burp and regurgitate.

  48. Daniel Earwicker says:

    Excellent post and thread!

    @Marius Horak – "I just wonder how important this (stack or no stack) could be for normal development?"

    I thought it would be very important from a "performance" perspective – when I first looked at Java I was HORRIFIED ('Home Alone' pose) that I couldn't create user-defined types that would go on the stack, and felt C#'s struct feature was a major advantage.

    But I was totally wrong. The GC is so damn fast, it doesn't seem to make a lot of difference in real programs. I was being prejudiced by experience with C++, where the heap is much slower than the stack in all extant implementations.

    I think that prejudice is so strong, it infects a lot of discussion, "expert" opinion, training material etc. on the very subject discussed here, and so people obsess over a technical detail that is irrelevant 99% of the time.

    @Nick Aceves – "… Value Types represent something semantically very different than Reference Types. Value Types do not have Identity. Reference Types do."

    True, but the most basic and widely used "logical value type" in any platform is probably the string. Yet in the CLR platform, System.String is a class, not a struct.

    It uses the rich capabilities of classes to make every effort to ensure that the identity of its objects is irrelevant. The == operator is overridden to compare the string content, and strings are strictly immutable so we aren't going to suffer from accidental aliasing bugs. It simply doesn't matter how many names refer to the same string. It's a "value" in every way except it doesn't technically base itself on the value type facility in the CLR. It's not perfect – you can detect the identity using ReferenceEquals, or by casting strings to object: (object)s1 == (object)s2. But it works great most of the time.

    It's the same with Tuple in CLR 4 – and that even has a fixed storage size, so it technically could have been a struct. But it turned out that in typical real programs, all the copying was slower than using the GC to clear up garbage.

    The fact that things like String and Tuple are classes, not structs, is a big clue that structs are not a general basis for defining types that lack identity and represent "pure values". Structs don't even define == as a memberwise comparison by default. They seem to be most suited to a few interop scenarios. They're really a technical niche thing I think.

  49. Michael C says:

    but if I give the complete answer, there won't be time to ask any other questions in the interview, man.

  50. Ramesh says:

    The myth that people believed till this day is busted! – nice article.

    Like the old saying: "Look and make sure of what you see, and don't believe rumors."

  51. Varun Gupta says:

    Awesome read! Thanks for sharing the thoughts

  52. Alex says:

    Eric, thanks for the interesting information 🙂

    The common interview question about where value types are stored now becomes very interesting. 🙂

    Could you also introduce a printer-friendly version of your blog?

  53. Krepenzis says:

    Very useful and detailed analysis – thanks to the author for that!

    Although I wouldn't agree with the preceding comment which holds that the change to "required by both M and N" is "just as correct": the actual scope of N's concern includes allocation of its own automatic variables, reference types, etc., which is obviously wider than the passed parameters and return value.

    Thus the original statement makes more sense to me…

  54. Dmitry Lobanov says:

    So, just to clarify, when we allocate an array of value types, it is allocated on the heap, OK? But what does this array contain? Does it contain references to boxed values? Or does it contain value type values?

    Work it out from first principles. What is an array? An array is a collection of variables of a particular type, called the element type. What do we know about variables of value type? A variable of value type contains the value. (Unlike a variable of reference type, which contains a reference to the value.)

    Therefore there is no boxing; why would there be? An array of ints is a collection of variables of type int, not a collection of variables of type object.

    You seem to still be reasoning from the fallacy that "anything on the heap is always an object". That is completely false. What is true is that variables are storage locations, and that storage locations can be on the stack or the heap, depending on their known lifetimes. – Eric

    If it contains value type values, how does runtime know what type of values resides in the array?

    Well, how does the runtime know that a field of a class is of type int? A class (or struct) is a collection of variables (called fields); an array is a collection of variables (called elements). The runtime can somehow get type information from the object about what the type of one of its variables is. How it does so is an implementation detail. – Eric

  55. Shuggy says:


    The reason is that any reference to an element of that array must come from one of the following:

    int[] ints = new int[1000000];

    // compile-time known: the compiler knows the types anyway
    var x = ints[20];

    // esoteric: pointers in an unsafe context; again the compiler knows the type
    fixed (int* p = &ints[20]) { }

    // runtime known
    Array a = ints;
    object o = ints.GetValue(20); // here is the runtime checking
    int i = (int)o; // unboxing occurs; you must use int and not, say, long

    In that last example Array.GetValue requires an object return type, but since it is an int array, just 'grabbing' the value at offset IntPtr.Size from the start of the array's data section won't work.

    Instead it uses TypedReferences (which are basically two pointers, one to the value, one to the type it is) and the CLR supplies a function on Array to ensure that it gets the right pointer based on the type of the array, which is known because an array is an object, and just like all other objects it has a record in its object header that contains a pointer to its type (used for reflection, vtables and the like). This function (as of 2.0) is:


    private extern unsafe void InternalGetReference(void* elemRef, int rank, int* pIndices);

    This will have backing code which does the runtime type checking.

    The second function involved is on TypedReference:


    internal static extern unsafe object InternalToObject(void* value);

    This will do the job of boxing the resulting value as the right type (in this case a boxed int) rather than just passing it along as a reference (as it would if the array were, say, string[]).

    Therefore you can always get the appropriate type based on the array itself.

  56. papadu says:

    Thanks for an excellent article and excellent comments, including the debates. I know that some people get turned off by the criticisms in the comments, but honestly, it helps me to learn more about this to read a really well-thought out debate.

  57. Allen Pestaluky says:

    Great perspective, and a very helpful explanation — But I thought it would be good to share the message that I take away from reading this article:

    From a game programmer's point of view, focused on creating the highest-performance code, I should not use C# to produce the most efficient code, because I have no way of telling where my data is being stored and what impact it will have on garbage collection performance. Instead, I should simply stick with a non-managed language to ensure predictable performance derived from known memory management.

    Haha, so that statement was quite extreme: I'm mostly saying this because I would love to hear this type of perspective, like what was shown in this article, balanced with a bit more detail on the performance impacts of taking this point of view of the language.

    In reality, if I was programming for the Xbox 360, I would actually study the specific implementation details in order to create code that would be "performance friendly" for that specific platform, even though this might be against the "nature" and goals of the managed C# language.

  58. Aaron says:

    I agree with Allen. It's all well and good to wish to be ignorant of the system's implementation, but anyone who has developed for the Xbox with XNA knows that it requires specific tuning. The fact that its garbage collector sucks causes all sorts of grief. Can we get some more explicit examples of reference and value types in relation to high performance (read: real-time systems) code, perhaps including talk of value types alongside the 'ref' parameter modifier?

  59. Codingsense says:

    Excellent post, Eric. Can you please shed some light on the storage of static variables, static methods, and static classes?

  60. Keith Robertson says:

    There's one major difference between the semantics of ValueType and reference type instances which your article is missing: that (aside from ref/out arguments) when a value is moved from one expression to another, the value is copied. Thus manipulations of a ValueType instance through different expressions will not be visible to each other.

    Consider a class Foo with a field int bar, and with a getter and setter of bar.  Foo a = new Foo(); Foo b = a; a.SetBar(5);  Console.Write(b.GetBar());  The setting of bar through a will be visible through b if and only if Foo is NOT a ValueType. This is an essential distinction, if not THE essential distinction, of ValueTypes.

    Imagine implementing C# on a platform with no concept of ValueTypes (e.g. on a Java or Smalltalk VM). You could do it easily, but (aside from ref/out arguments), every assignment or return of a ValueType has a shallow-copy semantic, e.g. Foo a = b.MemberwiseClone();
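
    A runnable version of that sketch (the Foo type here is illustrative):

    using System;

    struct Foo    // change 'struct' to 'class' and the output becomes 5
    {
        private int bar;
        public void SetBar(int value) { bar = value; }
        public int GetBar() { return bar; }
    }

    class Program
    {
        static void Main()
        {
            Foo a = new Foo();
            Foo b = a;                  // copies the value: b is an independent instance
            a.SetBar(5);                // mutates a's copy only
            Console.Write(b.GetBar());  // prints 0 for a struct, 5 for a class
        }
    }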

  61. Mark. says:

    Off topic, but I find these things "in the Microsoft implementation of C# on the desktop CLR, value types are stored on the stack when the value is a local variable or temporary that is not a closed-over local variable of a lambda or anonymous method, and the method body is not an iterator block, and the jitter chooses to not enregister the value", much easier to understand if I imagine Anders saying them.

  62. Dennis Lange says:

    Isn't the title a bit melodramatic… Sounds like a 1950's black and white

  64. Amrit Ranjan says:

    Very good post.

    I think that we should always use "Thread Stack" instead of "stack" in sentences. It will be helpful for the beginners.

  65. Grn says:

    Knowing I am working on an 8 way Xeon this discussion seems rather silly.  There are about 8000 registers, 256 memory indexers, 64 scoreboards, 128MB of various cache.  

    Per Intel 98% of things loaded in registers are never used (mostly predictive branching).

    This means the stack and heap are in about 400th place for every clock cycle (assuming you could actually keep the processors loaded).

    Hiding this mess from the user is a real fine idea.

  66. Chuck Jazdzewski says:

    Of course now you need to add async methods to your list of weasel words.

  67. Dhananjay says:

    One (and only one) of the best truths that I have ever enjoyed reading!!

  68. Gaurav Pandey says:

    Excellent article. Thank you!