References are not addresses

[NOTE: Based on some insightful comments I have updated this article to describe more clearly the relationships between references, pointers and addresses. Thanks to those who commented.]

I review a fair number of C# books; in all of them, of course, the author attempts to explain the difference between reference types and value types. Unfortunately, most of them do so by saying something like “a variable of reference type stores the address of the object”. I always object to this. The last time this happened the author asked me for a more detailed explanation of why I always object, which I shall share with you now:

We have the abstract concept of “a reference”. If I were to write about “Beethoven’s Ninth Symphony”, those two-dozen characters are not a 90-minute long symphonic masterwork with a large choral section. They’re a reference to that thing, not the thing itself. And this reference itself contains references — the word “Beethoven” is not a long-dead famously deaf Romantic Period composer, but it is a reference to one.

Similarly in programming languages we have the concept of “a reference” distinct from “the referent”.

The inventor of the C programming language, oddly enough, chose to not have the concept of references at all. Rather, Ritchie chose to have “pointers” be first-class entities in the language. A pointer in C is like a reference in that it refers to some data by tracking its location, but there are more smarts in a pointer; you can perform arithmetic on a pointer as if it were a number, you can take the difference between two pointers that are both in the interior of the same array and get a sensible result, and so on.

Pointers are strictly “more powerful” than references; anything you can do with references you can do with pointers, but not vice versa. I imagine that’s why there are no references in C — it’s a deliberately austere and powerful language.

The down side of pointers-instead-of-references is that pointers are hard for many novices to understand, and make it very very very easy to shoot yourself in the foot.

Pointers are typically implemented as addresses. An address is a number which is an offset into the “array of bytes” that is the entire virtual address space of the process (or, sometimes, an offset into some well-known portion of that address space — I’m thinking of “near” vs. “far” pointers in win16 programming. But for the purposes of this article let’s assume that an address is a byte offset into the whole address space.) Since addresses are just numbers you can easily perform pointer arithmetic with them.

Now consider C#, a language which has both references and pointers. There are some things you can only do with pointers, and we want to have a language that allows you to do those things (under carefully controlled conditions that call out that you are doing something that possibly breaks type safety, hence “unsafe”.)  But we also do not want to force anyone to have to understand pointers in order to do programming with references.

We also want to avoid some of the optimization nightmares that languages with pointers have. Languages with heavy use of pointers have a hard time doing garbage collection, optimizations, and so on, because it is infeasible to guarantee that no one has an interior pointer to an object, and therefore the object must remain alive and immobile.

For all these reasons we do not describe references as addresses in the specification. The spec just says that a variable of reference type “stores a reference” to an object, and leaves it completely vague as to how that might be implemented. Similarly, a pointer variable stores “the address” of an object, which again, is left pretty vague. Nowhere do we say that references are the same as addresses.

So, in C# a reference is some vague thing that lets you reference an object. You cannot do anything with a reference except dereference it, and compare it with another reference for equality. And in C# a pointer is identified as an address.
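The two operations just mentioned can be seen in a minimal sketch. The Symphony class and the SameObject helper are invented here for illustration; only object.ReferenceEquals comes from the framework.

```csharp
using System;

class Symphony
{
    public string Title = "";
}

static class ReferenceDemo
{
    // Returns true when both variables refer to the same object.
    public static bool SameObject(object a, object b) => object.ReferenceEquals(a, b);

    public static void Main()
    {
        var ninth = new Symphony { Title = "Ninth" };
        var alias = ninth;                             // copies the reference, not the object
        var other = new Symphony { Title = "Ninth" };  // a distinct object with equal contents

        alias.Title = "Choral";                        // dereference: the write is visible through both variables

        Console.WriteLine(ninth.Title);                // prints Choral
        Console.WriteLine(SameObject(ninth, alias));   // prints True
        Console.WriteLine(SameObject(ninth, other));   // prints False
    }
}
```

Nothing in this code needs to know, or can find out, what bits the reference variables actually hold.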

By contrast with a reference, you can do much more with a pointer that contains an address. Addresses can be manipulated mathematically; you can subtract one from another, you can add integers to them, and so on. Their legal operations indicate that they are “fancy numbers” that index into the “array” that is the virtual address space of the process.

Now, behind the scenes, the CLR actually does implement managed object references as addresses to objects owned by the garbage collector, but that is an implementation detail. There’s no reason why it has to do that other than efficiency and flexibility. C# references could be implemented by opaque handles that are meaningful only to the garbage collector, which, frankly, is how I prefer to think of them. That the “handle” happens to actually be an address at runtime is an implementation detail which I should neither know about nor rely upon. (Which is the whole point of encapsulation; the client doesn’t have to know.)
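To make the “opaque handle” idea concrete, here is a toy sketch. It is emphatically not how the CLR works; ToyHeap and all of its members are hypothetical names made up for this illustration. A "reference" is just an integer handle that a table maps to the object's current slot, so the object can be relocated without the handle changing.

```csharp
using System;
using System.Collections.Generic;

// A toy "heap" in which a reference is an opaque handle, not an address.
sealed class ToyHeap
{
    private readonly List<string> memory = new List<string>();                    // stand-in for raw memory
    private readonly Dictionary<int, int> location = new Dictionary<int, int>();  // handle -> current slot
    private int nextHandle = 1;

    // Stores a value and hands back an opaque handle to it.
    public int Allocate(string value)
    {
        memory.Add(value);
        int handle = nextHandle++;
        location[handle] = memory.Count - 1;
        return handle;
    }

    // Follows the handle to wherever the object currently lives.
    public string Dereference(int handle) => memory[location[handle]];

    // Exposed only so the demonstration can observe the move.
    public int SlotOf(int handle) => location[handle];

    // Simulates a collector relocating an object: the contents are copied
    // to a new slot and only the table entry is updated.
    public void Relocate(int handle)
    {
        int old = location[handle];
        memory.Add(memory[old]);
        memory[old] = "";                      // the old slot is now free
        location[handle] = memory.Count - 1;
    }
}

static class ToyHeapDemo
{
    public static void Main()
    {
        var heap = new ToyHeap();
        int reference = heap.Allocate("Beethoven's Ninth");
        int before = heap.SlotOf(reference);

        heap.Relocate(reference);

        Console.WriteLine(heap.Dereference(reference));       // prints Beethoven's Ninth
        Console.WriteLine(before != heap.SlotOf(reference));  // prints True
    }
}
```

The holder of the handle never observes the move, which is the property that matters; the extra table lookup on every dereference is the kind of cost a real implementation might avoid by storing an address directly.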

I therefore have three reasons why authors should not explain that “references are addresses”.

1) It’s close to a lie. References cannot be treated as addresses by the user, and in fact, they do not necessarily contain an address in the implementation. (Though our implementation happens to do so.)

2) It’s an explanation that explains nothing to novice programmers. Novice programmers probably do not know that an “address” is an offset into the array of bytes that is all process memory. To understand what an “address” is with any kind of depth, the novice programmer already has to understand pointer types and addresses — basically, they have to understand the memory model of many implementations of C. This is one of those “it’s clear only if it’s already known” situations that are so common in books for beginners.

3) If these novices eventually learn about pointer types in C#, their confused understanding of references will probably make it harder, not easier, to understand how pointers work in C#. The novice could sensibly reason “If a reference is an address and a pointer is an address, then I should be able to cast any reference to a pointer in unsafe code, right?”  But you cannot.

If you think of a reference as actually being an opaque GC handle then it becomes clear that to find the address associated with the handle you have to somehow “fix” the object. You have to tell the GC “until further notice, the object with this handle must not be moved in memory, because someone might have an interior pointer to it”. (There are various ways to do that which are beyond the scope of this screed.)
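For the curious, one such way can be sketched with a pinned GCHandle, which compiles without the unsafe switch. This is only an illustration under stated assumptions: PinningDemo and ReadViaAddress are invented names, and the sketch assumes an int array (a type whose elements can be pinned).

```csharp
using System;
using System.Runtime.InteropServices;

static class PinningDemo
{
    // Pins the array, reads element `index` through its raw address, then unpins.
    public static int ReadViaAddress(int[] data, int index)
    {
        if ((uint)index >= (uint)data.Length)
            throw new ArgumentOutOfRangeException(nameof(index));

        // Tell the GC: until further notice, do not move this array.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            IntPtr first = handle.AddrOfPinnedObject();              // address of element 0
            IntPtr target = IntPtr.Add(first, index * sizeof(int));  // plain address arithmetic
            return Marshal.ReadInt32(target);
        }
        finally
        {
            handle.Free();  // the GC is free to move the array again
        }
    }

    public static void Main()
    {
        int[] numbers = { 10, 20, 30, 40 };
        Console.WriteLine(ReadViaAddress(numbers, 2));  // prints 30
    }
}
```

Note that the address is only meaningful between Alloc and Free; holding on to it afterwards is exactly the foot-shooting that references are designed to prevent.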

Basically what I’m getting at here is that an understanding of the meaning of “addresses” in any language requires a moderately deep understanding of the memory model of that language. If an author does not provide an explanation of the memory model of either C or C#, then explaining references in terms of addresses becomes an exercise in question begging. It raises more questions than it answers.

This is one of those situations where the author has the hard call of deciding whether an inaccurate oversimplification serves the larger pedagogic goal better than an accurate digression or a vague hand-wave.

In the counterfactual world where I am writing a beginner C# book, I would personally opt for the vague hand-wave.  If I said anything at all I would say something like “a reference is actually implemented as a small chunk of data which contains information used by the CLR to determine precisely which object is being referred to by the reference”. That’s both vague and accurate without implying more than is wise.

Comments (76)

  1. Adrian says:

    Nice explanation.  The phrase "objects owned by the garbage collector" caught my eye.  Can you recommend/suggest any resources that dig into this concept of managed objects being under the ownership of the GC?  I had always thought of the GC knowing about or managing objects rather than making the mental shift to it owning the objects as such.  This probably shows my lack of understanding about the GC.

  2. C. Watford says:


    MSDN magazine had a series on the GC which was pretty good:

  3. I actually disagree. "Reference is an address" is a simple but powerful mental model.

    It is sufficient to explain semantics of assignment, field modification, method argument passing, etc. Without it, the novice has to memorize a bunch of weird rules.

    Sure, the address is not static, and it may not even be implemented as such. By the time the user cares, they will be able to grok it.

  4. I think the use of the term address is reasonable.  It’s not a perfect analogy, but I’d compare it more to a street address or an IP address rather than a pointer.  Really though, there’s hardly any semantic distinction: both are intrinsically meaningless short bits of identifying information whose sole purpose is to find other information.

    It’s not a coincidence that references are implemented as pointers; since they both serve almost the same purpose on the same architecture, it’s natural they’ll be implemented almost identically.  The implementation of references as pointers isn’t an implementation detail, it’s inherent in what references are.

    The language tries to protect you from inadvertent mistakes via the fixed statement, but it won’t prevent a fixed pointer from being (unwisely) used outside of the fixed scope; and direct usage of the GCHandle type makes this kind of conversion even less obvious.

    In terms of pedagogy, I think the value of the inaccurate oversimplification is that it is indeed a simplification.  It’s hard enough as is to learn new things, but the larger the number of new concepts, the harder it becomes.  If you need to explain both references and pointers, then treating the commonalities first might be easier to grasp than focusing on the distinctions.  Then again, you probably don’t need to explain _both_ to a real novice anyhow.

    I suspect most readers of (beginner) programming books aren’t first time programmers, but people that have seen similar constructs in other languages.  If that’s your readership, the distinction might be a useful detail.

    To a complete beginner, an address, a pointer, a reference, an identifying description are all going to appear to be very similar.  A vague hand wave might be confusing, an oversimplification might be misleading; there’s no free lunch.  But when it comes to C#, which doesn’t allow pointer arithmetic on references (at least not without the unsafe and fixed keywords), wouldn’t those pointer-oriented keywords be a better place to clarify the distinction?  Confusing references and pointers seems harmless enough…

  5. Karellen says:

    Well, would it help to point out to these authors that section 25 of the C# spec (unsafe code) is a conditional part of the standard, and that conforming implementations are not required to implement unsafe code, pointers, or any of the other paraphernalia surrounding it? From this it naturally follows that MS’s particular implementation and CLR are certainly not the only targets to consider.

    Further, it should be possible to compile/run C# for target environments where pointers simply do not exist and references *are* opaque first-class objects. Example environments include the Java VM, lisp machines, Parrot, or even a javascript engine with a C# compiler equivalent to Google’s GWT Java-to-Javascript compiler.[0]

    I find it somewhat scary that there are people writing books on C# who do not understand this.


  6. Weeble says:

    It’s all a bit of a mine-field.

    Depending on your background, you might assume "pointer" to mean the C concept of a numerical offset from the beginning of an address-space, or you might assume it to be an opaque token like in Pascal. (Although I think most popular Pascals gave in and allowed pointer arithmetic.) You might understand "reference" to be an opaque token as in .net, or "something that’s maybe sort of like a dereferenced pointer" as in C++. "Address" might have a precise technical meaning or simply mean "a piece of information that can be used to unambiguously find a thing".

    You’re definitely right that you can’t explain references in terms of addresses without first explaining quite what you mean by addresses!

    I often wonder how I’d do at learning programming from scratch in a modern language like C#. Although I now have a good understanding of reference and value types, I feel that I got there by first understanding value types and pointers (without pointer arithmetic) in languages like Pascal, and then understanding reference types by analogy to pointers. To me reference semantics seem quite advanced to learn straight away, and I wonder how they are best explained to those who have never learned another programming language.

  7. configurator says:

    “Pointers are strictly “more powerful” than references; anything you can do with references you can do with pointers, but not vice versa. I imagine that’s why there are no references in C — it’s a deliberately austere and powerful language.”

    Can you give an example for something that can be done with pointer and not with references, except indexing into arrays?

    Sure. The obvious example is “pointers to pointers”. In C# you can have a variable which contains a reference to an object — one level of indirection. And you can pass the variable as an argument of a method that has a ref parameter, so that’s two levels of indirection. But with pointers you can get arbitrarily deep levels of indirection; you can have a pointer to a pointer to a pointer to a pointer to an int if you want.

    Another example is being able to compare two references for reference equality. In C# you can take two references and use System.Object.ReferenceEquals to test to see if they refer to the same object. But consider the following:

    void M1(ref int x, ref int y) { … }
    void M2(int* x, int* y) { … }

    int abc = 123;
    M1(ref abc, ref abc);
    M2(&abc, &abc);

    In M1 there is no code that you can write in C# that can tell you whether you are in this case or not — you have no way of knowing if x and y are refs to the same variable. In M2 you can just compare the pointers for equality and you’ll know.

    – Eric

  8. Pavel Minaev says:

    > A pointer in C is little more than a number which refers to a specific index into an array the size of all memory available.

    Since we’re nitpicking here already, I have to point out that this is incorrect – in ISO C & C++, a pointer is not an index into “an array the size of all memory available” – in fact, it is intimately tied to the object, or the array of objects, from which it was produced, and any use of it outside that scope usually leads to U.B., with a few exceptions. For example, you can’t take addresses of two locals and compare them with < or > operators, nor can you subtract one from another – both are U.B. according to the spec. Technically, it is entirely valid for a compliant C implementation to implement pointers as “fat” objects that store some handle to the object (or array) from which the pointer was created, and the index within; and complain loudly about any operations that are U.B. The fact that pointers are plain memory addresses in most C implementations out there is also strictly an implementation detail.

    Excellent point. (Though an interesting thing, while we’re nitpicking, is that saying “you cannot do this because it is undefined behaviour” is logically inconsistent. If you cannot do it then that’s because the compiler is stopping you from doing it. Undefined behaviours only happen when you can do something that leads to the undefined behaviour. Really I think you meant to make the moral statement “you should not because it is undefined behaviour“.) But I fully agree with your larger point. There is a difference between the memory model handed to you by most implementations of C and the memory model defined by the language spec.

  9. Jon Skeet says:

    I entirely agree with the main premise of the post. My problem is that the example I find easiest to explain the difference between objects and references involves the word "address". I like to use the example of a house – you can write its address on a piece of paper, then copy it and give it to someone else, and the house itself isn’t copied. If you give someone the address of your house and they go and move the furniture, when you get home you can see the furniture has been moved too. It works in various ways.

    It’s not surprising that this is the case, because "address" means more than "location in memory" – the computing term was chosen to mirror the real-world term, rather than vice versa. We use "address" for many things: email address, web address etc.

    I’m still looking for an equally good example which doesn’t use the word "address".

  10. Pop Catalin says:

    There are so many explanations and definitions of references, pointers and handles that have come out over the years, but there seems to be no general consensus on which definitions are correct and comprehensive.

    References have been defined as being "addresses". Pointers have been defined as being "volatile references to memory locations", references have been defined as being constrained pointers, handles as references to resources or opaque pointers, and now references are defined as being opaque handles. Pointers have been defined as being indexes into the memory array, and arrays have been defined as higher abstractions over pointers. There are clearly some circles here … after all, at the base all of these are just numbers of well-defined lengths in binary format. What matters is how you treat those numbers: as values or as indirections to values.

    To a newcomer "a reference is an address to an object" is clearer than "an opaque handle". What is a handle to a newcomer? Does he know what that is? Does he know what opaque really means in the context of memory management or object management?

    To an experienced programmer "an opaque handle" is a better definition, because it says a lot more about the behaviors involved, but to a newcomer explaining references as handles is very confusing; you can’t reference an abstract concept (handle) to explain a reference :).

    … my 2 cents.

  11. Jon Skeet says:

    Rereading the post, the first paragraph struck me – I think I skimmed over it on my first reading. I wondered how I’d escaped this chastisement when Eric reviewed C# in Depth. The obvious chapter which would have contained the problem is chapter 2, where I go over a few fundamentals. I’ve looked over that chapter again just now, and indeed I don’t claim that a reference is an address. Unfortunately, I don’t actually *define* a reference at all – I just give examples and analogies.

    Ho hum. Sins of omission instead of commission? I guess I could get away with it for C# in Depth. If I ever write a beginners’ book, it’ll be a different matter…

  12. mio says:

    Perhaps, at least once in a lifetime, it is useful not to shoe-horn the entire managed mentality into what something is or is not. Same applies to pushing managed idioms as the only correct solution to world hunger (while it leaks so much memory for few TextBoxes, Buttons and Tabs especially… )

    It has been well known for a few decades that references are more optimisable, btw.

  13. Larry Lard says:

    > I’m still looking for an equally good example which doesn’t use the word "address".

    Dogs and dog leads. Many people can have a lead to the same dog. If one of those people yanks the lead, the same dog moves for everyone.

    Dogs can hold (in their mouths?) the leads of other dogs.

    Garbage collecting is the stray dog van!

    This needs work I know :)

  14. Very simple and clear explanation of differences between references and pointers from the classical OO point of view.

    Yet, the problem is that OOP completely hides the mechanism of references and object management while the notion of address/pointer does not exist at all. In this sense, it does not matter if we refer to an object representative as a reference, a pointer, an address, a surrogate or a handle – the only functions they guarantee is providing access to the represented object.

    In a wider scope, all the above terms are used in different contexts to emphasize some special feature, function or use pattern. But theoretically, there are only two types of identifiers: 1) an object representative like a reference, surrogate or handle, and 2) a field representative like an offset. The main source of confusion is that frequently one type can be used for both purposes. Say, a name can represent either an object or a field. In the context of this post, a pointer can be used to represent an object or an offset to a field.

    Unfortunately, this difference is not emphasized in OOP just because it is not needed: we have only primitive references and primitive fields. (Primitive means that they are generated and processed by the compiler, interpreter or run-time environment only.) Concept-oriented programming (CoP) is an emerging technology [1,2,3] which tries to fix this restriction (and others) of OOP by making references first-class citizens of the object world. In particular, a new programming construct, called a concept, is used instead of classes, and it is defined as a pair of classes: one reference class (for describing the structure and functions of references) and one object class.

    [1] Informal Introduction into the Concept-Oriented Programming

    [2] Concept-Oriented Programming wiki article

    [3] Concept-Oriented Programming Blog

  15. Filini says:

    @Jon Skeet: the House sample is perfect, because the address is not the only way to get a reference to it. You could have standard coordinates (latitude, longitude) or custom coordinates (common start point, direction, distance).

    Address, standard and custom coordinates are three valid implementations for referencing a house.

  16. Pop Catalin says:


    An address is not a way to get a reference to the house; copying the address or asking for an address is a way to get a reference to the house. The address is the reference.

    An address is the information that allows you to get to the house through a well-known process (following the address; in programming terms, dereferencing the reference).

    An address is a "piece of information" that, when processed in a well-known way, allows you to access the object it refers to.

    For example

    var person = GetSomePerson(..); // Acquire an address of the person we are interested in

    var age = person.GetAge();

    // there are two operations here

    1) "person." , follow the address of person to reach the person through a well-known way (if the address is a phone number, call him; if it’s an email, email him; if it’s a house address, go there)

    "person." ->  "the real person" (dereference the reference)

    2) The GetAge() call is performed on "the real person", not on its reference.

    Let’s say you are a company CEO called C#, and you say to one of your employees, "here’s the address of John, get me his age". (Your employee knows he has to follow the address: implicit dereferencing.)

    If you were a company CEO called C, and you said to an employee "here’s the address of John, get me his age", the employee would answer that the address doesn’t have an age. Then you would have to say, "follow the address and get me the age of the person you find at the address". (Pointers need explicit dereferencing.)

    In C# dereferencing of references is implicit and hidden behind the member access operator ".", when used after a  variable name of a variable that stores a reference.

  17. Mark Rendle says:

    For developers coming into C# fresh, or from scripting or another managed language, "address" doesn’t have the meaning or the implications that it does for a C programmer. Address when understood in the sense of postal, email or web is a reasonable analogy for a reference in a managed runtime. To reach the widest audience, and keep things simple for novices, I think the original statement is fine, but should be accompanied by a warning:

    "a variable of reference type stores the address of the object (C programmers take note: not *literally* the address in memory; references are not pointers)."

  18. Filini says:

    @Pop, maybe I wasn’t clear enough.

    My point was that a "reference" is a way to find/handle the object (correct me if I’m wrong), and the address is not the only way to find the house (Jon asked for a way to explain it without "address").

    There are other possible implementations.

    I can write "the house at coordinates 150,95" on the paper instead of "the house at 27, Evergreen Terrace, Greendale".

    Different implementations of reference:

    - Address: based on an index (the streets registry of the United States, or whatever)

    - Coordinates: double index on the whole world map

    - Polar Coordinates: algorithm based on a common start point

  19. Jonathan says:

    Ok, I understand the post, but if a reference doesn’t hold an address, what does it hold? I understand the abstract definition, but when I pass a reference what the hell am I passing?

    You’re passing enough information to allow the callee to find the referenced thing. What that information is, you don’t need to care about.

    Look at it this way — ultimately, the reference is going to somehow locate an address. But why stop there in your quest to dig into an abstraction? That address is an address in a virtual memory space. Somewhere in the operating system there is a map between the address in the virtual memory space and the location of the information in physical RAM and/or the swap file on disk.

    Do you care about that level of abstraction? Probably not — the operating system manages that for you so seamlessly that most of the time you can just ignore it.

    My point is that “a reference” is just a magic token that lets the callee find the information. It happens to be an address. What’s an address? It’s a magic token that lets the operating system find the information. For most applications you don’t need to understand how the magic works, so don’t stress about it. — Eric

  20. Aaron G says:

    If I had to explain this to a beginner, I would probably make the distinction that an address is a means of *locating* something, whereas a reference is merely a means of *identifying* it.  Addresses tell you *where*, references only tell you *what*.

    It’s the difference between saying "get me the Johnson file" and "get me the 6th file in the 3rd drawer from the bottom".  Both are technically references but the second assumes much more and is more prone to error.

    Oh, and to answer the question about "if a reference doesn’t hold an address, what does it hold?"  I can think of at least three possibilities:

    - A CPU register

    - A unique key (i.e. hash code) in some dictionary

    - A function that retrieves the value (this is technically still an address, but not the address you would expect)

    It’s best not to worry about it – that’s why the compiler does it for you.

  21. Silverhalide says:

    I think Jon Skeet’s first post is a good start: A house has an address, a piece of paper with "go to 123 Main St." on it is a pointer to the house.

    A reference to the house would be a piece of paper with "go to Silverhalide’s house" on it.

    Ultimately, they are both talking about the same house. One is direct, allows things like "go to the house 3 doors down from 123 Main St.", and is fragile: if I move, "123 Main St." becomes useless for knowing anything about me. The other is indirect, and "Silverhalide’s house" is still good when I move from 123 Main St. to 321 Water St.

    If pointers are "indexes into an array of memory", then I think of references as indexes into an array of (indexes into an array of memory). This allows us to change the actual memory location of the referenced object without needing to change the reference, and it allows us to simplify GC, since all use of the object goes through the handle (unlike pointers, which can randomly reference any memory).

    References also tend to know more about the referenced object than C-style pointers do. C-style pointers can point to any piece of memory and treat it as any type of object regardless of what lies underneath (ignoring undefined behaviour, etc.). Pointers tend not to contain any information other than the address. References may actually know the reference count or other "meta" information about the object.

  22. ShuggyCoUk says:

    References are *directions* to a thing.

    That an acceptable implementation of directions (the most obvious and simplest) is the absolute position relative to a well defined origin does not mean that you should assume that that’s how it works.

    In a GC environment the directions idea is even better because things move around in unpredictable ways but so long as you have a reference the GC promises it will keep the directions up to date all the time.

    If for some reason you need them, say to talk to an older program that doesn’t know how to read anything but the absolute-location sort of directions, you can ask the GC to give you a copy of those absolute directions, which is a pointer. However, older programs also don’t expect directions to change, and the GC probably couldn’t change the pointer where it is going if it tried. So to be helpful (who likes getting lost in seg fault land?) it can promise not to move the thing for a while. Since you normally wish to do both at the same time, fixed exists.

  23. configurator says:

    Regarding your examples in the response to my comment: the first one ("you can have a pointer to a pointer to a pointer to a pointer to an int if you want") is not an example of something you can do with pointers and can’t do with references – it’s an example of a syntax (int****) that doesn’t exist with references – but what can you do with that syntax? As for the second example – I never thought of that! Are there any more limitations in reference types? (There are a lot with value types – you can’t keep two references to the same int as fields, for example)

    Regarding your response to Jonathan: There are more abstraction layers in between, right? I’d like to know the low-level details; what is actually kept in the reference. If I thought about it in a C# world (if the CLR was implemented in C#) it’d probably be a reference type with a pointer inside it, and the GC would change the pointer when relocating data (thus changing the pointer for everyone). But what is actually kept there? A pointer to a pointer? An index to a really big "All reference types data" array? An imp that tells the runtime where to look?

  24. mike says:

    Respectfully, Eric, I have to point out that imo, anyway, you’re not necessarily going to be able to put yourself into the shoes of a novice programmer, or more precisely, not necessarily going to be able to accurately predict what novices will and won’t understand.

    As a guy who is way, way closer to that particular demographic :-) I can tell what might be useful for me. (Of course, I represent a set of 1 w/r/t people’s learning styles.) I am a mild advocate of the approach of using analogy even if it isn’t perfect. My justification is two-fold: true novices need something to grasp onto that gives them a concrete model for what the hell is going on; people (well, I) accrue knowledge by extending, comparing, and/or contrasting what they (I) already know. I need a point at which the new knowledge maps to something, anything that gives some sort of comprehensible model. The model has to be sufficient only to the point of getting the novice (me) over the initial conceptual hurdle.

    The second part of the justification is that once the novice has an overall model of what’s going on, it’s ok to explain that the initial model was not entirely accurate. (“Remember how I told you how references were like … ? I lied.”) Assuming that the original analogy was not just completely misleading, people can adjust their view, because they have a more sophisticated understanding of what’s going on. You just can’t explain MS-Word styles and templates to someone who’s still grappling with how to center a paragraph.

    Just to be clear, I have no problem with an analogy. “A reference is like the address of an apartment” is fine. But “a reference is an address” is defining jargon in terms of other jargon. — Eric

    I also wonder how much of this a novice programmer has to understand at all. A problem that I’ve seen with books that are aimed at novices is that they’re rarely aimed at novices. This is due to my very initial point, which is that someone who has sufficient understanding of (e.g.) C# to write a book about it is almost certainly not going to remember what it was like when all of this stuff was just a tangled mass of blurry concepts in their head. I, too, have reviewed books, and I virtually always have to point out to authors that they are assuming that the reader already understands things that the author has not in fact explained, that the author is making (sometimes huge) conceptual leaps between their initial explanations and their subsequent discussion, or that — my last point — they’re writing about stuff that’s just not necessary for the reader at that particular point.

    Anyway, thanks for listening. :-)

  25. Jeff A says:

    Aren’t we missing the really important point?  We should define references by the semantic differences between reference types and value types.

    C# has variables

    Variables have a type

    Types are classified as either reference types or value types

    Variables support assignment

    Value type variables copy state on assignment

    Reference type variables share state on assignment

    This means mutating operations on value type variables affect only that variable while mutating operations on reference type variables affect all variables that share that reference.

    For example, consider the different semantics of the following operations:

    Person x = new Person(28);

    Person y = x;


    int a = 28;

    int b = a;


    Then extend the assignment rules to passing parameters and how ref/out parameters can further change the semantics of variables.
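
    [Editorial sketch: the two assignments above can be made observable with a short program. The Person class, its Age field, and the printed values are assumptions added for illustration only; the original comment does not define them.]

```csharp
using System;

// Hypothetical class, assumed only for this illustration.
class Person
{
    public int Age;
    public Person(int age) { Age = age; }
}

static class Program
{
    static void Main()
    {
        Person x = new Person(28);
        Person y = x;               // y now refers to the same Person object as x
        y.Age = 29;                 // mutation through y ...
        Console.WriteLine(x.Age);   // ... is visible through x: prints 29

        int a = 28;
        int b = a;                  // b receives its own copy of the value
        b = 29;                     // changing b ...
        Console.WriteLine(a);       // ... leaves a alone: prints 28
    }
}
```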

  26. Pavel Minaev says:

    > Value type variables copy state on assignment

    > Reference type variables share state on assignment

    > This means mutating operations on value type variables affect only that variable while mutating operations on reference type variables affect all variables that share that reference.

    I think this is a wrong way to go about it. When a variable is of a reference type, the value of that variable is the reference – so reference type variables also copy state on assignment, it’s just that the state happens to be the reference! With your approach, you’ll have to answer all sorts of inconvenient questions, such as "what state is copied from where to where when assigning a null value to a variable".

    In contrast, ref arguments can be described this way: there’s no "ref value" – a ref is an alias for a location, and that’s it. The reason is that, in C#, there are no operations on refs, so wherever you deal with them, you’re _always_ dealing with the aliased location, never with the "address". This is in contrast to reference types, for which you do deal with actual "references as values" – when comparing them, for example.

    This is really very similar to the difference between pointers and references in C++, which is probably why it’s so tempting for people with C++ background to describe them that way.
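
    [Editorial sketch: one way to see the distinction drawn in this comment. The Box class and the two helper methods are hypothetical names assumed for illustration. Assigning a reference-type variable copies the reference itself, whereas a ref parameter aliases the caller's variable.]

```csharp
using System;

// Hypothetical class, assumed only for this illustration.
class Box { public int Value; }

static class Program
{
    // b is a copy of the caller's reference; reassigning it changes only the copy.
    static void Reassign(Box b) { b = new Box { Value = 2 }; }

    // b is an alias for the caller's variable; reassigning it changes that variable.
    static void ReassignRef(ref Box b) { b = new Box { Value = 2 }; }

    static void Main()
    {
        Box x = new Box { Value = 1 };
        Box y = x;                                  // the reference is copied
        Console.WriteLine(ReferenceEquals(x, y));   // references compared as values: True

        y = null;                                   // only y's copy of the reference changes
        Console.WriteLine(x.Value);                 // x still refers to the object: 1

        Reassign(x);
        Console.WriteLine(x.Value);                 // caller's variable untouched: 1

        ReassignRef(ref x);
        Console.WriteLine(x.Value);                 // caller's variable was reassigned: 2
    }
}
```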

  27. Ianier Munoz says:

    I agree with Eamon’s comment above. There’s no need to put an equal sign between the terms "address" and "memory address". In fact, most C/C++ books explicitly say that a pointer variable stores a **memory** address.

  28. Joe Y. says:

    C++ already has the "reference" type; it’s just an alias for some object, so I think the word should have the same meaning in C#: it’s just an alias, used for convenience. In the C# language specification, the author never says a reference equals a pointer.

    Pointers can do everything you need to manipulate memory, but pointers are also a source of bugs, so C# was designed with many features to free us from that.

  29. Alex G. says:

    While I agree that a simple and shallow statement like "a variable of reference type stores the address of the object" is a really bad explanation, I strongly disagree with the seemingly implied notion that an author shouldn’t use the word "address" at all when explaining the concepts of reference type and value type.

    As a learner, I would greatly appreciate an author’s good explanation of what "reference type" means, and the only way to give a good explanation is to explain a little about the memory model and use the word "address"! I disagree that in order for the reader to understand, he/she would need a *deep* understanding of the memory model. Yes, he/she would need some explanation of the memory model, but it doesn’t have to be that deep and complicated. It just has to be good.

    Personally, hand-waving would really really put me off and make me think very poorly of the author. It isn’t that extremely hard to explain, and if you (as a hypothetical author) felt pedagogically that it really would be too hard on the poor little reader, then fine; hand-wave at first. Then, after that, include a side panel or a "more in-depth look" box for those readers that are serious about learning. Hand-waving, to me, indicates a lazy author who just can’t be bothered to try.

    I’d like to present to you a good example of an explanation that uses the word "address." The following is an excerpt from Microsoft® Visual C#® 2008 Step by Step, by John Sharp, from page 145 of Chapter 8, Understanding Values and References.

    (Mr. Sharp has just finished giving an example of what happens in memory when you declare a variable as a value type.)

    Begin Quote:

    "Class types, such as Circle (described in Chapter 7), are handled differently. When you declare a Circle variable, the compiler does not generate code that allocates a block of memory big enough to hold Circle; all it does is allot a small piece of memory that can potentially hold the address of (or a reference to) another block of memory containing Circle. (An address specifies the location of an item in memory.) The memory for the actual Circle object is allocated only when the new keyword is used to create the object. A class is an example of a reference type. Reference types hold references to blocks of memory."

    End Quote.

    After this quote, Mr. Sharp gives an example of code demonstrating the differences in copying a value type variable versus copying a reference type variable, and he includes a visual diagram to help explain.

    Now, that is a good, accurate explanation.

    Here are several counter-points to your argument that I’d like to present, using Mr. Sharp’s explanation as the example:

    1. It’s not a lie. Mr. Sharp uses the term "address" to explain, but at no point does he impart upon the reader that references can be treated as addresses by the user. The reader doesn’t come out suddenly thinking "wow, now I can manipulate references numerically! Let’s try adding and subtracting from them!"

    2. It is a very good explanation that imparts a great deal of understanding to novice programmers without being overly complicated or overwhelming. Notice how Mr. Sharp gives a clear, simple understanding of the memory model without getting into complicated details. At no point does he use the words "stack" or "heap," which meant he didn’t need to get into the nitty gritty details of what a stack or heap are or how they work.

    3. Novice C# programmers have to get a good understanding of references anyway if they want to leave the ranks of the novice. By the time they get to learning about pointers, they wouldn’t be as novice as they once were. Having a good understanding of references would make it easier, not harder, to understand how pointers work. The learner could sensibly think, "Wow. That’s why we have references in C#; they are so much safer than pointers."

    In summary, if you take the time and care to thoughtfully explain, the reader would learn more and appreciate it more than having a vague hand-waving.

    Alright, that concludes my two cents.

  30. DRBlaise says:

    Alex G., I believe that the Mr. Sharp example illustrates Eric’s point exactly.  Look at the modified quote below that does not contain "address".

    "Class types, such as Circle (described in Chapter 7), are handled differently. When you declare a Circle variable, the compiler does not generate code that allocates a block of memory big enough to hold Circle; all it does is allot a small piece of memory that can potentially hold a reference to another block of memory containing Circle.  The memory for the actual Circle object is allocated only when the new keyword is used to create the object. A class is an example of a reference type. Reference types hold references to blocks of memory."

    Isn’t this more clear than the original?  The original adds confusion when it brings in the term "address" and tries to define it.  I personally am confused whether Mr. Sharp is trying to say "address" and "reference" are the same or whether he is explicitly trying to say they are different.  Keep it SIMPLE!

  31. Denis says:

    "In most modern cars, the accelerator pedal is just an input device of a computer that actually controls the engine (fuel injection); in a very similar way, a reference lets the programmer control the object whose lifecycle is actually managed by the runtime. A use of a pointer, in this context, would be tantamount to messing with the fuel valves directly, bypassing the computer." That’s how I would write my beginner’s book in the counterfactual world. :-)

    I like simple, vivid illustrations: I believe they are a viable alternative to both "inaccurate oversimplifications" and "accurate digressions", not to mention always good fun to read.

    Another example: here in Australia a police psychologist, when asked to explain, in plain English, the difference between a schizophrenic and an ordinary man with delusions, said that "men with delusions simply build castles in the sky; schizophrenics actually move in and live there." :-)

  32. Alex G. says:

    (In response to DRBlaise)

    Perhaps I am in the minority, but I truly feel that Mr. Sharp’s explanation is clearer without removing the word "address." In my humble opinion, using the word "reference" to explain the term "reference" is more confusing. But, alas, perhaps I really am the only one who feels this way.

  33. Kevin Burton says:

    I run into this all the time.

    I want to do an operation on a sub-vector just as if it was the whole thing

    In C:

    void foo(FooType *v, int length) { }

    FooType *v = malloc(n * sizeof(FooType));
    foo(v + 10, n - 10);

    With a reference in C# there is no easy way to do this same thing. I end up passing the starting index to signify that the operation is to be on the starting index to the end instead of the whole array. So you have something like:

    void foo(FooType[] v, int start) { }

    FooType[] v = new FooType[n];
    foo(v, 10);

    Not a whole lot of difference as coding goes but definitely not the same.

    Not the same, but both are pretty much equally ugly. Some food for thought: are there ways that you could make foo more general and thereby make the call site more attractive? For example:

    foo(IEnumerable<FooType> v) { }

    FooType[] v = new FooType[n];

    There is a performance and flexibility cost to treating an array as an IEnumerable — you don’t get the fast random access. But if you don’t need random access and this operation is not your bottleneck, then you gain the flexibility of being able to pass any sequence to foo, not just an array. — Eric

  34. Pavel Minaev says:

    > Not the same, but both are pretty much equally ugly.

    Speaking of which – do you know of any reason why System.ArraySegment<T> struct exists in its present shape? As it is, it seems to be mirroring the "T[] array, int offset, int length" pattern precisely, and it doesn’t even attempt to abstract it away, so there’s nothing gained from using it. If it at least had an implicit conversion operator from an array, it would be marginally handy to save on typing for the most common "pass the entire array" case; i.e.:

       Foo(ArraySegment<FooType> v) { }

       FooType[] v = new FooType[n];

       Foo(v); // auto-convert to 0..Length

    Of course, some language-level syntactic sugar to slice arrays a la Python would be even better… but I really wonder why, for the lack of all of the above, is ArraySegment even there? Does anyone even use it? Or is it just a trace of some long-abandoned experiment along the aforementioned lines?
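
    [Editorial sketch: for readers who have not met ArraySegment<T>, this is roughly how the "array, offset, length" pattern looks when it is packaged into the struct. The Sum helper and the sample data are assumptions for illustration; only the ArraySegment<T> constructors and the Array, Offset and Count properties are relied upon.]

```csharp
using System;

static class Program
{
    // Hypothetical helper: sums a segment by walking the underlying array directly.
    static int Sum(ArraySegment<int> segment)
    {
        int total = 0;
        for (int i = segment.Offset; i < segment.Offset + segment.Count; i++)
        {
            total += segment.Array[i];
        }
        return total;
    }

    static void Main()
    {
        int[] v = { 1, 2, 3, 4, 5 };

        // Elements at indexes 2..4 (3 + 4 + 5).
        Console.WriteLine(Sum(new ArraySegment<int>(v, 2, 3)));   // 12

        // The single-argument constructor covers the whole array.
        Console.WriteLine(Sum(new ArraySegment<int>(v)));         // 15
    }
}
```

    Note that the callee still does the offset arithmetic itself, which is the complaint being made here: the struct bundles the three values but does not hide them.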

  35. Pop Catalin says:

    "You’re passing enough information to allow the callee to find the referenced thing"

    Isn’t that an address? Can’t a set of information used to find something referenced be called an "address"? Maybe in C land an address is usually a memory address, but I don’t know if that has to be true for C# land (or that it matters if it’s a memory address or not) …

  36. frankie says:

    I think the issue here is really just the English language: pointer is very concrete while reference isn’t, and then you go on to explain that they’re ‘same same but different’, which is confusing.

    Pointer – it points to something.

    Reference – it refers to something, and ultimately it’s a pointer somewhere.

    Pointer is concrete. It points to a specific memory address (location in memory). You can shoot your foot off because you can set a pointer to anything and attempt to dereference it as anything else. You can load data from a file directly into memory and set a pointer to the start and index it like an array if you wish. The reality is your program is essentially one big array of bits and is free to interpret (or misinterpret) at will. Many bugs are associated with the incorrect interpretation of the contents of memory.

    For a reference though, it’s abstract because it’s very similar to a pointer, but we all have to pretend it’s not. We know it also points to something, and while our referenced object doesn’t change, the pointer held by the reference is free to change without our knowing.

    So I haven’t liked any of the metaphors so far. It’s not like an address written on a piece of paper because a street address never changes in the normal sense. In explaining things to beginners it behooves us to create sensible metaphors and not geeky ones and without complex contortions of the metaphor to make it work.

    I really dislike metaphors .. but … consider this.

    Given a city, there is a car. It’s your car – a Red Ford Mustang convertible with fresh pine scent and cool new wheels. There are millions of cars, and millions of Ford Mustangs, and each car can be moved around.

    Now, you’re not allowed to hold a pointer to the car because the car can be moved without your knowledge. e.g. If the car is towed or your brother borrows it he might not re-park it in the same place. Trying to hold a pointer will get you in trouble. You’re mad so myRedFord.Kick() could end up kicking someone else’s car, or you might kick empty air, or you might kick a building.

    But I do have a car, and I always know exactly which one it is when I go and get it.

    So if we want to know what a reference is, it’s the licence plate (registration) of the car. Any number of things can refer to the car via its registration token, and the registration token can be used to locate the car at a specific point in time, but the pointer itself is only useful for very short time slices.

    Knowledge of pointers is essential to understanding a computer. And I don’t think you need to handwave. You just need to say a reference is the thing (object) for all intents and purposes, and the reference probably contains the pointer but you never get to see the real pointer because it doesn’t matter to the program you write. As you say, it’s an implementation detail. The reference is all the book-keeping that goes on, and the reference is, in the end, not owned by the programmer, it’s owned by the memory manager.

  37. Thomas Goddard says:

    Timely, accurate, and unbiased.  Great article.

  38. Niranjan says:

    Perhaps the best explanation on references (IMO) can be found on Bruce Eckel’s "Thinking in C++" on the chapter devoted to references and copy constructors. I know that is C++ but the concept can be carried over to C# without many modifications. Here’s what Bruce Eckel says :


    References are like constant pointers that are automatically dereferenced by the compiler. The easiest way to think about a reference is as a fancy pointer. One advantage of this “pointer” is that you never have to wonder whether it’s been initialized (the compiler enforces it) and how to dereference it (the compiler does it).

    The point is that any reference must be tied to someone else’s piece of storage. When you access a reference, you’re accessing that storage.

    There are certain rules when using references:

      1. A reference must be initialized when it is created. (Pointers can be initialized at any time.)

      2. Once a reference is initialized to an object, it cannot be changed to refer to another object. (Pointers can be pointed to another object at any time.)

      3. You cannot have NULL references. You must always be able to assume that a reference is connected to a legitimate piece of storage.


    The first sentence pretty much nails it, a reference is like a constant pointer which means it can’t change to point to something else. Also, it doesn’t need explicit dereferencing like pointers do. Merely accessing the reference will allow us to access the referent it is tied to. Incrementing a reference is incrementing the referent itself, you can equate a reference to a label to the actual storage, the reference is the referent itself, a bit like an alias if you catch my drift (Handle is an equally acceptable word).

    Pointers on the other hand are addresses of something stored in memory. Note that pointers don’t point to the object in memory; they only contain the address at which the object is currently present. Since a pointer is an address (essentially a number), pointers lend themselves to arithmetic, which in turn leads to a number of interesting (ab)uses of the concept.

    I would prefer that most high level languages not even have a pointer, just a reference is fine. Since references allow us to handle memory in a way that pointers do without the extra power, it can be considered a fairly "safe" pointer. References might use addresses at the end of the day but that is an implementation detail that should not be of any significance to end users of the language. By making it very difficult (if not impossible) to get to the actual address (the raw number) of a variable using its reference, we could eliminate a whole class of problems that could ensue from users making assumptions about what is inside a reference. It is enough to say a reference is an opaque thing that points/refers to an object in memory and allows us to manipulate the object directly.

  39. OG says:

    For a novice OO programmer it suffices to say and it is TRUTH to say:

                 A pass by reference is passing the thing itself. (Original)

                 A pass by value is creating and passing a copy of the thing. (Copy)

    This started by talking about educating novice (C#) OO programmers. If we are going to educate OO programmers then let’s not confuse them with talk about physical memory and addresses.

    We old school guys who started by coding hex or octal, moving on to assembler, and eventually to compiled and interpreted languages may need to think of the world in that way, but they don’t and we should not corrupt them.

    C and C++ are baroque tools, much like assembler in that they are philosophically closely tied to an understanding of the hardware of the machine. These old school languages are powerful, expressive, and capable of astonishing efficiency in the right circumstances, but they also produce brittle, costly to support, and expensive solutions. This required understanding of the physical implementation details in these languages is in fact one of the great drivers behind the advent of “modern” languages like C#.

    We need to raise our eyes up from the circuit board and focus more on modeling the world, that’s what OO is about.

    We don’t teach new surgeons how to smelt iron to make a scalpel, so why teach a novice OO programmer about memory locations? We need both metallurgists and surgeons but we don’t train them the same way.

  40. TheCPUWizard says:

    A good post, with some interesting comments. As I am in the process of posting some related material would it be OK to include a pointer to this post as a reference? <grin>

    My biggest complaint is the "We don’t need to teach novices…(fill in the blank)". As a person with over 32 years of professional development experience (including 25 as the Chief Architect and CEO of my own company), I have repeatedly found that programmers who do not understand the internals and fundamentals have severe problems that manifest in many ways.

    Modern technologies mean that we do not have to really think about the lowest levels on a regular basis, but without a deep understanding of what happens at the actual hardware level, the quality of work does significantly suffer.

  41. Josh Jordan says:

    I applaud your ability to call out the vague idea of a reference, while still giving an explanation that is vague unto itself (and rightly so).

    Great article.

  42. jinksk says:

    " ‘a reference is actually implemented as a small chunk of data which contains information used by the CLR to determine precisely which object is being referred to by the reference’. That’s both vague and accurate without implying more than is wise…."

    Not to be facetious, but that explanation is not only vague, it ranks with many similar phrases that I have seen make novices go sobbing into the night and throw said book, containing said “reference”, right out their bedroom window.

    Try a gentler approach: such as “A reference merely refers to a chunk of data that can be retrieved on-demand by the developer.  How said data is stored and maintained is entirely up to the CLR and is beyond the scope of this discussion.” Then and only then would I footnote an external source for a more specific detailed discussion, specifically warning the novice that to go there might cause them to become pre-maturely grey and could possibly result in an encounter with the infamous “Stray Pointer Dragon”!

  43. Vitaly says:

    A reference in C# is a "full service" pointer. Just pull up to the gas station and the attendant gives you what you want.

  44. Kyle says:

    I notice you made no use of the word "abstraction" to describe references.

  45. Peter Wone says:

    Anything but a literal is reference. Even "5" is but a name descrying an individual artefact of sustained enumeration: in the beginning there was nothing, and there was exactly one nothing, which gave us the value "1" – and then there were two things, the primal nothing and the value "1", and then there were three things, the primal nothing, the value "1", and the value "2", and by induction we have an infinity of integers, ordination and, most importantly, tenure for professors of discrete mathematics.


    int foo = 5;

    int bar = foo  + 1;

    begins with the assignment of a literal to a variable which is a named REFERENCE to storage. In the expression foo + 1, the symbol foo is RESOLVED to the value stored in whatever the run-time chooses to store it in, the point here being that it’s a reference that must be resolved.

    Go with the hand-wave. By the time Grasshopper is ready to understand the answer, he won’t need to ask the question.

  46. Peter Wone says:

    Eric, when are you going to write "Fun with Pointers: 101 ways to make your intern’s head explode" ?

  47. Joe says:

    "a reference is actually implemented as a small chunk of data which contains information used by the CLR to determine precisely which object is being referred to by the reference"

    Yuck.  What useful information does this definition impart to the novice programmer?

    The fact is that OOP is an abstraction on top of many more abstractions, going all the way down to assembler, registers, the MMU, etc.  As with any abstraction, there are leaks, which *eventually* need to be understood to avoid certain problems or maximize efficiency.  But the value of the abstraction is in delaying the explanation of the gory details until after the fundamental concepts have been presented and understood.  Once you’ve reached that point, you at least have a context in which to talk about the leaks, and you can choose how deep you want to go, but it’s certainly not material for chapter one or two unless the title of your book is something like "Advanced C# Concepts for the Experienced Programmer."

    For someone just starting to learn about OOP, I would stick with diagrams and pictures.  Draw arrows pointing from variables to stamped out object instances, and from object instances to other object instances, circle them in red, and say "That’s a reference.  It lets you *refer* to object instances."  It might not be very technical or in-depth, but it effectively conveys the information that the reader needs to know at that particular point in time, without requiring them to understand 15 other concepts first.

  48. Jon Davis says:

    I haven’t read these comments but I disagree with the original post.

    "References store the address of an object" means exactly what it says. It does not mean "references are pointers" or "references are addresses". You’re munging the two texts.

    It would be an "implementation detail", documentation-wise, to say (correctly) that "references internally store the address of an object, although that detail is hidden from you, it only does this as an implementation detail to map the reference to the memory space that the garbage collector has allocated for it". But those are too many words; it is 100% truthful to shorten that down to simply say "references store the address of an object".

    If indeed the text said "references are addresses to an object", a description that comes closer to describing a C# pointer, I’d agree with your point.

    Also, even if a reference working with the garbage collector to retain the address of an object’s memory space is only an "implementation detail", I think it’s a very important implementation detail to know. This sort of documentation usually comes up in the discussion of stacks vs. heaps, value types vs. reference types, and/or memory optimization in the CLR. It might not be all that interesting from a syntactical point of view (C# being a language), but it is a very important thing to know from an architectural point of view (CLR being a runtime).

  49. Kirti says:

    While it is a subtle distinction, I have always defined a reference and a pointer as follows:

    Pointer: A type of variable whose value represents the memory address of the instance of an object or type.

    Reference: A type of variable that acts as a synonym to represent an instance of an object or type.

    They sound the same as you read them, but I think the distinction for me is where they are significant. A reference to me is the developer telling the compiler that whenever he/she refers to X, it should interpret that as meaning Y. It’s true that references are often implemented using memory addresses and often represent pointers, but it does not have to be so. Perhaps through some clever use of macros in languages like C and C++, you could implement something similar?

    I think it is hard to separate the two concepts because their implementations are almost always identical behind the scenes. But something that helped me understand that there was a difference is how they are treated in C++:

    A reference MUST be initialized to an object, while a pointer can be null. You can have a pointer exist that does not have a value (i.e. memory address). But you cannot have a reference exist that doesn’t refer to another instance – because conceptually, a reference does not have a value.

    On the other hand, a pointer is and *always* will be a variable that contains a memory address of an object or type. You can rely on any implementation of a pointer always containing a memory address.

  50. JeffB says:

    For beginner programmers, you need to stay away from big words like "memory".  I’ve seen way too many people who struggle with the difference between data in RAM and data on the hard drive – not that virtual memory helps.

    Your best bet will always be a physical metaphor rather than using any other programmer-ese words.  Unfortunately, words like "address" and "pointer" have already been co-opted by earlier languages and now have additional baggage meaning that no longer applies in C#.

    For Larry:

    A different real-world metaphor that might work is the dry-cleaner’s claim ticket.  Every object is like a binder notebook, with a piece of information on each page.

    Some notebooks are "value type" that you carry with you, but cannot be shared beyond making copies.

    Other notebooks are "reference type" and are always owned by the GC.  In order to use or modify the information stored in those objects, you have to have the claim ticket information written in your notebook.  You can then use that claim ticket to be allowed access to the referenced notebook, where you can copy or modify the values written there.  Since other notebooks can contain a copy of your claim ticket, owners of those notebooks can also have access to the referenced notebook.

    Fortunately, the C# compiler hides all that extra work needed to use a referenced variable, so you don’t have to understand it before you can use it.

    Unfortunately, the C# compiler hides all that extra work needed to use a referenced variable, so you can easily forget the distinction.  Why is there a need for value types, other than optimization?

  51. AJ says:

    >>>For beginner programmers, you need to stay away from big words like "memory".  I’ve seen way too many people who struggle with the difference between data in RAM and data on the hard drive – not that virtual memory helps.

    If someone struggles with the difference between data in RAM and data on hard drive then probably he is not a programmer (by any definition).

    I found the article entertaining, and the author was clearly able to make distinctions between pointers and references. But the oversimplification of "address" is better when it comes to understanding how references can be used in programs. "Reference is address to the object" is simple enough to understand. To make it clear it can be added "But unlike pointer it doesn’t allow you to do address manipulation". OR how about "reference is its Readonly Address?"

  52. AJ says:

    Sorry for repeat posting…something didn’t work properly last time….

    >>>For beginner programmers, you need to stay away from big words like "memory".  I’ve seen way too many people who struggle with the difference between data in RAM and data on the hard drive – not that virtual memory helps.

    If someone struggles with the difference between data in RAM and data on hard drive then probably he/she is not a programmer (by any definition).

    I found the article entertaining, and the author clearly distinguishes pointers from references. But for me the oversimplification of "address" is better when it comes to understanding how references can be used in programs. "A reference is the address of the object" is simple enough to understand. To make it clear, one can add "but unlike a pointer, it doesn’t allow address manipulation". Or how about "a reference is a **read-only** address of an object"?

  53. Ian W says:

    You can understand how the confusion comes about, though. Check out the .NET 2.0 Foundation MCTS training kit, which is obviously created or endorsed by Microsoft (it has its logo all over it), and under a big title:

    What Is a Reference Type: Reference types store the address of their data, also known as a pointer, on the stack.

    If we’re taught this in the beginning by the company who develops the language, you can see why the confusion comes around!

  54. David W says:

    This is a very interesting article, although I must respectfully disagree with the author on whether novice programmers should hear the word "address" or "memory" without knowing something about the memory model of the language.

    There are few concepts in programming more fundamental than a variable, and I can’t think of a language in which a variable is anything but some place in memory holding a value. I also can’t think of a language I *must* know in order to understand that concept, whether it’s in writing a chunk of VBA to automate Excel or declaring an integer in ol’ F77. If you aspire to program, but can’t be expected to understand the concept of a variable representing an address, you might want to consider a different profession.

  55. Joe says:

    >> if you aspire to program, but can’t be expected to understand the concept of a variable representing an address, one might want to consider a different profession.

    Except that there’s a big difference between writing a program and being in the programming profession.  There are plenty of people out there who are not programmers, but are able to write a little bit of code here and there to get things done.  I’m thinking of office workers writing VBA Excel macros, primarily graphics oriented web developers writing a small amount of glue code or JavaScript, or any other kind of hobbyist programmer.  These are people who have no interest in taking CS101, nor should they be expected to.

  56. César F. Qüeb Montejo says:

    David W wrote:

    "if you aspire to program, but can’t be expected to understand the concept of a variable representing an address, one might want to consider a different profession."

    Totally agree with you!!!

    New programmers are happy coding with this "abstract" or "opaque" concept. As someone said above: "you arrive at the store… and choose any product stored in it"…

    Maybe the use of delegates clarifies this concept further: references vs pointers…

    Excellent post, with useful and interesting points of view… thank you, partners!

  57. Andrey Bulat says:

    As for me, I treat a "reference" as a functor that encapsulates object identity (in Grady Booch’s terminology), and in that case a reference cannot be equated with a pointer (first of all because the type of a pointer to an object may differ from the type of the object identifier, which can be complex in general).

    Moreover, consider the STL, where we shouldn’t retrieve and keep pointers to objects stored in a collection. While an identifier is constant for an instance, the pointer, as a physical address, can be changed at any time by the memory manager. A similar approach was used in the Windows API, where functions use handles for all API objects.

    Thus a reference is a pure proxy for an object. And at run time the object itself can be created at another time, as in one of the use cases of the GoF "Proxy" pattern.

    So a pointer as an address is very useful when we are sure the object’s physical address cannot change, for example during the execution of a member function. There we can obtain the "this" pointer and directly manipulate the objects it aggregates by value. That makes it possible to write more efficient code that skips unnecessary verification and accesses members through arithmetically calculated pointers. I think that is the main reason to have two kinds of complex objects: reference types and value types.

  58. Tiberiu Covaci says:

    I am a teacher, and at first I usually explain to my students that a reference type is created on the heap and the address of the place where it was created is saved on the stack. Then I make sure they understand that they cannot manipulate this address in any way, because the garbage collector owns this address and can move the object in order to compact memory. At this stage in their career, the unsafe-code part of .NET doesn’t matter to them. Later on, when they have more experience, they will get the difference anyway.

  59. 天空中的小雨 says:

    C and C++ are somewhat difficult for someone just starting out with computers; I hope Microsoft has tutorials on Visual Basic.

  60. John says:

    For novices, "a reference is an address of a variable in process space" is a good and quick way to understand a reference, although the address is not static in C#.

  61. I am surprised to hear that many C# book authors say this. Really, they shouldn’t be writing such a book if they don’t know that references are not something as low level as memory address pointers. I never even had to be told or read there was a difference. There is no way any high level language would allow such a low level aspect to be used as often as references are used. Even if they never had to use IntPtr, it still goes without saying that a C# reference is of a higher level than a memory address pointer. I can understand that defining such a thing so that a novice understands it is a little tricky, but that is what they signed up for when they decided to write such a book.

  62. Anthony D. Green says:

    Growing up I studied both VB and C and always maintained a mental analog between ByRef in VB and pointer parameters in C. From a level-of-indirection standpoint these two mechanisms are equivalent (never mind all the more powerful things I could do with pointers that I never wanted to do). Maybe this is a flaw in my own mental model (having never really used C/C++, only studied them). And surely there is a lot of nuance that is either left out or, worse, implied by mixing the semantics of pointers, smart pointers, const pointers, C++ references, TypedReferences, Handles, etc, etc. And URLs and URIs are really different things.

    When I moved to VB.NET the analogy of reference types to Pointers made it pretty easy for me to understand them on some level versus value types. In this instance I think that inaccurate simplification is more useful than harmful though I can empathize with your desire to not muddle the facts. Perhaps it is better to say it backwards – that Reference is the highest level abstraction and that pointers are merely an address-based implementation of the concept and that .NET references are likewise an implementation of the pattern of referencing – that way you’re defining a .NET concretization in terms of a common abstraction, rather than by another sibling concretization.

    As an aside, at this stage of the game I wonder if the .NET team made the right design choice in coupling val/ref stack/heap semantics to classes and structures and whether C++/CLI is more spot on by decoupling the kind of object from its storage semantics.

  63. glad you got that off your chest? :)

  64. Andy says:

    I think what it all boils down to is this:

    A pointer, in every C and C++ implementation I’ve ever seen, is a memory address in the process memory space.  I suspect that pointers (unsafe as they may be) are implemented in exactly the same way in C#.

    Well, I certainly have seen such implementations. What about implementations of C that target 16 bit Windows running on x86? Do they have pointers which are “memory address in process memory space”? What does “process memory space” even mean in an operating system that doesn’t have processes? And what does it mean when pointers come in different sizes?

    Remember “near” vs “far” pointers, segmented architectures, selectors and offsets? Pointers used to be weird, man. — Eric

    A reference is a data structure (in many implementations, a simple pointer) that contains all the information necessary for the code to find the object it refers to.  In the case of C# and other CLR languages, the garbage collector may move the object, but it updates the reference to continue to describe the location of the same object, regardless of its actual location in memory.  It’s essentially like a shortcut on your desktop.  The file that the shortcut refers to can be moved, but as long as the shortcut is updated with the new location, you don’t need to know where the actual file is, so long as you can click the shortcut on the desktop.
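
    A minimal sketch of the shortcut idea in C#. The Document type and the ShortcutDemo class are hypothetical names invented for this illustration. Whether the collector actually relocates the object during GC.Collect is up to the runtime; the point is only that the program cannot tell either way:

    ```csharp
    using System;

    // Hypothetical type, used only for this illustration.
    class Document { public string Title = "draft"; }

    static class ShortcutDemo
    {
        // Returns true if both references still reach the same object after a collection.
        public static bool StillSameObjectAfterCollection()
        {
            var original = new Document();
            var shortcut = original;      // copies the reference, not the Document
            GC.Collect();                 // the collector is allowed to relocate the object here
            shortcut.Title = "final";     // write through one reference...
            return original.Title == "final"            // ...and observe it through the other
                && ReferenceEquals(original, shortcut);
        }

        static void Main() => Console.WriteLine(StillSameObjectAfterCollection()); // True
    }
    ```

    Nothing in this code ever sees or compares an address; it only asks whether two references refer to the same object.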

  65. Dan Moldovan says:

    Great article, with one important objection.

    Since this is a discussion on semantics, I would argue about the semantics of "address". The word "address" would drive only some programmers to think of a memory address. Others might think of a URI, for example. Most novices will probably just think of a street address.

    Take this definition in support:

    I do agree, the statement "references are memory addresses" is wrong. But, in general, "references are addresses" is not that inaccurate.

  66. Max says:

    I think your questioning of C is incorrect.  There is no question to why C doesn’t have references.  C is a low-level language that doesn’t need the overhead of references.  You manage your own memory.  It’s difficult but not horrible.  If you don’t understand it, you shouldn’t be using it!  There are plenty of other languages that will solve your problem most likely.

  67. George says:

    If we use the house analogy, a reference would be like saying "Tom’s house". It’s not an address, it’s only known amongst the circle of friends, the "program" in which it is used. The actual address – 123 Elm Street – would be more like a pointer. So if you know Tom, you understand the reference but it doesn’t give you the exact address because it’s not needed. And if Tom moves, no problem for the reference.

    No analogy is perfect but for a beginner, this helps to keep things in context to know that a reference is not a pointer but it still, nonetheless points to an address.

    Thanks for the article.

  68. Maybe a bit late, but I still want to share my ideas and questions on this subject.

    I wonder why everybody tries to explain the difference between a value type and a reference type through detailed discussions about what they are. I prefer to explain them by what they do.

    And then it is not as complex as it may seem.

    My explanation goes along the following lines:

    In modern programming languages (like C# and VB) we use complex data structures, like database records or classes (e.g. the employee in a personnel system). These data structures are not new; the only thing new about them is that since the Second World War we have been spending lots of time, blood, sweat and tears to put them in machines called computers. First as a carbon copy of the paper implementation, i.e. everybody who needed to know about an employee still had his/her own copy of the personnel data sheet for that person.

    Enter a GREAT NEW IDEA: let’s have a single copy of the electronic sheet that we all share. At first this was only implemented in the long-term storage model (subject databases), but the way programming worked in those years (seventies to mid-nineties), if I claimed the personnel data sheet for Mr. J.D., nobody else could access it. Even worse: if I had claimed it to update his address, I could not at the same time make an update to reflect his/her gender change operation.

    Enter YET ANOTHER GREAT IDEA. We should think not in terms of private access to a personnel data sheet but of shared access to that sheet, with automatic and transparent propagation of changes made by one user (or program function) to all other users. As it happens, this is not yet even close to implementation in the real world (although the CIA, FBI, MI5, James Bond etc. sometimes radiate something different), but at the level of a single computer program with many functions being executed by that program at the same time, we are getting the tools:

    - virtual storage, which allows us to write all the pieces of software needed to implement the following features. But that also means you can never tell where your data is: on a hard drive, in the cache of the hard drive, on a fast memory expansion unit (aka memory stick), … Well, it’s hidden from beginners in programming, but even those beginners will understand the need for a Data Entity Manager.

    - shared memory, which offers one program the opportunity to make another crash. So in the modern programming languages it’s hidden from and inaccessible to the user. But it is an absolute must-have for a fast-responding Data Entity Manager. (By the way, it was invented with data and code sharing in mind. That you can cross-create abends is the unintended downside.)

    - Data Entity Managers, which keep a single copy of the data given to them and give shared access to every program/function with the correct credentials

    - indexers into the data managed by the data entity managers: the users.

    And the mechanism to index into the Data Entity Manager in C# and VB is called the REFERENCE TYPE. And that is all there is to the Reference Type: it offers an index into the Data Entity Manager. The only thing that is a bit strange here is the name Microsoft has given their Data Entity Manager: the Garbage Collector (let’s hope Shakespeare ("What’s in a name?") was right and not the old Romans ("Nomen est omen")).

    If it is that simple, then why is not all data reference type? Well, in good programming all meaningful data should be implemented as a reference type (even if that means creating your own data-only classes that contain data types that are (wrongly?) only implemented as value types by the compiler maker). Value types should only be used for program internals (like loop counters) that are meaningless outside the scope of the program.

    Wim Rozendaal

  69. Mike Birt says:

    This is all well and good, and everyone seems happy with the numerous descriptions of pointers and addresses and dogs on leads etc… (good one, that!), but this confuses my address-based understanding of P/Invoke calls, where you import an implementation that uses pointers and marshal it using ref… I’d always thought of references as just that, a reference, which is nice and easy and fits in with your original post of not considering them as anything other than a reference (don’t need to know? why worry about it then?). However, when dealing with C++ implementations you’re trying to P/Invoke, you often come across this marshalling grief, and that’s where all the stuff above leaves me confused. What exactly is going on when you use ref in your C# definition of something which in the original C++ implementation was a pointer? Is the type marshaller doing clever stuff on your behalf, getting the real address (whatever that means!! not wanting to open a can of worms there) from the GC and substituting it at runtime? Or does this functionality depend on the implementation of the GC using addresses as the references? I’m normally quite happy not knowing this stuff, but in P/Invoke situations I’ve often found myself needing to know more than I want to!

  70. Peter Wone says:

    Eric, you said

    <i>In M1 there is no code that you can write in C# that can tell you whether you are in this case or not — you have no way of knowing if x and y are refs to the same variable. In M2 you can just compare the pointers for equality and you’ll know.</i>

    You are mistaken. Observe:

    void M1(ref int x, ref int y) {
     int z = x;
     bool same = x == y;
     x = z; //side-effects are bad
    }

    I get the idea of your code, but I think it’s not quite right. That would set “same” to true if x is one and y is two, for example. I think what you meant to say was something like “same = (x == y) && (++x == y)”

    That said, you still do not _know_ that x and y are the same variable. You have a good _guess_ that they are the same variable, but you do not know for sure. There could be another thread constantly watching x for changes and updating y as soon as they happen. That would give you a race condition in which sometimes “same” would incorrectly report true and sometimes would report false.

    (Of course it is a bad programming practice to pass such volatile variables by reference, and the compiler will warn you about it if you do.)

     — Eric
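
    A minimal sketch of the probe described in the reply above, wrapped in a hypothetical helper (the names AliasProbe and ProbablySame are assumptions made for this illustration). As the reply points out, a true result is only a strong guess, since another thread could in principle keep y in step with x:

    ```csharp
    using System;

    static class AliasProbe
    {
        // Guesses whether x and y are refs to the same variable by changing x
        // and checking whether y changes with it. x is restored before returning.
        public static bool ProbablySame(ref int x, ref int y)
        {
            int saved = x;
            bool same = (x == y) && (++x == y); // ++x only runs if the values already match
            x = saved;                          // undo the side effect
            return same;
        }

        static void Main()
        {
            int a = 5, b = 5;
            Console.WriteLine(ProbablySame(ref a, ref a)); // True: one variable passed twice
            Console.WriteLine(ProbablySame(ref a, ref b)); // False: equal values, distinct variables
            Console.WriteLine(a);                          // 5: the probe restored x
        }
    }
    ```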


  71. Peter Wone says:

    @Mike Birt

    Marshalling does indeed do loads of clever stuff on your behalf. If the call signature of the function you invoke includes pointer parameters then the referenced values might potentially be changed. If you fail to specify ref then your parameter value will probably make it to the invoked function but any changes won’t make it back to the caller. This would cause problems with API functions that expect you to pass them a struct to populate.
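
    The copy-versus-alias behaviour behind that advice can be seen in plain C#, with no native call involved. This is only an analogy for what happens across a P/Invoke boundary, not a demonstration of the marshaller itself; ImageInfo, FillByValue and FillByRef are hypothetical names invented for the sketch:

    ```csharp
    using System;

    // Hypothetical struct standing in for "a struct the callee is expected to populate".
    struct ImageInfo { public int Size; }

    static class RefDemo
    {
        static void FillByValue(ImageInfo info) { info.Size = 1234; }    // writes to a private copy
        static void FillByRef(ref ImageInfo info) { info.Size = 1234; }  // writes to the caller's variable

        public static (int byValue, int byRef) Run()
        {
            var first = new ImageInfo();
            FillByValue(first);          // the change is lost when the callee returns

            var second = new ImageInfo();
            FillByRef(ref second);       // the change is visible to the caller

            return (first.Size, second.Size);
        }

        static void Main()
        {
            var result = Run();
            Console.WriteLine(result.byValue); // 0
            Console.WriteLine(result.byRef);   // 1234
        }
    }
    ```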

  72. A. Yankovic says:

    Managed vs. Unmanaged Code and References vs. Pointers

    first, sorry my English :)

    secondly, I am new to this subject…

    here is my Question..

    I have a DLL import

    [DllImport("ClientDll.GC.4x.dll", EntryPoint = "RPC_getImage")]
    public unsafe static extern int RPC_getImage(String imageName, byte[] content, ref UInt32 contentSize);


    I have c# Code, at the c# side I use this,

    byte[] xx;
    RPC_getImage("my Image", xx, 1234);


    and at the C++ side, the code is (compacted) like this:

    RPC_getImage(const char * name,
                void *       content,
                unsigned *   contentSize)
    {
      static void * _content;
      // next line is an internal C++ function with typical pointers that fills _content
      _getImage(g_RPCHandle, (const char*) name, (unsigned char**) &_content, contentSize);
      memcpy(content, _content, *contentSize);
      _content = NULL;          // OK, a C++ function frees _content
    }

    What “is” my variable xx at the c# side?

    is xx a reference to byte[] or not?

    what does the Garbage Collector do with xx?

    Thank you ! :)


    Obviously this is going to die horribly in multiple ways. A managed string is UTF16 — two bytes per char, but your code expects one byte per char. You are supposed to be passing a ref to the size, instead you are passing the size. And it is completely unclear how the runtime is supposed to marshal the byte array. To make this work you need to decorate each formal parameter in the extern declaration with the appropriate marshalling attribute so that the runtime knows what to do. — Eric

  73. Pavan Kanaparthy says:

    This is an excellent description! This is the kind of reading us techies live for. Wise choice of words! If people like this author still work for MS, then I will change my rating on MS stock to "buy" :)

  74. Slava says:

    It’s good that such things get questioned; it gives a better understanding of the underlying logic of programming languages.

    I think the definition of reference in this post is also very vague. Pointer adds even more confusion.

    From a technical point of view there can be tons of things happening behind the scenes, but from a logical point of view it’s pretty simple. A reference itself is an intension, and its extension is the object it refers to. The relation or function (or all the handwaving) describing how the binding between intension and extension happens is a function (relation, or whatever you like to call it) mapping the intension of the reference in the current state to its extension (in the current state).

    Thanks for the article.

  75. TheCPUWizard says:

    An excellent post (and followup), but one inconsistency (quotations reversed in order to make post easier…)

    "Pointers are strictly "more powerful" than references; anything you can do with references you can do with pointers, but not vice versa. I imagine that’s why there are no references in C — it’s a deliberately austere and powerful language. "

    "The inventor of the C programming language, oddly enough, chose to not have the concept of references at all. Rather, Ritchie chose to have "pointers" be first-class entities in the language"

    You explain very well WHY there are no references with the first quote (which occurs later in your original post). Provided there was "one way to skin a cat" many other "features" were eliminated.

    Remember that C was originally designed to run on a DEC PDP-11 minicomputer. The maximum addressable space (directly, at one time, by a single process) was [actually still is – I have a few of them as well as a PDP-8 and Vaxen] 32KW….sooo many aspects of C derive from this processor’s architecture….

  76. Visitor says:

    Very, very good post, the best I have seen so far on pointers and references