References to value types…


Ian wrote a comment about being able to return references to value types.

Basically, he’s asking for a way to build a collection of value types so that they can easily be modified, and to do this, he’d like the indexer to be able to return a reference rather than a value.

This restriction shows up from time to time, usually when somebody wants to write code like (example from Ian):

List<MyStruct> vec = new List<MyStruct>();
vec.Add(new MyStruct(10, 20));
vec[0].X += 1; // Error!

Because vec[0] gets a copy of the value stored in the list and puts it in a temporary local, modifying the X property on that temporary copy would be a useless thing to do, so the language prohibits it.
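A minimal sketch of the copy behavior, assuming a hypothetical MyStruct with public X and Y fields (the original example does not show its definition). It contrasts the List<T> indexer, which hands back a copy, with an array element, which is a variable and can be changed in place.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the MyStruct used in the example above.
struct MyStruct
{
    public int X;
    public int Y;
    public MyStruct(int x, int y) { X = x; Y = y; }
}

static class CopyDemo
{
    // Modifies a copy taken from the list; the stored element is unchanged.
    public static int ModifyListCopy()
    {
        List<MyStruct> vec = new List<MyStruct>();
        vec.Add(new MyStruct(10, 20));
        MyStruct temp = vec[0]; // the indexer hands back a copy
        temp.X += 1;            // changes only the copy
        return vec[0].X;        // still 10
    }

    // An array element is a variable, so it can be modified in place.
    public static int ModifyArrayElement()
    {
        MyStruct[] arr = new MyStruct[] { new MyStruct(10, 20) };
        arr[0].X += 1;
        return arr[0].X;        // 11
    }

    static void Main()
    {
        Console.WriteLine(ModifyListCopy());     // prints 10
        Console.WriteLine(ModifyArrayElement()); // prints 11
    }
}
```

Writing the copy out by hand in ModifyListCopy makes explicit what the compiler would otherwise have to do silently, which is why it rejects vec[0].X += 1.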

So why won’t C# let you get the reference to a value? Well, it’s all because of the presence of the garbage collector. If you had a reference (aka internal pointer) somewhere into the middle of an array, any movement of the array as part of a GC would mean that the reference would be invalid.

It might be possible to add a construct to a system that would say “this is an internal pointer, and if you move the outer object, you need to update the inner pointer as well”, but that would mean that the reference to the value would not be a simple pointer, but a reference to another object which encapsulated the inner pointer (so the GC could find it), which pretty much defeats the purpose of being able to do this.

In C++, the programmer owns the memory, and gets to choose how things get created, deleted, and moved around. In C#, the GC owns the memory, and the programmer just gets to borrow it for a short period of time. Working in the C# world requires a different mindset, and until you internalize it, things are likely to seem a bit weird.

The meta message is around using structs in C#. In a word, don’t use structs unless you’re forced to use them – and by forced I mean that you need to do interop, or you’ve looked at your profiling data and realized that you really need to reduce the number of objects that you have around. In other words, they shouldn’t be the default choice – there are a lot of disadvantages and gotchas with structs, though a few of those will go away when Whidbey shows up.

And if you do use structs, see if you can make them immutable. That hides most of the ugly cases.
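As a rough illustration of the immutable approach, here is a sketch using a hypothetical ImmutablePoint type; the type name and the WithX helper are assumptions made for this example, not something from the original discussion. Changing a stored value means replacing the whole element.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable value type; name and members are illustrative only.
struct ImmutablePoint
{
    private readonly int _x;
    private readonly int _y;

    public ImmutablePoint(int x, int y) { _x = x; _y = y; }

    public int X { get { return _x; } }
    public int Y { get { return _y; } }

    // Returns a new value instead of mutating this one.
    public ImmutablePoint WithX(int x) { return new ImmutablePoint(x, _y); }
}

static class ImmutableDemo
{
    public static ImmutablePoint ReplaceFirst()
    {
        List<ImmutablePoint> vec = new List<ImmutablePoint>();
        vec.Add(new ImmutablePoint(10, 20));
        // Replace the whole element rather than modifying a temporary copy.
        vec[0] = vec[0].WithX(vec[0].X + 1);
        return vec[0];
    }

    static void Main()
    {
        ImmutablePoint p = ReplaceFirst();
        Console.WriteLine(p.X + ", " + p.Y); // prints 11, 20
    }
}
```

Because there are no setters, there is no way to write code that silently modifies a temporary copy.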

Hope that makes sense

Comments (23)

  1. I always make my structs immutable. Once you start thinking of structs as immutable objects (which is very consistent with the very notion of a value type), it makes everything easier.

    Is it useful to mention the solution of the above problem?

    List<MyStruct> vec = new List<MyStruct>();

    vec.Add(new MyStruct(10, 20));

    vec[0] = new MyStruct(vec[0].X + 1, vec[0].Y);

  2. Making structs immutable can make them even more ugly. If you only have two pieces of data, it is not too bad. But once you have to copy several properties into the constructor to change one, it is not so good.

    Also, making structs immutable is pointless if you are using structs for performance reasons. Usually we need to use an array of structs, and need the ability to change one property in each struct in the array. If you have to create a new one and copy it in, instead of changing in memory, you have wasted cycles that will not be picked up by the primitive optimizer.

  3. haacked says:

    So how does the .NET Garbage collector get around this problem with a regular array?

    Is it simply because the array is also allocated on the stack rather than the heap?

    If so, could one create a struct that’s a custom collection and not a reference object?

  4. Heh – I was just about to post the same question as Philip Haack…

    But I’d like to add to what he says. Regular arrays are *not* created on the stack. They’re objects on the managed heap. (Otherwise you wouldn’t be able to return arrays from functions, or store references to them in fields.)

    So given that the IL opcode you use to retrieve an element from a value type array:

    ldelema

    returns a pointer to the element you asked for, surely that would be an interior pointer?

    So apparently the CLR already supports the use of managed interior pointers into arrays because that’s how arrays work.

    What’s special about arrays that lets this work?

    Moreover, isn’t this what interior pointers are all about:

    http://www.voidnish.com/articles/ShowArticle.aspx?code=interiorpointers

    My understanding is that interior pointers are what C++/CLI uses to solve exactly this problem. So why don’t we have interior pointer support in C#?

  5. joc says:

    structs would be very useful if the "Dispose" method was called automatically at block exit. In this way we could easily implement deterministic finalization, without the "using" keyword.

  6. Scheisse says:

    When we allocate an array of structs there is no managed pointers to those structs.

    if we had managed pointers to a struct in an array it would invalidate the main purpose of that scheisse – ability to allocate a big chunk of memory at once

    otherwise GC would move them as it wants

    so just use classes if you need lvalue

  7. What about adding the ability for C# to access the unnamed box type associated with a particular value type? That way you can do something like this

    List<box MyStruct>

    which would allow you to treat a value type as a reference if you want.

  8. DrPizza says:

    Since interior pointers already exist (or will for 2.0, if they don’t already) I don’t understand the problem. It’s a long-standing crapness of C# that it can’t do this.

  10. To ‘Scheisse’:

    You’ve missed the point. And it looks like perhaps you haven’t read the spec either. If you read section 11.1.1 ("Native Size: native int, native unsigned int, O and &") of Partition I of the ECMA CLI spec, and specifically the sub-section "Managed Pointer Types: O and &", you’ll find that it says:

    "The & datatype (managed pointer) is similar to the O type, but points to the interior of an object. That is, a managed pointer is allowed to point to a field within an object or an element within an array"

    So when you say:

    "When we allocate an array of structs there is no managed pointers to those structs."

    while this may be true at the point of allocation, it ignores the rather more important fact that it is perfectly possible to get a managed pointer to any struct in the array. Sure, those managed pointers don’t all exist as soon as you create the array, but I was never saying that they do or that they should. I’m not really sure why you introduced the subject of what happens at the instant at which the array is created, because it’s not relevant to this discussion. The important point is that you *can* get a managed pointer to a struct in an array when you need one.

    In fact you positively *have* to get managed pointers to the structs in the array in order to do anything with them!

    The only way to retrieve an element from an array of structs is to use the ‘ldelema’ IL instruction. This returns a managed pointer to the struct you wanted inside the array. So if you are accessing an element in an array of values, you *are* dealing with an lvalue – no need to resort to reference types.

    Nonetheless, your claim that this invalidates the point of using structs is incorrect. The managed pointers are only in existence either on the evaluation stack or as locals, and typically have an extremely short lifetime. The array spends most of its life with no active pointers into it. So the overheads involved with using an array of reference type do not exist with a value type, despite the fact that using an array of value type necessarily involves working with managed pointers into that array.

    Note that if the GC moves the array while there happens to be an active managed pointer into it, any managed pointers that were pointing into it get updated as part of the GC process. (I think this is an emergent property of normative parts of the spec, but it is explicitly called out in the informative section 11.1.15, "CIL Instructions and Pointer Types".)

    Section 13.4.2 of Partition II of the ECMA CLI spec ("Managed Pointers") also contains some more useful information on this general area. (It’s another ‘informative only’ section, but does pull together a lot of useful information from various parts of the spec.)

    To the ever-inflammatory DrPizza: managed pointers have been in the CLR since v1, as they are fundamental to how arrays of structs work. And they’ve also had a degree of explicit support in C# since v1 as it happens. What do you think the ‘ref’ keyword does?

    The problem is not that the C# language doesn’t support unmanaged pointers – actually it does. The problem is that it places some arbitrary restraints on the use of managed pointers – essentially you can only pass them into functions, and not back out. (Or use them implicitly through arrays.) Managed pointers do in fact have some slightly funny restrictions – the conceptual model for how you can and can’t use them is a bit weird. So C# uses a conservative model – it reduces the power in exchange for a simple programming model.
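A small sketch of the "pass them into functions" case, assuming a hypothetical Particle struct and Bump method invented for this illustration. The ref argument is a managed pointer to the array element, so the callee modifies the element in place.

```csharp
using System;

// Hypothetical value type used only for this illustration.
struct Particle
{
    public int X;
}

static class RefDemo
{
    // 's' is a managed pointer to the caller's variable, not a copy.
    static void Bump(ref Particle s) { s.X += 1; }

    public static int BumpArrayElement()
    {
        Particle[] particles = new Particle[3];
        Bump(ref particles[1]); // passes a pointer into the array
        Bump(ref particles[1]);
        return particles[1].X;  // 2
    }

    static void Main()
    {
        Console.WriteLine(BumpArrayElement()); // prints 2
    }
}
```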

    Given that part of the philosophy of C# is that simplicity is more important than completeness, I can see them not wanting to support the full strangeness of interior pointers as a language feature. It’s inconvenient, but I have to admit that the alternative is ugly.

    C++ on the other hand is happy to expose the underlying awkwardness of constructs in the name of completeness.

  11. Thomas Eyde says:

    To me, structs look more and more like a bad idea. Wouldn’t C# be a better language without them? Then we would not have these funny restrictions.

    I can’t see where structs support the ‘simplicity’ model. Some other things are not simple anymore. Or is there a difference between ‘obvious’ and ‘simple’?

  12. Thomas Eyde:

    You need the ‘struct’ type to do P/Invoke.

  13. Matthew W. Jackson says:

    Not to mention that without structs DateTime and Decimal would be a LOT more heavyweight than they need to be.

    If all structs that weren’t internal and used only for P/Invoke were correctly made immutable, then you wouldn’t notice any difference when using them.

    That’s why Decimal feels like int…because it’s immutable. Once you make it mutable you have to worry about references to structs.

    (Oh, and structs such as List<T>.Enumerator are fine too…even though they aren’t really immutable–but few people even touch the enumerator directly).

  14. No structs? Please don’t even think about it. Regardless of how fast the GC becomes, stack and register based data is faster, simply because it cannot be aliased (modified in multiple places simultaneously). This means optimizers, perhaps even more in the future, can make temporary structs really fly.

    Without structs you can forget about fast math or graphics libraries.

  15. Paul Tessier says:

    Seeing as it’s possible to return a managed pointer in IL and MC++, I see no point in not extending the use of the ref keyword to support this in C# also.

    public static ref int First(int[] array)

    { return ref array[0]; }

    static void Main(string[] args)

    {

    int[] array = new int[] { 0, 1, 2 };

    ref int val = First(array);

    val = 10;

    Console.WriteLine(array[0] == 10);

    }

  16. Paul, the problem then becomes the fact that there are restrictions as to what you can do with such things.

    For example, you can’t store an interior pointer in a field of a class or struct. So this is ruled out:

    public class Foo

    {

    public ref int ri;

    }

    Now consider the implications of this for their use as local variables. As of C# 2.0, a local variable is not always implemented as a local at the IL level.

    In C# v1.x it is – there is a direct correspondence between the idea of a C# variable and an IL local variable. The reason this changed in C# 2.0 is the support for anonymous delegates. They have access to everything that was in scope at the point at which they were defined even if they outlive that scope.

    This is done by moving any variables declared outside of but used from inside of the anonymous method into a class, rather than storing the variables as locals at the IL level. That’s the only way to make the variable accessible to both scopes. (Remember that the anonymous method’s scope may be syntactically nested in its containing scope, but it’s allowed to outlive that containing scope.)
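The hoisting described above can be observed with a short sketch. The ClosureDemo class, the Counter delegate type and MakeCounter are hypothetical names chosen for this example. The local count keeps its value after MakeCounter returns, which is only possible if it lives somewhere other than the method's stack frame.

```csharp
using System;

static class ClosureDemo
{
    // Delegate type declared here so the sketch needs nothing beyond C# 2.0.
    public delegate int Counter();

    public static Counter MakeCounter()
    {
        int count = 0; // captured below, so it is not a plain IL local
        return delegate { count += 1; return count; };
    }

    static void Main()
    {
        Counter next = MakeCounter(); // MakeCounter has already returned
        Console.WriteLine(next());    // prints 1
        Console.WriteLine(next());    // prints 2
    }
}
```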

    So what would happen if we were to add an anonymous method to your Main function above?

    EventHandler dlg = delegate

    {

    val = 20;

    };

    This can’t possibly work. In order to make val accessible to this anonymous method, it has to be hoisted into the generated class that holds variables shared across the scopes. But since your ‘val’ variable is an interior pointer, it’s not allowed to live in a class. So this can’t be done.

    This means you have some pretty subtle restrictions on when you can use a ‘ref variable’ of this type. The only practical way to understand where you can and can’t use it is to understand everything that’s going on under the covers.

    So I think that’s why C++ supports this kind of thing – the philosophy of that language is to make all the grungy innards available for the developer to tinker with. But I’ve not seen any evidence that interior pointers could be integrated into C# without compromising the philosophy of simplicity that underpins the language.

  17. joelpt@eml.cc says:

    Can somebody help this newbie understand the difference between a mutable and immutable struct, by way of a concrete example in C#?

    Thanks.

  18. Paul Tessier says:

    Not a problem as this is already caught by the compiler. Try this little one out:

    public EventHandler F(ref int someInt)

    {

    EventHandler dlg = delegate

    {

    someInt = 20;

    };

    return dlg;

    }

    The compiler already handles managed pointers at the local scope and in fact needs to. In the above function I’m declaring that I have a managed pointer in the parameters. This is a local variable that is explicitly a managed pointer.

    I’m not allowed to do the same thing myself unless it’s a parameter, which is an arbitrary constraint. Notice that the only difference between what we have now and what I propose is that I can declare a ref in scope to receive a managed pointer.

    I do not believe that references to local variables should be allowed, as the following is just a waste. We could have just used numberOne anywhere we plan on using numberTwo.

    int numberOne = 300;

    ref int numberTwo = ref numberOne;

    numberTwo = 400; // now numberOne == 400

    I believe that the ref keyword, should be used only for receiving "ref returns", or receiving references to heap members. Example:

    ref MyBigDataBlock data = largeBlockCache.GetBlock(12);

    // etc..

    MyBigStruct[] myArray = new MyBigStruct[30];

    for(int i = 0; i < myArray.Length; i++)

    {

    // need ref in front of myArray[i] just as if we were passing it to a function

    // that had a ref parameter.

    ref MyBigStruct temp = ref myArray[i];

    temp.SomeInitializer();

    temp.SomeFloat = 2316738.3225;

    // etc..

    }

    Just like ref parameters, though, a ref declared locally can only be assigned at creation, which means it’s impossible to reassign it to a new managed pointer. Example:

    string[] names = new string[10];

    ref string firstName = ref names[0];

    firstName = "Bob"; // names[0] is now "Bob"

    firstName = ref names[1]; // error wrong type, cannot convert string reference to string

    Not that I advocate having everything in C++/CLI be in C#, but I would at least like a way to access the values returned as managed pointers, especially since it can be verified CLR. Anytime I can switch some things from C++/CLI to C# and still not need the UnmanagedCode security, I’m a happy man.

    Just to be clear, I’m dead set on this not being a CLS compliant feature.

  19. Paul Tessier says:

    Immutable struct:

    struct SquareInt

    {

    private int _sqValue;

    public SquareInt(int value) { _sqValue = value * value; }

    public int Value { get { return _sqValue; } }

    }

    Notice that SquareInt can only be assigned a value at creation, even though _sqValue isn’t readonly.

    That’s immutable: it means that if you want another one with a different value, you’d best create a new one.

    Strings, on the other hand, are not structs, but are still immutable because you cannot change their value. You can’t say:

    string word = "dog";

    word[2] = 't';

    That doesn’t work, because there’s no way to change it once it’s been created.

  20. Paul Tessier says:

    As it turns out, using managed pointers for parameters, locals, and returns is all verifiable and part of CLS. So it seems that returning a managed pointer is already part of CLS?!? There is a special case for properties:

    CLS Rule 27: "The type of a property shall be the return type of the getter and the type of the last argument of the setter. The types of the parameters of the property shall be the types of the parameters to the getter and the types of all but the final parameter of the setter. All of these types shall be CLS-compliant, and shall not be managed pointers (i.e. shall not be passed by reference)."

    Basically having a setter for a property that is a managed pointer is a no-no in CLS.

    ref T this[int index]

    {

    get { return ref _array[index]; } // CLS compliant

    set { _array[index] = value; } // Not CLS compliant

    // (equivalent to: void set_Item(int index, ref T value))

    // not really needed either if we have the pointer

    }

    From a C# perspective, built-in arrays have an indexer property that returns a managed pointer.

    Now for the weird part. While returning a managed pointer is verifiable, calling such a function isn’t.

    ECMA spec 1.8.1.2.1 Verification Types [just before 1.8.1.2.2]: "A method can be defined as returning a managed pointer, but calls upon such methods are not verifiable."

    Rationale: "some uses of returning a managed pointer are perfectly verifiable (eg, returning a reference to a field in an object); but some not (eg, returning a pointer to a local variable of the called method). Tracking this in the general case is a burden, and therefore not included in this standard"

    Now if anybody can understand why that statement makes sense, please explain it to me. Because obviously, if the function is verified, then all the type constraints and other sanity checks passed, including checking whether the type of the managed pointer matched. If a function is verified, then how should calling it result in unverified behavior? That’s like getting a warranty that’s only good as long as you don’t use the product: kinda useless. It means that the function isn’t "completely" verified; a half-assed attempt was made.

    Ok, so basically if a function returns a managed pointer, then it is marked as verified when it really isn’t. The only thing needed to truly verify the function is to check the instruction used to get the pointer. There are only three instructions that create pointers to memory not on the heap: ldarga, ldloca, and ldsflda. And only ldloca will get a pointer that’s only valid for the lifetime of the function.

    How is checking that a managed pointer is pointing to anything except a local stack variable a "burden"? One need only track that ldloca wasn’t used to get the managed pointer being returned. This could be the inclusion of another verification type in the stack simulation or a second pass evaluation when the return is a managed pointer.
