Everybody thinks about CLR objects the wrong way (well not everybody)


Many people responded to "Everybody thinks about garbage collection the wrong way" by proposing variations on auto-disposal based on scope.

What these people fail to recognize is that they are dealing with object references, not objects. (I'm restricting the discussion to reference types, naturally.) In C++, you can put an object in a local variable. In the CLR, you can only put an object reference in a local variable.

For those who think in terms of C++, imagine if it were impossible to declare instances of C++ classes as local variables on the stack. Instead, you had to declare a local variable that was a pointer to your C++ class, and put the object in the pointer.

C++ (objects as local variables, ruled out in this thought experiment):

void Function(OtherClass o)
{
 // No longer possible to declare objects
 // with automatic storage duration
 Color c(0,0,0);
 Brush b(c);
 o.SetBackground(b);
}

C#:

void Function(OtherClass o)
{
 Color c = new Color(0,0,0);
 Brush b = new Brush(c);
 o.SetBackground(b);
}

C++ (pointers only):

void Function(OtherClass* o)
{
 Color* c = new Color(0,0,0);
 Brush* b = new Brush(c);
 o->SetBackground(b);
}

This world where you can only use pointers to refer to objects is the world of the CLR.

In the CLR, objects never go out of scope because objects don't have scope.¹ Object references have scope. Objects are alive from the point of construction to the point that the last reference goes out of scope or is otherwise destroyed.
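Here is a minimal C++ sketch of that distinction. It is only an analogy: std::shared_ptr uses reference counting rather than garbage collection, and the Brush type and the g_liveBrushes counter are hypothetical stand-ins invented for the illustration. The point is that a reference leaving scope does not end the life of the object as long as another reference remains.

```cpp
#include <cassert>
#include <memory>

// Hypothetical instrumentation: how many Brush objects are currently alive.
int g_liveBrushes = 0;

// Hypothetical stand-in for the article's Brush class.
struct Brush {
    Brush() { ++g_liveBrushes; }
    ~Brush() { --g_liveBrushes; }
};

// Returns the number of live Brush objects observed after the inner
// reference has gone out of scope but while the outer one still exists.
int LiveAfterInnerScope() {
    std::shared_ptr<Brush> outer;
    {
        std::shared_ptr<Brush> inner = std::make_shared<Brush>();
        outer = inner;                   // second reference, same object
        assert(inner.use_count() == 2);
    }                                    // the reference 'inner' ends here
    assert(outer.use_count() == 1);      // the object itself lives on
    return g_liveBrushes;
}
```

The scope that ended belonged to the reference named inner, not to the Brush. The object goes away only when outer, the last remaining reference, is released at the end of the function.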

If objects were auto-disposed when references went out of scope, you'd have all sorts of problems. I will use C++ notation instead of CLR notation to emphasize that we are working with references, not objects. (I can't use actual C++ references since you cannot change the referent of a C++ reference, something that is permitted by the CLR.)

C#:

void Function(OtherClass o)
{
 Color c = new Color(0,0,0);
 Brush b = new Brush(c);
 Brush b2 = b;
 o.SetBackground(b2);
}

C++:

void Function(OtherClass* o)
{
 Color* c = new Color(0,0,0);
 Brush* b = new Brush(c);
 Brush* b2 = b;
 o->SetBackground(b2);
 // automatic disposal when variables go out of scope
 dispose b2;
 dispose b;
 dispose c;
 dispose o;
}

Oops, we just double-disposed the Brush object and probably prematurely disposed the OtherClass object. Fortunately, disposal is idempotent, so the double-disposal is harmless (assuming you actually meant disposal and not destruction). The introduction of b2 was artificial in this example, but you can imagine b2 being, say, the leftover value in a variable at the end of a loop, in which case we just accidentally disposed the last object in an array.
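To make the "idempotent" remark concrete, here is a small C++ sketch under some assumptions: the Brush class, its Dispose method, and the release counter are all hypothetical, and Dispose here is an ordinary method rather than a destructor. It mimics what the imagined scope-based auto-disposal would do to two references that alias one object.

```cpp
#include <cassert>
#include <memory>

// Hypothetical Brush with an explicit, idempotent Dispose (not a destructor).
class Brush {
public:
    void Dispose() {
        if (disposed_) return;   // second and later calls are no-ops
        disposed_ = true;
        ++releaseCount_;         // stands in for releasing a real resource
    }
    int ReleaseCount() const { return releaseCount_; }
private:
    bool disposed_ = false;
    int releaseCount_ = 0;
};

// Simulates auto-disposal of two references that point at the same object
// and reports how many times the underlying resource was released.
int ReleasesAfterAliasedDispose() {
    auto storage = std::make_unique<Brush>();  // owns the memory
    Brush* b = storage.get();
    Brush* b2 = b;               // alias: two references, one object
    b2->Dispose();               // "dispose b2"
    b->Dispose();                // "dispose b" hits the same object again
    return storage->ReleaseCount();
}
```

The guard flag is what makes the second call harmless. Without it, the resource would be released twice, which is the failure mode you get if "dispose" really means "destroy".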

Let's say there's some attribute you can put on a local variable or parameter to say that you don't want it auto-disposed on scope exit.

C#:

void Function([NoAutoDispose] OtherClass o)
{
 Color c = new Color(0,0,0);
 Brush b = new Brush(c);
 [NoAutoDispose] Brush b2 = b;
 o.SetBackground(b2);
}

C++:

void Function([NoAutoDispose] OtherClass* o)
{
 Color* c = new Color(0,0,0);
 Brush* b = new Brush(c);
 [NoAutoDispose] Brush* b2 = b;
 o->SetBackground(b2);
 // automatic disposal when variables go out of scope
 dispose b;
 dispose c;
}

Okay, that looks good. We disposed the Brush object exactly once and didn't prematurely dispose the OtherClass object that we received as a parameter. (Maybe we could make [NoAutoDispose] the default for parameters to save people a lot of typing.) We're good, right?

Let's do some trivial code cleanup, like inlining the Color parameter.

C#:

void Function([NoAutoDispose] OtherClass o)
{
 Brush b = new Brush(new Color(0,0,0));
 [NoAutoDispose] Brush b2 = b;
 o.SetBackground(b2);
}

C++:

void Function([NoAutoDispose] OtherClass* o)
{
 Brush* b = new Brush(new Color(0,0,0));
 [NoAutoDispose] Brush* b2 = b;
 o->SetBackground(b2);
 // automatic disposal when variables go out of scope
 dispose b;
}

Whoa, we just introduced a semantic change by what seemed like a harmless transformation: The Color object is no longer auto-disposed. This is even more insidious than the scope of a variable affecting its treatment by anonymous closures, because introducing temporary variables to break up a complex expression (or removing one-time temporary variables) is a common transformation that people expect to be harmless, especially since many language transformations are expressed in terms of temporary variables. Now you have to remember to tag all of your temporary variables with [NoAutoDispose].

Wait, we're not done yet. What does SetBackground do?

C#:

void OtherClass.SetBackground([NoAutoDispose] Brush b)
{
 this.background = b;
}

C++:

void OtherClass::SetBackground([NoAutoDispose] Brush* b)
{
 this->background = b;
}

Oops, there is still a reference to that Brush in the o.background member. We disposed an object while there were still outstanding references to it. Now when the OtherClass object tries to use the reference, it will find itself operating on a disposed object.
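The failure can be sketched in C++ as follows. Everything here is hypothetical: the disposed flag, the BackgroundUsable helper, and the manual Dispose call that stands in for the imagined auto-disposal at scope exit. The memory is kept valid by a separate owner so that the sketch shows a disposed object being reached through a surviving reference, not a dangling pointer.

```cpp
#include <cassert>
#include <memory>

// Hypothetical Brush that remembers whether it has been disposed.
struct Brush {
    bool disposed = false;
    void Dispose() { disposed = true; }
};

// Hypothetical OtherClass that retains the reference it is handed.
struct OtherClass {
    Brush* background = nullptr;
    void SetBackground(Brush* b) { background = b; }
    bool BackgroundUsable() const {
        return background != nullptr && !background->disposed;
    }
};

// Simulates scope-based auto-disposal of 'b' after it was handed to 'o'.
bool BackgroundUsableAfterFunction() {
    auto storage = std::make_unique<Brush>();  // keeps the memory valid
    OtherClass o;
    {
        Brush* b = storage.get();
        o.SetBackground(b);      // o now holds its own reference
        b->Dispose();            // simulated "dispose b" at scope exit
    }
    return o.BackgroundUsable(); // o is left holding a disposed Brush
}
```

The function that created the Brush has no way of knowing, from its own scope alone, that SetBackground kept a reference.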

Working backward, this means that we should have put a [NoAutoDispose] attribute on the b variable. At this point, it's six of one, a half dozen of the other. Either you put using around all the things that you want auto-disposed or you put [NoAutoDispose] on all the things that you don't.²

The C++ solution to this problem is to use something like shared_ptr and reference-counted objects, with the assistance of weak_ptr to avoid reference cycles, and being very selective about which objects are allocated with automatic storage duration. Sure, you could try to bring this model of programming to the CLR, but now you're just trying to pick all the cheese off your cheeseburger and intentionally going against the automatic memory management design principles of the CLR.
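For comparison, here is roughly what the shared_ptr approach looks like, as a sketch rather than a recommendation. The Brush and OtherClass types and the g_liveBrushes counter are hypothetical. The std::shared_ptr and std::weak_ptr calls are the standard library ones.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical instrumentation: how many Brush objects are currently alive.
int g_liveBrushes = 0;

struct Brush {
    Brush() { ++g_liveBrushes; }
    ~Brush() { --g_liveBrushes; }
};

// OtherClass shares ownership of its background brush.
struct OtherClass {
    std::shared_ptr<Brush> background;
    void SetBackground(std::shared_ptr<Brush> b) { background = std::move(b); }
};

void Function(OtherClass& o) {
    std::shared_ptr<Brush> b = std::make_shared<Brush>();
    std::shared_ptr<Brush> b2 = b;   // alias; just bumps the count
    o.SetBackground(b2);
}   // b and b2 are released here, but o.background keeps the Brush alive

// Returns the number of live brushes seen while 'o' still holds one.
int LiveBrushesWhileOwnerExists() {
    OtherClass o;
    Function(o);
    std::weak_ptr<Brush> observer = o.background;  // does not extend lifetime
    int during = g_liveBrushes;
    o.background.reset();            // last strong reference released
    assert(observer.expired());      // the Brush was destroyed right then
    return during;
}
```

Nobody has to annotate b or b2: the Brush is destroyed when the last strong reference is released, wherever that happens. The price is a reference count on every copy and the need for weak_ptr wherever a cycle could form.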

I was sort of assuming that since you're here for CLR Week, you're one of those people who actively chose to use the CLR and want to use it in the manner in which it was intended, rather than somebody who wants it to work like C++. If you want C++, you know where to find it.

Footnotes

¹ Or at least don't have scope in the sense we're discussing here.

² As for an attribute for specific classes to have auto-dispose behavior, that works only if all references to auto-dispose objects are in the context of a create/dispose pattern. References to auto-dispose objects outside of the create/dispose pattern would need to be tagged with the [NoAutoDispose] attribute.

[AutoDispose] class Stream { ... };

Stream MyClass.GetSaveStream()
{
 [NoAutoDispose] Stream stm;
 if (saveToFile) {
  stm = ...;
 } else {
  stm = ...;
 }
 return stm;
}

void MyClass.Save()
{
 // NB! do not combine into one line
 Stream stm = GetSaveStream();
 SaveToStream(stm);
}
Comments (34)
  1. Anonymous says:

    Ok, I'm curious: Since the blog software doesn't create anchors for comments, where do you get the numbers to put in your links?

    Apart from that, it almost looks like you posted yesterday's article first so you'd have some comments to refer to. Honi soit qui mal y pense.

    [The comment numbers are available to me on the comment moderation control panel (which you don't get to see since you're not a moderator). -Raymond]
  2. Dan Bugglin says:

    To be fair, those people who said objects should be disposed when they go out of scope would have amended their suggestions to say when all object references have gone out of scope, had they been thinking of the differences between the two.

  3. Anonymous says:

    I don't write managed code for a living, but every CLR week is a pure masterpiece! Thanks Raymond!

  4. Anonymous says:

    Whenever I have to deal with the pain of IDisposable in C#, I wish they had done it like C++CLI, where you can have something like automatic scope on variables, and the compiler takes care of calling the Dispose methods of all IDisposable fields in your Dispose method/destructor automatically.

    This article has a few mistakes in it, but it talks about what I'm talking about:

    http://www.codeproject.com/…/cppclidtors.aspx

    In my opinion, C++CLI got resource management in a GC language correct.  However, which language my company uses isn't my choice.  C# is great, but once you start having heavy usage of resources, it can be very painful, very quickly if you aren't careful.

  5. Anonymous says:

    I meant to add, there is an explicit difference in C++CLI between a reference (^) and the object itself.  (Even if it's actually a handwave behind the scenes.)

  6. Anonymous says:

    "Either you put using around all the things that you want auto-disposed"

    What if what you want to auto-dispose is held as a member variable, not a local variable? Then using can't be used – so the class has to implement IDisposable to call Dispose on its members – and so does every class that holds an instance of the initial class, repeat until every class that could transitively hold an IDisposable also implements IDisposable. That's more boiler plate code than replacing raw pointers with auto_ptr or shared_ptr, and it's more fragile. Then, of course, you still can't share instances that need to be deterministically disposed.

    And you often can't do it without breaking the Liskov Substitution Principle or altering interfaces to be IDisposable.

    Automatic memory management C# style (and it is C#, not the CLR – C++CLI does a far better job) is an obstacle that dramatically increases the difficulty of managing resources that aren't memory. Memory isn't the most important resource by a mile – it's fungible, so a small to moderate leak goes unnoticed, whereas if you leak e.g. a single file, your users will probably notice that they can no longer edit that file without restarting the application.

    GC is fine for bashing together a small project, but once there's any resource except memory in play you're in trouble.

  7. Anonymous says:

    @Joe

    I did some analysis of the costs of having an IDisposable field in some classes in our product, and adding a single IDisposable field to one low level class caused 650+ classes that reference it as a field or reference a class that references it to require IDisposable.  There were ~50 classes that referenced it directly, and it just kept expanding out from there.

    All those classes would have to correctly call Dispose on it (or the referencing class) in their Dispose methods, and to have every instance of those classes in a using(…), etc… The amount of boilerplate code is just staggering, and the potential for simple mistakes/omissions is huge.  Granted FxCop helps, but it's not perfect.

  8. gibwar says:

    @Ashley "where you can have something like automatic scope on variables, and the compiler takes care of calling the Dispose methods of all IDisposable fields in your Dispose method/destructor automatically.":

    You still run into the same problem as before though. Take the following code:

    internal class SomeClass : IDisposable {
     // pretend the int needs to be disposed…
     internal int SomeValue { get; private set; }

     SomeClass() { this.SomeValue = -1; }

     internal static SomeClass CreateSomeClass(int newVal) {
       SomeClass c = new SomeClass();
       c.SomeValue = newVal;
       return c;
     }

     public void Dispose() {
      // pretend it does something
     }
    }

    If there was automatic scope on variables the object you just created and returned would have been disposed of! If the GC started disposing of objects when they left the scope you'd never be able to return an object and you'd have to start using unsafe pointers as parameters to get objects out.

  9. Anonymous says:

    Pierre B.: One problem with that is now creating an object that you want to be disposed at the end of scope looks identical, at the point of use, to creating a variable to one that doesn't lock any resource and can be disposed of at any time.  So an unaware programmer could edit the code so that the ref-count isn't decremented appropriately (for example they might pass a file object back to the caller so that they could retrieve information from it, like path or error information).  

    This isn't as much of a problem in C++.  It can happen with shared_ptrs, but these are actually used fairly rarely in my experience, so can be checked individually.  Usually you'd use an object directly on the stack, and if you wanted to pass it back you'd use a copy (since this is simpler to understand, even though it's less efficient; always optimise later).  But if the object locked a resource its copy constructor would be disabled, and the programmer would find their mistake at compile time.

  10. Anonymous says:

    @gibwar: C#'s lack of deterministic destruction is so deeply ingrained that it would be technically challenging to fix (not to mention practically impossible). Certainly calling Dispose on every IDisposable that goes out of scope isn't sufficient.

    At a minimum, you'd need to remove the concept of reference types and value types and simply have types, letting the user say whether to store/pass by value or reference. Value variables then get deterministic destruction. Add in copy constructors and assignment operators and you've got RAII.

    Of course, RAII is of limited use when RAII objects can be held by GC owned objects – you'd lose deterministic destruction. So only slight improvements over current C# when they're held by non-RAII objects, but the situation wouldn't get any worse and would get much better for variables held on the stack.

  11. Anonymous says:

    What bothers me is that there isn't (by default, out of the box) any compiler warnings for bogus use of IDisposable objects. I realize that in the general case it's almost impossible to get "right", but it'd be easy enough to arrange that the programmer can give the compiler a hint about usage of an object.

    For example, there should be a warning or even an error about storing an IDisposable in a field of an object that isn't itself IDisposable. Or new()ing an IDisposable without either using()ing it or returning it. Or receiving one from another method without using()ing it or returning it. There should be a simple way to indicate that you know what you're doing and don't want that warning, but the default should be to warn you.

    I'm sure I can get something like this if I can figure out which options to turn on and off under code analysis or FxCop. My quibble is just that it should be the default in the language. I think FxCop is a complement to the language, not a substitute for it, and in this case I feel like the support should be in the language itself.

    (oh, and in .NET I believe that Color is a value type so it IS in fact stack-allocated…)

  12. Anonymous says:

    Having learned Java as my first language, the C# way of doing things seems normal and objects on the stack always seem kind of weird to me. Does that make this a generational difference? :)

  13. Leo Davidson says:

    Many of those pitfalls exist with the existing "using/dispose" system, as well as in C++, so I'm not sure they're valid parts of the argument against what is essentially a pure syntax change.

    That said, I do agree that a change isn't really needed, at least for local variables.

    (For members, it seems a nice idea to have a shorthand that automatically implements a Dispose method that disposes all IDisposable members, just to avoid boilerplate typing. However, I don't think there's a magic solution to the issues people raise with types changing from non-disposable to disposable and the required changes recursing out into the rest of the code. If you change the semantics of how a class is used then you have to change everything that uses it, and possibly everything that uses that (if it changes in turn) and there's not much you can do about that in any language.)

    I do wonder what people are doing in C# that requires so many disposable objects that this becomes a big deal. Maybe C# isn't the best language for that sort of code. For code that mainly only has to clear up memory, the current system works great.

  14. Anonymous says:

    @gibwar

    I think I didn't explain what I meant well enough.  See Logan Capaldo's response.  You have the choice between automatic scope or not, and the difference is enforced by the compiler.

    That situation doesn't happen because of the compiler enforcing the difference.  It has references, which are signified like String^, and automatic scope variables, which are written like String.  In your case, you would use something like String^, not String.  Of course, you can still get in trouble if you get a reference to an automatically scoped variable.

  15. Anonymous says:

    @fodder

    Now remember that every feature starts with negative 100 points. Where would you say the 101 points come from to justify all the work?

  16. Anonymous says:

    @Marquess: indeed – and I'm not sure it would be worthwhile implementing the idea. I still think it's a shame one can't do RAII in C#, though :) – and nobody have mentioned classes not implementing IDisposable, and thus not supporting the 'using' keyword…

  17. Anonymous says:

    @f0dder: "nobody have mentioned classes not implementing IDisposable, and thus not supporting the 'using' keyword…"

    To be fair, that's not much different from a class in C++ not having a destructor that releases its resources, and thus not supporting RAII.

    No amount of language design is going to remove the ability to make poor API decisions.

  18. Anonymous says:

    "Now remember that every feature starts with negative 100 points. Where would you say the 101 points come from to justify all the work?"

    How about this: C#'s solution to the resource management problem is that it doesn't work – unless the only resource you care to manage is memory.

    Or we could go with the reason that C# has generics, delegates, extension methods, static classes, partial classes, default parameters etc etc – not having them makes C# less useful to power users than C++.

  19. Anonymous says:

    One solution to this is to mark classes as having strict ref-counting semantics instead of relying on the garbage collector. This way you can design classes that act as aggressive resource managers, freeing resources as soon as they're unused. This doesn't solve the problem of cycles of objects, but if you create cycles of objects that you expect to free immediately when they go out of use, then your program is arguably buggy. That's what weak references and reference policies are for. Unfortunately, most languages have poor automated support for detecting reference loops.

    CPython is like that, except that *everything* uses strict ref-counting semantic. And the semantic is, somewhat unfortunately, a side-effect of the CPython implementation and not mandated by the language. They chose not to mandate it so that Python can be implemented on pure GC CLR.

    [The downside of aggressive resource management is reentrancy. -Raymond]
  20. Anonymous says:

    Raymond, you're confusing the CLR with a specific language here I think. In C++/CLI you absolutely can write code that has an IDisposable dispose itself when its reference goes out of scope. Thanks to the type system, you can't get yourself into the situation you describe here, because the auto-magic disposing variables have a different type than the non automagic ones. E.g. `SomeDisposable foo; … code …` gets transformed by the compiler into code that is the equivalent of `SomeDisposable^ foo = gcnew SomeDisposable(); try { … code … } finally { delete foo; /* this invokes Dispose in C++/CLI */ }` but a SomeDisposable is not interchangeable with a SomeDisposable^ so you don't run into the same problems as your attribute approach (of course it is also less flexible than the attribute approach). But like you say, if I want C++[/CLI] I know where to find it.

  21. Anonymous says:

    "What these people fail to recognize is that they are dealing with object references, not objects."

    Aren't you just falling into your own firefighter trap? You're confusing the mechanism and the goal.

    It should be possible. The fact that the way CLR reference types work prevents it just means that the CLR would have to be modified in some way or another, but that doesn't mean it's wrong to say "it should be possible".

    The (very real) issue these people pointed out is that poor old-fashioned C++ which doesn't even have a GC, actually has a robust method for avoiding resource leaks of all kinds, whether the resource in question is memory, file handles or anything else. Fancy-schmancy .NET with its GC can prevent memory leaks easily, but it has *no* robust way to handle leaks of other resource types. In many ways that's one step forward and two backwards.

    [It's a philosophical difference. The CLR follows the model of traditional GC languages like Scheme and Lisp where objects have infinite lifetime. You cannot tie actions to object destruction because they are never destroyed (in theory). Instead, you need to tie actions to something else. You could try tying them to scope, which leads either to "using" or refcounting and hidden reentrancy. C# chose to be explicit with "using". -Raymond]
  22. Anonymous says:

    What about something along the lines of keeping the current way of GC as a default, but adding (compiler + VM) support for a class-level ScopedDestructionAttribute? It would require a fair amount of reworking, namely refcounting and compiler-level support; the refcounting would be relatively expensive, since it would have to be updated whenever references to an object are taken or released. Apart from performance, I wonder what pitfalls this would bring?

    As for performance, I'm aware that refcounting can be a bottleneck. OTOH, in my (limited :)) experience resources that need IDisposable tend to be *relatively* long-lived – I don't really mind threadsafe refcounting of sockets and file handles, whereas it hurts for strings and such.

  23. Anonymous says:

    [The comment numbers are available to me on the comment moderation control panel (which you don't get to see since you're not a moderator). -Raymond]

    For those broken comment links, I think if you want, you can edit those comments you want to link to and manually put a name-anchor tag there.

  24. Anonymous says:

    Everyone thinks about database connection resources the wrong way.

    There is value having a system that simulates a machine with an infinite number of available database connections.

  25. Anonymous says:

    "Fortunately, disposal is idempotent"

    Hahaha! Not in the codebase I'm working in :(

  26. Anonymous says:

    I know I'm side stepping here but why is there "Microspeak" for commonly known terms in regards to the whole *Sharp architecture? I mean, why/who decided that instead of having a Virtual Machine it should be known as Common Language Runtime, or that bytecode sounds way too boring, lets dub it Intermediate Language and so on. At least garbage collection wasn't renamed!

    The reason(s) for this can't possibly be legal, since these are merely terms just like keyboard or Operating System.

    [The Common Language Runtime is the name of the .NET Framework virtual machine. (It's like saying "Why does GM call it a Cadillac instead of a car?") The term "virtual machine" itself is probably avoided because it can easily be confused with hardware virtualization. And to many people the word "bytecode" implies the use of an interpreter. -Raymond]
  27. Anonymous says:

    @E

    I find 'Common Language Runtime' more descriptive than Virtual Machine.. So much is called Virtual Machine these days.. And this name suggests to me that more than one language will use this runtime (framework).

    Also, Intermediate Language describes that it is somewhere between what I wrote and what's going to be executed by the cpu, so again, very descriptive.

  28. Anonymous says:

    @E

    Come on, this is harmless. Microsoft calls the partition where Windows boots from the “System Partition,“ while the one where the system is installed is the “Boot Partition.“ I believe we can consider ourselves fortunate that the CLR vocabulary is still somewhat similar to the widespread usage.

  29. Anonymous says:

    These proposals for partial ref-counting and auto-dispose etc aren't new. Brian Harry posted a long message on the Developmentor Advanced .NET list during the initial beta. BradA copied it to his blog at blogs.msdn.com/…/371015.aspx – not sure where else it is.

    Thanks Raymond!

  30. Anonymous says:

    > You could try tying them to scope, which leads either to "using" or refcounting and hidden reentrancy. C# chose to be explicit with "using". -Raymond

    Raymond, could you explain what you're referring to with "hidden reentrancy"?

    I'm guessing you're talking about hidden introduction of a non-reentrancy bug, but I don't see how "using" protects against that.

    In fact, I've seen reentrancy bugs in "using" code where I thought that adding an explicit refcount would help.

    [I mean that with "using" there is an explicit declaration that "Some extra code (namely Dispose) is going to execute at the end of this block, so watch out!" Whereas with automatic reference counting, there is no such "warning keyword". -Raymond]
  31. Anonymous says:

    @E

    Intermediate Language *is* a common industry term.  It's the term I'm more familiar with from my school days (where we used Linux and plain ol' C; not really a heavy-Microsoft environment).

  32. Anonymous says:

    @Stuart: Color is not necessarily on the stack. The stack is an implementation detail: blogs.msdn.com/…/the-stack-is-an-implementation-detail.aspx

  33. Anonymous says:

    Amen to Ashley's first comment. The way I see it, C# assists you as a consumer of IDisposable classes. But it doesn't assist you in implementing them through composition of other IDisposable classes. That's the extra ingredient that C++/CLI has: a special way of marking a member/field so that it is understood to be "owned" by the containing class, so C++/CLI can auto-generate the containing class's Dispose method to call the owned object's Dispose method. The problem with trying to explain this to people is that the C++/CLI syntax for such an owned member is identical to the C# syntax for a non-owned reference field.

    To add this to C# would require some other stuff to be ironed out. For example, should an owned-reference field be implicitly readonly (as it is in C++/CLI)? Or should you be able to assign a new object to it? (In which case, should that automatically dispose the previous object?)

  34. Anonymous says:

    @Jason I know the stack is an implementation detail, but it doesn't change my point in the slightest.

    The *semantics* of a value type in .NET are such that, if Color is a value type,

    Color c = new Color(0,0,0);

    Brush b = new Brush(c);

    passes a *copy* of c to the Brush constructor, not a reference. It's true that both c and the color argument to Brush's constructor might be in registers or in some other data structure that has nothing to do with a stack. But regardless of the implementation detail as to how they are stored, it is *definitely* the case that if Brush's constructor were to modify its argument (including somehow marking it disposed), that would have no effect on the value of c.

    (Of course, the value-typeness of Color was still just a throwaway, nitpicker's-corner-esque aside in my original post. So I suppose it's fair game to nitpicker's-corner me back…)

Comments are closed.