Everybody thinks about garbage collection the wrong way


Welcome to CLR Week 2010. This year, CLR Week is going to be more philosophical than usual.

When you ask somebody what garbage collection is, the answer you get is probably going to be something along the lines of "Garbage collection is when the operating environment automatically reclaims memory that is no longer being used by the program. It does this by tracing memory starting from roots to identify which objects are accessible."

This description confuses the mechanism with the goal. It's like saying the job of a firefighter is "driving a red truck and spraying water." That's a description of what a firefighter does, but it misses the point of the job (namely, putting out fires and, more generally, fire safety).

Garbage collection is simulating a computer with an infinite amount of memory. The rest is mechanism. And naturally, the mechanism is "reclaiming memory that the program wouldn't notice went missing." It's one giant application of the as-if rule.¹

Now, with this view of the true definition of garbage collection, one result immediately follows:

If the amount of RAM available to the runtime is greater than the amount of memory required by a program, then a memory manager which employs the null garbage collector (which never collects anything) is a valid memory manager.

This is true because the memory manager can just allocate more RAM whenever the program needs it, and by assumption, this allocation will always succeed. A computer with more RAM than the memory requirements of a program has effectively infinite RAM, and therefore no simulation is needed.
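To make the idea concrete, here is a toy model of a heap managed by the null garbage collector. It is only a sketch: the `NullCollectorHeap` class, its fixed-size arena, and the `Allocate`/`Collect` names are invented for illustration and have nothing to do with how the CLR actually manages memory.

```csharp
using System;

// Toy model of a heap managed by the null garbage collector (hypothetical type).
sealed class NullCollectorHeap
{
    private readonly byte[] arena; // stands in for "all the RAM available to the runtime"
    private int next;              // offset of the first unused byte

    public NullCollectorHeap(int capacity)
    {
        arena = new byte[capacity];
    }

    // Returns the offset of a fresh block of the requested size.
    public int Allocate(int size)
    {
        if (size < 0) throw new ArgumentOutOfRangeException(nameof(size));
        if (size > arena.Length - next)
        {
            // The program needed more memory than the machine has,
            // so the illusion of infinite memory ends here.
            throw new OutOfMemoryException();
        }
        int offset = next;
        next += size;
        return offset;
    }

    // The entire collector: nothing is ever reclaimed.
    public void Collect() { }
}
```

As long as the program's total allocations fit within the arena, no caller can tell this heap apart from one backed by a real collector.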

Sure, the statement may be obvious, but it's also useful, because the null garbage collector is both very easy to analyze yet very different from garbage collectors you're more accustomed to seeing. You can therefore use it to produce results like this:

A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.

The proof of this is simple: Run the program on a machine with more RAM than the amount of memory required by the program. Under these circumstances, the null garbage collector is a valid garbage collector, and the null garbage collector never runs finalizers since it never collects anything.

Garbage collection simulates infinite memory, but there are things you can do even if you have infinite memory that have visible effects on other programs (and possibly even on your program). If you open a file in exclusive mode, then the file will not be accessible to other programs (or even to other parts of your own program) until you close it. A connection that you open to a SQL server consumes resources in the server until you close it. Have too many of these connections outstanding, and you may run into a connection limit which blocks further connections. If you don't explicitly close these resources, then when your program is run on a machine with "infinite" memory, those resources will accumulate and never be released.

What this means for you: Your programs cannot rely on finalizers keeping things tidy. Finalizers are a safety net, not a primary means for resource reclamation. When you are finished with a resource, you need to release it by calling Close or Disconnect or whatever cleanup method is available on the object. (The IDisposable interface codifies this convention.)
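A minimal sketch of explicit release in C#: the `using` statement calls `Dispose` at the end of the block, whether or not an exception is thrown, so the file handle is closed without any help from the garbage collector. The `ExplicitRelease` class and `WriteThenDelete` method are hypothetical names chosen for this example.

```csharp
using System;
using System.IO;

static class ExplicitRelease
{
    // Hypothetical helper: writes a small file, then deletes it.
    public static bool WriteThenDelete(string path)
    {
        // FileShare.None opens the file in exclusive mode.
        using (var stream = new FileStream(path, FileMode.Create, FileAccess.Write, FileShare.None))
        {
            byte[] data = { 1, 2, 3 };
            stream.Write(data, 0, data.Length);
        } // Dispose runs here and closes the handle; no finalizer is involved.

        // The handle is already closed, so the delete does not depend on the GC.
        File.Delete(path);
        return !File.Exists(path);
    }
}
```

Calling `ExplicitRelease.WriteThenDelete` with a path in a writable directory should succeed even on a machine where the collector never runs, because nothing in it waits for a finalizer.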

Furthermore, it turns out that not only can a correctly-written program not assume that finalizers will run during the execution of a program, it cannot even assume that finalizers will run when the program terminates: Although the .NET Framework will try to run them all, a bad finalizer will cause the .NET Framework to give up and abandon running finalizers. This can happen through no fault of your own: There might be a handle to a network resource that the finalizer is trying to release, but network connectivity problems result in the operation taking longer than two seconds, at which point the .NET Framework will just terminate the process. Therefore, the above result can be strengthened in the specific case of the .NET Framework:

A correctly-written program cannot assume that finalizers will ever run.

Armed with this knowledge, you can solve this customer's problem. (Confusing terminology is preserved from the original.)

I have a class that uses XmlDocument. After the class is out of scope, I want to delete the file, but I get the exception System.IO.IOException: The process cannot access the file 'C:\path\to\file.xml' because it is being used by another process. Once the program exits, then the lock goes away. Is there any way to avoid locking the file?

This follow-up might or might not help:

A colleague suggested setting the XmlDocument variables to null when we're done with them, but shouldn't leaving the class scope have the same behavior?
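The customer did not say how the document was loaded. One plausible reading is that they opened a stream themselves and never closed it. Under that assumption, a sketch of the fix is to close the stream explicitly before deleting the file. The `XmlLoadExample` class and `LoadAndDelete` method are hypothetical names.

```csharp
using System;
using System.IO;
using System.Xml;

static class XmlLoadExample
{
    // Hypothetical helper: loads an XML file into memory, then deletes the file.
    public static XmlDocument LoadAndDelete(string path)
    {
        var doc = new XmlDocument();
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read))
        {
            doc.Load(stream); // the document is fully parsed into memory here
        } // the stream is closed here, which releases the file

        // No open handle remains, so the delete does not wait on a finalizer.
        File.Delete(path);
        return doc;
    }
}
```

Setting the variables to null, or letting them go out of scope, does nothing to close the stream. Only the explicit `Dispose` at the end of the `using` block does.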

Bonus chatter: Finalizers are weird, since they operate "behind the GC." There are also lots of classes which operate "at the GC level", such as WeakReference, GCHandle, and of course System.GC itself. Using these classes properly requires understanding how they interact with the GC. We'll see more on this later.

Related reading

Unrelated reading: Precedence vs. Associativity vs. Order.

Footnote

¹ Note that by definition, the simulation extends only to garbage-collected resources. If your program allocates external resources, those external resources remain subject to whatever rules apply to them.

Comments (89)
  1. Mike S says:

    I seem to recall reading about issues where, under heavy load, an app would allocate objects faster than the finalizer could clean them up, resulting in OK memory performance most of the time, with spikes (and occasional crashes) under heavy load. Of course, the solution was to have the app clean up the objects, so the GC could do its job normally.

  2. Marquess says:

    “A correctly-written program cannot assume that finalizers will ever run at any point prior to program termination.”

    Not quite. When instantiating a class that implements IDisposable in a C# “using” statement, one can reasonably assume that its Dispose method will be called at the end of the using statement.

  3. Henning Makholm says:

    @Marquess: IDisposable.Dispose is not a finalizer, and a finalizer is not IDisposable.Dispose.

    C# uses the same syntax for finalizers as C++ does for destructors, but the similarity does not go any further than syntax. IDisposable is much more like a destructor in C++. If I understand correctly, Visual C++ will translate the destructor syntax into Dispose calls when generating managed code.

  4. Dan Bugglin says:

    We now know once the XmlDocument (or perhaps the underlying FileStream the customer used to read the document with) goes out of scope, the finalizer may be called at any time the garbage collector sees fit (or may never be called).  Thus the customer must explicitly close the file when they are done with it in order to unlock the file.

    "A colleague suggested setting the Xml Document variables to null when we're done with them, but shouldn't leaving the class scope have the same behavior?"

    Yes.  Yes it will.  "Nothing" is a valid behavior…

    @Marquess .Dispose is not a finalizer, so his point is still valid.  The created object may still hang around for any amount of time before its destructor/finalizer is called.

    The destructor/finalizer of any IDisposable should only call .Dispose so they would end up doing the same stuff, but it's not strongly enforced and an implementor can still do whatever they want.

  5. Random832 says:

    Marquess, a Dispose method is not a finalizer.

    There is some confusion due to the use of syntax similar to C++ destructors, whereas an actual C++ destructor acts a lot closer to a dispose method than a finalizer.

    blogs.msdn.com/…/what-s-the-difference-between-a-destructor-and-a-finalizer.aspx

  6. C# noob says:

    I have a naive question – if the finalizers are not guaranteed to run, why have them? What useful thing can you do in a finalizer?

  7. asdbsd says:

    So there's no point at all in finalizers? If whatever you plan on doing there is important, you can't rely on a finalizer running and thus have to go with an explicit cleanup. Cleaning up again in those "safety nets" only makes it worse, because this hides the fact that somebody forgot to perform the explicit clean up properly. The only legit use for a finalizer is to put 'if (!resourcesReleased()) throw "Call Cleanup() before releasing the object, you dimwit"'.

  8. mquander says:

    Finalizers are useful as a stopgap measure to call "Dispose"-like functionality if it isn't otherwise called.

    For example, it's best practice to Dispose an XmlDocument, and release the lock on the file; but the finalizer for XmlDocument should also call Dispose (if it's not disposed already), so that even if the caller doesn't call it, the lock will probably get released when the GC comes by, rather than waiting for the program to terminate.

  9. mquander says:

    @asdbsd: (I guess whether it's best to clean up quietly or complain somehow is up to your philosophy of API design, but in .NET the former is typically done.)

  10. Stewart says:

    @C# Noob, because sometimes they run. They can be used as a safety net to clean up after yourself if you forget to call dispose, which can make the infinite memory illusion last for longer. This means that anything you do in a finalizer must be something the OS is going to do anyway when the process goes away (closing sockets or files, freeing native memory etc.). If the OS won't do it, you need a stronger guarantee than a finalizer.

    In modern C# it is very very very rare to need to write a finalizer, and you should carefully scrutinize any code that has one which passes through your code review list.

  11. SimonRev says:

    As a long time C++ programmer, I tend to agree with asdbsd — Finalizers tend to obscure program logic bugs rather than help you.

    Of course, I much prefer rigorous use of RIAA to garbage collection as you have full control when you release your resource, yet if you use RIAA right, you never have to remember to clean up at the end.  And that applies to any kind of resource, not just a memory resource.

  12. C# noob says:

    @mquander: But then the lock will be released when the app runs out of memory. That's just silly. The two things are unrelated and shouldn't be tied together.

    If you are Disposing in a finalizer you are covering up bugs, and they WILL make it into the final product. Unlike most bugs that show when the system is low on resources, these bugs will surface when the system is high on resources, like on a machine 2 years from now.

  13. Leo Davidson says:

    Finalizers are not pointless unless you can prove your code (including libraries, etc.) is 100% bug-free. As the root post says, they shouldn't be relied upon but they can still lessen the impact of bugs.

    If I'm using a program and a bug causes it to miss a Close/Dispose call, leaving a file open, then I'd rather that file was closed a while later than have it stay open for the life of the program.

    If I'm developing or testing a program then, sure, I want to know when that goes wrong, but if I'm using it I'll have the safety net, please.

    The syntax used for them is unfortunate but it's too late to change that and, even switching between C++ at home and C# at work, it never caused me any real problems. Just one of those language/syntax differences you have to be aware of (like reference semantics between C++ and C# or the big == operator differences between C++, C# and Java.).

  14. Mark says:

    mquander: and this ties into simulating an infinite amount of memory, because releasing the lock on the file will release a little bit more memory, which is the GC's aim.  It has the unfortunate side-effect of closing a handle that should already have been closed.

  15. DWalker says:

    Fascinating.  These kinds of posts and comments are what I like most about this blog.  Thanks, Raymond.

    [Days like today are why I leave comments open. Good discussion, everybody. -Raymond]
  16. Roger Lipscombe says:

    @SimonRev

    I think you mean RAII, rather than RIAA?

  17. Paul Gunn says:

    RIAA means 'Resource Acquisition Is Initialization'

    en.wikipedia.org/…/Resource_Acquisition_Is_Initialization

  18. benski says:

    @Marquess

    He meant RAII – Resource Acquisition is Initialization

  19. @Marquess says:

    RIAA = Resource Acquisition is Initialization

    Personally I feel that this is a dumb name because if you ask me, the hallmark of it (and what most people are talking about) is the use of destructors to ensure a deterministic release of resources. It does have additional connotations — in particular that you acquire the resource at the variable declaration (which is where the name comes from) — but I think that they are far subservient to the destructor-release portion.

    Google for "C++ RIAA" and you'll have a little more success, or see en.wikipedia.org/…/Resource_Acquisition_Is_Initialization

  20. Joel says:

    @Marquess he means RAII (Resource Acquisition Is Initialization)

  21. gibwar says:

    Raymond, what an awesome way to describe garbage collection. I've always understood it this way but this will help explain it to other programmers. Thanks for that!

    Everyone else with the "what's the point" comments: There is a subtle difference between finalizers and IDisposable.Dispose(). Finalizers are meant to release unmanaged resources only when called. In fact, if you're cleaning up managed objects in your finalizer you could potentially throw an exception because your child objects may have already been disposed or finalized. You can think of them as a way to guarantee that unmanaged resources are cleaned up and that you don't leak those resources. When .Dispose() is called, you clean up BOTH unmanaged and managed resources and mark yourself as finalized since you just did the work of both. This allows you to make sure that all child objects are disposed of properly and that everything is closed.

    Here's an example of a proper finalizer/dispose pattern: (not thread safe)

    public class Test : IDisposable {
      bool disposed = false;

      ~Test() {
        Dispose(false);
      }

      public void DoSomething() {
        if (disposed) {
          throw new ObjectDisposedException(GetType().Name);
        }
        // do something…
      }

      public void Dispose() {
        Dispose(true);
      }

      private void Dispose(bool disposeManagedResources) {
        if (disposed) return; // calling Dispose more than once should not throw an exception.
        disposed = true;
        if (disposeManagedResources) {
          // dispose of all objects in this class, including child managed objects.
        }
        // clean up unmanaged resources
        // such as Win32 handles and the like.
      }
    }

    I hope the above formats correctly. Some good reading on the disposable pattern: kleahy-technical.blogspot.com/…/idisposable-and-garbage-collection.html

  22. gibwar says:

    @Evan: that's a good point, I should be more careful in choosing my wording. Obviously, as described, it isn't a guarantee but if your program is still running you aren't really leaking resources. If you need to access a file you left open but let drift away you need to adjust your program to release that resource. Same thing when acquiring HBRUSHes and the like. You're supposed to dispose of them to clean up resources but the finalizer, if called, will clean them up for you, helping keep the resource use down. Pretty much the rule of thumb is, if it has a .Dispose() method, call it when you're done. (Not in all cases, especially ASP.Net controls)

  23. Ben Hutchings says:

    There are several related things garbage collection can achieve, and each of these might be the goal that motivates its use:

    1. Elimination of use-after-free bugs.

    2. Mitigation of memory leaks. (Memory leaks are still possible since reachability does not imply that an object will ever be used again.)

    3. Reduction of memory management code, and consequent increase in programmer productivity.

  24. Ooh says:

    Starting with .NET 2.0 and the availability of SafeHandles, a VERY good rule of thumb is: "If you see a finalizer, it's most likely a bug."

    Reasons: 1. Do NOT release any managed resources in a finalizer (as gibwar wrote). In fact you can't even access any other managed objects in a finalizer since those could have been collected already. So, the only things you can safely access are value-typed fields. 2. The only thing you can store in a value-typed field which needs to be released is some sort of unmanaged handle. For exactly this use case the BCL introduced SafeHandles, which receive special treatment of the CLR.

    So, treat every finalizer as bug. Except if you really know what you're doing.

  25. Stephen Cleary says:

    Excellent article, Raymond.

    Addressing some confusion in the comments:

    Finalizers and IDisposable are completely different. Finalizers may only free unmanaged resources and do not support RAII; IDisposable enables RAII, which means one can do other things besides just "free resources." Examples range from flushing file output (FileStream) to cancelling an asynchronous event subscription (IObservable).

    I've written up 3 simple rules for implementing IDisposable and Finalizers for freeing resources: nitoprograms.blogspot.com/…/how-to-implement-idisposable-and.html  This is a lot easier to understand than Microsoft's official IDisposable code pattern.

    Regarding the "set variables to null" idea, I address this rather completely here: nitoprograms.blogspot.com/…/q-should-i-set-variables-to-null-to.html

  26. jader3rd says:

    The .Net 4.0 Garbage Collector only cleans up memory and assemblies. So if there are resources which are not memory (handles) they should be cleaned up properly (i.e., Disposed). If you Dispose of your object, its memory isn't cleaned up. That's something for the Garbage Collector to do. I disagree with the statement that the Garbage Collector is to give the illusion of infinite memory, but it still is an interesting thought to ponder nonetheless.

  27. blah says:

    What a longwinded way of illustrating the need for using() {}.

[You're focusing on the example (finalizers) and not the principle (GC philosophy). It's fascinating seeing how people take away from an article something I didn't even consider to be part of the topic! -Raymond]
  28. Dominik says:

    Does this blog software work? Sometimes? Now?

  29. Edward Z. Yang says:

    There is an interesting parallel between stack-based allocation schemes and garbage-collected schemes; in particular, if I put something on the stack I rightfully can assume that it gets "deallocated" as soon as I return. Of course, C stacks tend not to have finalizers either, so the analogy breaks there.

  30. Dominik says:

    Sorry Raymond.

    What's the solution to the customer's problem? I can't find a Dispose or Close method on XmlDocument. Is XmlDocument locking the file?

  31. Obvious joke says:

    you can solve this customer's problem

    Tell the customer to start removing DIMMs from the system until the GC is running all the time.

  32. Rob K says:

    This is why I do not care for garbage collection at all. It solves (and I'm being generous) only one specific type of a general problem – dynamic resource allocation. RAII is a much better way to handle the issue. Garbage collection, since it doesn't actually destroy objects when they go out of scope, gets in the way of doing RAII and becomes a huge pain in the butt. IDisposable is a hack work-around on that.

  33. dalek says:

    In C# destructors are translated to a Finalizer so if I understand correctly there is no guarantee the destructor of a C# object is called?

    If so I propose a modification to the rule of Ooh: "If you see a finalizer or a destructor, it's most likely a bug."

  34. SimonRev says:

    Apologies to all, I meant RAII not RIAA (was just talking about the recording industry earlier, which no doubt contributed to the slip).

    Several folks made the point that RAII can be achieved with using(){}.  And if I am using C# (which I do on occasion) I am grateful for it.  I have found (disclaimer:  this is my experience, your mileage will vary) that because 90% of the time you don't need to worry about releasing a resource in C#, it is easy to forget to do a "using" the remaining 10%.  In C++ you always need to worry about it, which makes it real easy to remember that when obtaining a resource make sure you have taken care of its release as well. (In essence, make sure it is stored in something whose destructor will free it).  I have found this pattern a lot harder to follow in C# than in C++.

    Having said all that, there is a lot to like about .NET and C# as well.  I just personally find that the whole "garbage collector saves you" aspect that is pitched in every intro to the language I have encountered more of a trap than a salvation.

    Actually Rob K expressed my feelings very succinctly.

  35. zxc says:

    Getting back to the example, as a poster already pointed out, XmlDocument does not implement IDisposable.  So what's the answer?  Just don't use XmlDocument.Save(filename)?  Seems like you would need to use a mechanism where you COULD force the file write and file close, like using an XmlWriter.

  36. gibwar says:

    @Dominik, @zxc: Judging from the error provided they are probably using the XmlDocument.Load(stream) overload. Using that method they would have had to open up a stream to the file itself (probably a FileStream, possibly a StreamReader even) and passed that in. If they didn't close the underlying stream the file would still have been locked.

    The documentation on XmlDocument.Load(string) doesn't specify if it leaves the file open so it would probably be safe to assume that the file is closed after loading the file (especially since it accepts a URL as an option).

  37. Zan Lynx says:

    I agree f0dder. C# should have kept the C++ style destructor.  Or it should at least have promised to call finalizers on scope exit even if it never promised to call them on GC of heap objects.

    The "using" statement is far too easy to forget.

    [I think you wouldn't like it if object references were finalized on scope exit.

    MyThing thing = new MyThing();
    return thing; // oops, returning reference to object that has been finalized
    

    -Raymond]

  38. Marquess says:

    @fodder

    You must be one of those programmers who think C is better than Ada just because the source code takes up less disk space. There's nothing wrong with having to write more if you gain expressiveness. The way C# handles Dispose gives you the option of explicitly disposing or leaving it to the GC. Forcing a disposal/finalization at the end of a block is a good way of increasing an algorithm's runtime by an order of magnitude; on the other hand, it may be necessary to force a big object out of memory or release a lock. C# gives you that choice.

  39. pulp says:

    Maybe the programmer should give the GC a hint with an attribute like that:

    [Dispose]

    class ClassWhichOpensAFile

    {

    }

    now the GC could call the destructor of the class immediately after leaving scope if this attribute is set.

    [You've replaced one bug with another.

    ClassWhichOpensAFile OpenAndInitialize(string filename)
    {
     ClassWhichOpensAFile o = new ClassWhichOpensAFile(filename);
     o.Initialize(x, y, z);
     return o; // bug here
    }
    

    -Raymond]

  40. zxc says:

    To follow up on the example in the OP, you put a FileStream in a "using" block and call XmlDocument.Save(stream).

    @gibwar – XmlDocument.Save(string) is the problem child that does not release the file

    @f0dder – You would enjoy the wonders of Objective C memory management with its mishmash of deallocation strategies and secret lore (as I'm sure all iPhone developers currently are…)

  41. Random832 says:

    "@gibwar – XmlDocument.Save(string) is the problem child that does not release the file"

    Says who? No indication in the OP that they are calling either that or Load(string).

  42. zxc says:

    @Random832 – Umm, because I tried it.  Empirical beats speculative.

  43. zxc says:

    @Random832 – Whoops, I blew it!  XmlDocument.Save(string) is fine, I was seeing a different issue.  My apologies.

  44. Jeff says:

    The analogy doesn't work for me.  It's like saying a city bus simulates having an infinite number of seats because people sometimes get back off the bus.  It doesn't provide any additional seats during rush hour.  All the GC does is automatically identify which seats are empty so you don't have to keep track in your program.  It's a much faster way to program, but still has its own problems – and the framework does not provide any coherent way to manually manage memory when the automation fails.

    What is really missing is that IDisposable does not require (or imply) a Using block.  Any local variable that is IDisposable should dispose itself when it goes out of scope – whether a Using statement is provided or not.

    If .NET provided that functionality, any class that implements IDisposable could function as if it really had a destructor – knowing that the users could not forget to clean it up when done.

    There is already a compiler warning if your class owns an IDisposable member but is itself not IDisposable, but the compiler does not extend this to locally scoped variables.

    But alas, more and more of the .NET framework neither implements IDisposable nor supports classes that do.  The entire WPF framework intentionally avoids IDisposable – and MS claims it is not needed – yet the WPF framework also leaks memory as an almost direct result of this omission.

    [All you folks who are suggesting that local variables should be auto-disposed are forgetting that you're working with references, not objects. It's like saying that in C++ all pointers should be auto-delete'd when they go out of scope. -Raymond]
  45. Henning Makholm says:

    Garbage collection is ONE PARTICULAR implementation strategy for simulating infinite memory. It seems to be the one that works best in practice for a very large range of realistic programs, and is therefore the dominant strategy in actual use. But there are various alternatives considered in programming-language research, generally involving some amount of static analysis of the program to avoid or minimize the trace-through-everything overhead of garbage collections. These alternatives are spoken of as *alternatives* to garbage collection for achieving the infinite-memory illusion, not as *forms* of garbage collection.

    [Point well-taken. -Raymond]
  46. Jeff says:

    @Raymond: Obfuscating references and letting the developer treat them as instances is the true purpose of the GC.  It allows developers to write more code faster (cheaper).

    But that obfuscation itself becomes a problem whenever situations arise where the difference is actually important, and the GC provides no support at all to help in those situations.

    IDisposable is a tiny fix that only helps when your disposable class is used and disposed in a single function.  There's no "last one out, turn off the lights" capability when a single disposable object is owned by many others.  And trying to implement your own on top of the GC is much more difficult than it was in unmanaged code.

    And C++ "smart" pointers did just that – they auto-delete'd when they went out of scope, yet could still be copied into members or returned from functions just fine.

    The .NET GC does tree-walking only. C++ smart pointers used reference counting only.  I think adding reference counting to the GC could improve the behavior of classes that implement IDisposable.

  47. malloc says:

    What does that even mean to simulate an infinite amount of memory???

    I would argue that a correctly written program using malloc and free is simulating an infinite amount of memory…

    And the "more RAM than the program needs" situation is particularly useless, as the behavior will be exactly the same for a leaking malloc only program that terminates before exhausting the memory resource…

    So I would argue that what defines garbage collection is precisely the mechanism (of your article) not the goal (of your article). And that the true goal is indeed to free the programmer from the burden of programming their own memory management.

    Now memory management really is the point, and you are right on reminding beginners that a kind of automatic memory management is NOT automatic generalized resource management. But I could not make sense of the way you introduced that. The thing that probably made a lot of beginners mistake memory management for automatic generalized resource management in the first place is the confusion coming from C++ between resource finalization and freeing memory, which is fine as soon as RAII (traditional C++ one) is in place but should have been exterminated in derivative languages more clearly than it has been, and replaced by clearly identifiable constructs both fitting the same safety target and memory management environment.

  48. Marquess says:

    @SimonRev

    What is RIAA? Googling is pointless, what with the *other* RIAA hogging the first two million results.

  49. Leo Davidson says:

    @C# noob: Think of it from the user's point of view.

    Most software companies take years to fix non-critical bugs (if they fix them at all and especially if they're hard to reproduce). Wouldn't you want the framework to reduce the impact of bugs when it can detect them and can deal with them in a way which provably has no ill effect on the user/environment, where ignoring the bug (or crashing mid-operation over something that was entirely recoverable) is never better and often worse than fixing it?

    We're not talking about blindly ignoring access violation exceptions or reaching into other apps and closing their file handles here; we're talking about things like the framework knowing that no part of the process could ever possibly access a file handle again nor (according to the language's rules) expect that file to remain open, and thus cleanly closing the file handle.

    Yes, *debug* builds should scream when finalizers have to do anything. Maybe release builds should inform the user and ask them to send a bug report, too.

    That doesn't make the concept of finalizers wrong unless we're going to pretend that all bugs get found and fixed during development or promptly after being reported, which is a fantasy in most cases.

  50. POKE53280,0 says:

    When we combine C++ smart pointers like shared_ptr with STL containers like std::vector, we can achieve a kind of "deterministic garbage collector" (without the overhead of a .NET-like GC), and this works fine for both memory resources and non-memory resources (e.g. files, sockets, textures…) assuming that class destructors do their job right.

    A mature language like C++ and proper smart pointer classes and container libraries easily outperform a .NET-like (or Java-like) GC, IMHO.

    In modern C++, with proper RAII use, I think it is quite hard to have memory (or other non-memory resource) leaks.

    (And I find C++ destructors and stack semantics much easier and more elegant than C# IDisposable/using(){}…).

  51. gibwar says:

    *sigh* I knew I missed something. At the last line of if (disposeManagedResources) should be "GC.SuppressFinalize(this)".

  52. Evan says:

    @gibwar: "You can think of them as a way to guarantee that unmanaged resources are cleaned up and that you don't leak those resources."

    A big part of this thread is that you *can't* use finalizers to guarantee *anything*.

  53. Joshua says:

    @Ooh: There's one other thing you can do. You can call a P/Invoke function to clean up a native library. My *one* finalizer does exactly that.

  54. ShuggyCoUk says:

    @RobK

    "RAII is a much better way to handle the issue. Garbage collection, since it doesn't actually destroy objects when they go out of scope, gets in the way of doing RAII and becomes a huge pain in the butt. IDisposable is a hack work-around on that. "

    Try implementing full (arbitrary lifetime, not stack bound) closures without something doing GC, or its moral equivalent…

    IDisposable is fine for RAII except that you are obliged to do it yourself (i.e. put in the using statements or chain the Dispose of one class to dispose its others). More compile-time sanity checking would be nice (the FxCop rule for checking it is not ideal, being overzealous in obvious cases like the disposable object being returned from the function), but it is fundamentally quite hard.

    Incidentally if your programming language exposes the concept of deterministic lifetime for objects then it is no longer simulating infinite memory…

    On another note, Finalizers can be very useful for people using many app domains in long running processes. If an app domain goes down due to a crash there is a reasonable chance that things protected by SafeHandles and Finalizers might get cleaned up. Not guaranteed, but it makes it much more likely. That said, the number of people who should actually write a Finalizer is *very* low, and they should be intimately aware of the very special rules regarding them.

  55. steveg says:

    I'm a big fan of GC, and I'm from a C++ background. It's a joy to spend time solving a problem without the time spent fighting implementation details (don't get me started on the STL, it's hardly written from the joy of code maintenance perspective). However, as others have observed you need to pay attention to what you're doing; under load you can run into issues (especially if you're not being rigorous with IDisposable objects).

    One suggestion I have for the C# Visual Studio team is to visually indicate a variable is an IDisposable object, which might be easier than the potentially non-deterministic analysis required for a warning. (yes, yes, yes, I should tell them).

    Of course some objects in the framework implement IDisposable that don't need to, so you'd need a [DontWarnAboutIDisposable] or something to work around those.

  56. f0dder says:

    @SimonRev: RAII is also sometimes called RIIA, which doesn't really sound different from RIAA when pronouncing – but oh boy is the concept different :)

    I'm a big fan of RAII in C++, and I wish C# would have an "exiting scope destructor", instead of having to do explicit "using" statements – I do realize that this would have a lot of potential problems, but it's a bit sad you have to do *more* work than in C++. As it is, the only real use finalizers have is to scream bloody murder if Dispose hasn't been called.

    Garbage collection seems nice enough for reclaiming *memory*, though.

  57. f0dder says:

    @Marquess: my problem with the C# way is that for "some objects" you can just fire-and-forget, whereas with "other objects" you need the manual cleanup. This, imho, makes it easier to miss cleanup than when you either have to do *all* cleanup manually (C) or can be guaranteed your destructors run when going out of scope (C++).

    I'm not arguing against GC, it can be advantageous to defer a bunch of memory cleanups until a later time – but imho a scope-destructor *as an option* would have been nice.

  58. Marquess says:

    “but imho a scope-destructor *as an option* would have been nice.”

    That's what “using” is for!

  59. Gabe says:

    There are plenty of reasons why it might not be a bug if your finalizer runs. Weak pointers are perhaps the best example.

    Furthermore, there are plenty of cases where a "using" block is probably an indication of a bug. The Reactive Framework's IObservable is the first example of this that pops to mind.

    Quite frankly, I think it's easier to determine whether I need a "using" or not, than to try to figure out whether I need an auto_ptr, shared_ptr, or scoped_ptr.

  60. Evan says:

    @Marquess: "There's nothing wrong with having to write more if you gain expressiveness."

    Writing *more* is fine… having to write stuff that is easy to forget and *almost* works if you get it wrong isn't.

    @Marquess: "Forcing a disposal/finalization at the end of a block is a good way of increasing an algorithm's runtime by an order of magnitude; on the other hand, it may be necessary to force a big object out of memory or release a lock. C# gives you that choice."

    If only you could get the same choices in, say, C++ with a little API design. Oh well.

    @Marquess: "That's what “using” is for!"

    It's just not a great substitute. Sure, it's a better way of doing resource management than what C gives you. But with proper destructors, you write the destructors and they Just Work; there's no need to use them every single time you create an object to be destructed. With using, you are requiring the programmer to know that "using" is necessary and to call it each time you use your object.

    C#'s "using" is little more than some syntactic sugar for an extra try..finally block, whereas I personally feel that RAII is much more. I strongly feel the distinction is important enough that some explicit language support would be a good idea. Make structs automatically destructed or something like that. (Note: I don't know enough about C# to know whether this would actually work, but it's the sort of line I'm thinking along.)

  61. Drak says:

    Thanks for an interesting article Raymond!

    By the way, to all the people saying that it's hard to know when to use 'using': it comes with experience. I use lots of the same objects every day, and have 'muscle memory' as to whether or not they need 'using'. If I use a new type of object, I just try 'using' and see if I get an error about it. If not, I keep 'using', if there's an error, I stop 'using'.

  62. Mike says:

    (Apologies if this is a dupe; the Post button appears to be on the fritz)

    Your point about the unreliability of finalizers is a good one, but I'm a little uneasy about the philosophical approach: X simulates Y, and Y implies Z, so if X then you have to assume Z. The trouble is that this argument is only as strong as the simulation is accurate. Tetris simulates gravity, and gravity causes acceleration, therefore blocks will be moving faster at the bottom of the screen than at the top. It's like arguing by analogy; the road doesn't take you very far and you don't always realize when you've gone off the end of it…

    GC is a *terrible* simulation of infinite memory. Given infinite memory "while (true) myList.Add(new HugeThing());" wouldn't be any more problematic than "while (true) myvar = new HugeThing();", but it is, and the difference isn't some obscure implementation detail. So yes, thinking about GC as magical resource cleanup pixie dust will get you into trouble, but thinking about GC as infinite memory will also get you into trouble. Neither is any substitute for a basic understanding of what's actually going on.

    I'm not even sure that "simulating infinite memory" is the fundamental value proposition of GC. A better statement to my mind would be "simulating somewhat less than my actual memory (GC needs headroom to work well) but with vastly simpler management". Or, going even further afield, I recall a fascinating piece by Herb Sutter in which he sang the praises of GC as fundamental to type-safety. That is, the value isn't so much in the fact that GC cleans up your memory so much as the fact that NOTHING ELSE DOES; that's what allows the type system to guarantee that a non-null Foo pointer is actually pointing at a Foo as opposed to just dangling. (And interestingly, IDisposable subverts this, because a Dispose()d Foo isn't really a valid Foo.)

  63. Joe says:

    @Stephen Cleary:

    IDisposable does not enable RAII. RAII is the automatic and deterministic destruction of resources – in C# an instance of a class which implements IDisposable will not have Dispose called automatically and deterministically unless that instance is managed by a using statement. That is much weaker than RAII, because using statements can only appear at method scope, not at class scope.

  64. Mike Dimmick says:

    Another interesting problem with finalizers – although an implementation detail – is the fact that they run on a separate thread from any thread that you might have allocated the unmanaged resource on. If your resource has thread affinity, this can either cause corruption, or an attempt to call back to the thread that created it. If you're not expecting this to happen, the finalizer thread's callback never gets serviced and then all your finalizers stop working (as they're all serviced from that one finalizer thread).

    I seem to recall tracing a problem in WiX's linker back to finalization of a class holding an MSI handle.

  65. Roger Hågensen says:

    I guess the lesson here is to ALWAYS clean up after yourself.

    For example I do a lot of Windows programming using PureBasic and I also use the Win32 API a lot. http://purebasic.com/

    Now, PureBasic keeps track of the memory and resources it allocates and frees them automatically when the program ends normally. (If you use the Win32 API, you obviously need to use the related APIs there for cleaning those things up.)

    Now despite that handy feature, I still manually at the end of the program free memory I have allocated and so on. So for me that auto cleanup is a safety net just in case I missed a cleanup somewhere. So in pretty much all my programs my own code frees all that I allocated, then the language itself runs a cleanup, but since all the stuff was cleaned up naturally it doesn't have to.

    There is a small overhead there that could be avoided I guess, but I'd rather have a program spend some extra cycles trying to leave the system in the same state as when it started.

  66. rs says:

    Garbage collection vs. "deterministic" memory management: This can also make a difference in user experience. For deterministic memory management, there tends to be a more direct correspondence between the user's actions and the program's response.

  67. Leo Davidson says:

    Moving between C#/C++, a feature of GC I really miss is being able to throw a managed object into a queue without worrying about who is responsible for freeing it. When the thing processing the queue might be cancelled and shut down at any moment (including before, during or immediately after you add to the queue) it can get quite complicated in C++.

    Yes, you can (and I have) do it in C++ by having ref-counting on the queue and everything inside it, but doing that is a pain compared to just sticking a reference in a queue and having it done for you.
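
    A rough sketch of that ref-counted queue in C++, using std::shared_ptr for the counting. The `Job` type and `live_jobs_after_cancel` are hypothetical names, and the sketch is single-threaded; a real producer/consumer queue would also need synchronization, which is omitted here.

```cpp
#include <memory>
#include <queue>

// Hypothetical work item that counts live instances so cleanup is observable.
struct Job {
    static int live;
    int id;
    explicit Job(int i) : id(i) { ++live; }
    ~Job() { --live; }
};
int Job::live = 0;

// The queue and the consumer share each job; neither side calls delete.
// Returns the number of jobs still alive right after the queue is torn down.
int live_jobs_after_cancel() {
    std::shared_ptr<Job> in_flight;
    {
        std::queue<std::shared_ptr<Job>> pending;
        pending.push(std::make_shared<Job>(1));
        pending.push(std::make_shared<Job>(2));
        pending.push(std::make_shared<Job>(3));

        in_flight = pending.front();  // the consumer picks up job 1
        pending.pop();
    }  // "cancel": the queue is destroyed, which releases jobs 2 and 3
    return Job::live;  // only the in-flight job is still alive
}
```

    The in-flight job survives the cancellation because the consumer still holds a reference to it, and it is freed when that last reference goes away. The cost is that every element has to be wrapped in a shared_ptr.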

    No language in the world allows you to do things you couldn't do in virtually any other language; the differences between them are how easy they make things and which details they force you to concentrate on.

    (I prefer some aspects of C++ but the lack of GC isn't one of them.)

    @POKE53280,0:

    Are you sure the performance is always better? In some cases, sure, but not all. Most code spends time waiting for the kernel/disk/network/whatever to do things and it can be better to clean up memory in those times than to delay execution to do it earlier.

    GC effectively gives you memory clean-up in parallel rather than in series.

    GC is not *always* better but cleaning up memory ASAP isn't always better either.

    @Mike:

    Good points, IMO.

  68. Matthew says:

    @zxc: The answer is to Close / Dispose whatever was loaded (msdn.microsoft.com/…/system.xml.xmldocument.load.aspx) into the XmlDocument.

  69. Evan says:

    @POKE53280,0: "A mature language like C++ and proper smart pointer classes and container libraries easily outperform a .NET-like (or Java-like) GC, IMHO."

    This is probably *usually* true but is certainly not universally true. Reference counted GC has a number of its own problems (even beyond the whole "your program can screw it up by creating a cycle" thing), and it's not always a clear win performancewise. I think that "C memory management will *always* be faster than GC" is a common enough misconception that I'll expand upon it a bit even if you don't fall into that trap.

    A reference counted GC has to increment/decrement on every reference acquire or release — something the reachability-based garbage collector doesn't have to do. That's overhead. Now imagine you have a short-lived process: maybe the GC never needs to run. So the reachability-based GC has *no* overhead! (To be fair, systems that come with a reachability-based GC usually have a substantial startup cost, as well as bytecode interpretation/JIT, that will outweigh the reference counting management of, say, C++; I question how much this actually has to be the case though. It certainly makes a fair comparison of GC methods themselves harder.)

    Further, a compacting collector's cost is proportional only to the number (&size) of live objects, while a reference-counted GC's cost is proportional to the number of objects ever allocated. Then on top of that, if you have a compacting collector, the allocation of memory becomes *dirt simple*, whereas the best malloc() implementation for a workload varies. Add in a generational GC and the cost goes even lower. Put these together and, if your program's behavior is such that it's allocating and deallocating a lot of short-lived objects, the reachability-based GC may be faster. Read another way, some programs spend a lot of time in the GC, but some programs also spend a lot of time in malloc() and free(); if you've *got* such a program that spends a lot of time in malloc(), a compacting GC very well may speed it up.

    (Okay, in full disclosure, you can have a compacting or even generational GC that uses reference-counting. In such a case, you'd still have the per-object overhead of maintaining the reference counts, but you could use the dirt simple malloc() implementation. However, in the specific case of C++ smart pointers, this is basically impossible because you can't move objects, since you can't identify pointers to them. You *need* language support for a compacting GC, and you need a compacting GC in order to get rid of the cost of malloc().)

    Finally, I strongly suspect that there's more room for improving a reachability-based GC than there is a reference-counting GC. There's probably not a lot you could do to make the reference count maintenance faster, for instance; even if you have a separate thread that actually does the deallocation, I'm hard pressed to come up with something that would be faster than maintaining the reference counts sequentially. By contrast, you could run a moving GC in a different thread, and it may be possible to get the cost of *that* down to near-zero too. I don't think we're there yet, but there's still a lot of research being done in making GCs faster.
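
    The per-reference bookkeeping mentioned above can be made visible with std::shared_ptr. This is only an illustrative sketch (the function name `observe_refcount_traffic` is made up); it shows where the count changes, not how much those changes cost.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Records the reference count observed after each acquire/release step.
std::vector<long> observe_refcount_traffic() {
    std::vector<long> counts;

    auto p = std::make_shared<int>(42);
    counts.push_back(p.use_count());  // a single owner

    {
        std::shared_ptr<int> copy = p;  // acquire: the count is incremented
        counts.push_back(p.use_count());

        // Moving transfers ownership without touching the count.
        std::shared_ptr<int> moved = std::move(copy);
        counts.push_back(p.use_count());
    }  // release: the count is decremented when the inner owner goes away

    counts.push_back(p.use_count());
    return counts;
}
```

    The recorded counts are 1, 2, 2, 1. In std::shared_ptr these updates are atomic operations, which is part of the overhead that a reachability-based collector does not pay on each reference copy.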

  70. asf says:

    Not sure if this post inspired apenwarr.ca/log or not, but it sure spells out what I feel about GC

  71. SimonRev says:

    This has been a fascinating discussion.  Even though I fall in the C++/RAII camp, I think that proponents of C++ tend to throw out the performance card prematurely.  Modern GCs are pretty fascinating things.  When you consider the overheads for generic use of std::shared_ptr + new / delete I am not sure that it is faster than modern GC. Now, of course in C++ you have the option of lock-free memory pools, custom allocators, overloading new/delete and such if speed is paramount, but in practice very few people do that, and the complexity of doing so would make C++ more complex than anything in C# or even C.

    Personally I find the strongest arguments to be the ones that run along these lines: C++ makes you always consider resource cleanup, and its ability to pair every resource allocation with an automatic release makes it superior to C#, where it is easy to forget you need a using statement.  In C#, if you need an IDisposable object to be shared in many places, I have no idea how you establish a policy to ensure it is cleaned up at some point. (This last point is probably my lack of experience with the language; there is probably a good way to handle that.)

  72. Joe says:

    <quote>The answer is to Close / Dispose whatever was loaded into the XmlDocument.</quote>

    That pretty much sums up the problem with garbage collection; after you open you have to remember to close. C has "after you malloc you have to remember to free". C++ has RAII, which solves the problem for all resources. C# has GC, which solves the problem for memory, but makes the problem more difficult to solve for everything else.

  73. Rob K says:

    What's being missed in a lot of this discussion is what Raymond pointed out in a note: "[All you folks who are suggesting that local variables should be auto-disposed are forgetting that you're working with references, not objects. It's like saying that in C++ all pointers should be auto-delete'd when they go out of scope. -Raymond]"

    He's exactly right. With garbage collected languages, you don't have objects on the stack, only references to them. All objects are always dynamically allocated. So how does C# know that you don't intend for that object to live beyond the life time of the current scope, and that you haven't passed the reference off somewhere else?

    In C++ on the other hand, you can have an object on the stack if you want. If an object is on the stack and has no destructors, all it takes to get rid of it is incrementing the stack pointer. It just goes away when the function exits. But in C# or Java for instance, you're forced to have all objects allocated dynamically, and they have to go through garbage collection.

  74. Joe says:

    @Stephen Cleary: For RAII, you need to be able to execute actions on creation, copying, assignment and going out of scope. To implement different storage semantics (e.g. shared_ptr) those actions also need to be able to vary independently of the type being stored.

    What 'using' gives you is not RAII – it's nothing more than syntax sugar for a try/finally/dispose.

  75. f0dder says:

    I'm aware that .NET doesn't follow stack-based semantics, but that doesn't mean that C# couldn't support a special method called when a variable goes out of scope.

    But yes, I'm also aware this has a number of problems – like what to do in case of exceptions, how to handle this 'destruction' when references are held elsewhere (wouldn't be pretty), et cetera.

  76. Csaboka says:

    C++/CLI can get away with auto-disposing because it has a different syntax for values and references. (Foo is always by value, Foo^ is always by managed reference.) For values, the compiler can be sure that the reference hasn't been leaked elsewhere, so it can be disposed at the end of the scope.

    For C# to have similar semantics, it would need a syntax that says "pretend this is a value type even though it's a reference type". This way, it would be obvious when you want the reference to get out of the method and when you intend it to stay local. Of course, doing this in a backwards-compatible way would be a major pain.

  77. POKE53280,0 says:

    @Evan and others who compared "deterministic" memory management vs. garbage collection:

    how does current GC technology scale up? I mean: would it be possible to write a medium-large software system (like e.g. Microsoft Office) using C# and GC technology?

    Of course C# and its GC (and VS IDE support for managed languages) offer better productivity to the programmer than C++ does; but I'm not sure if it is wise to invest in C# code when building a medium-large software system. What about performance? Would the app be responsive, snappy? …I just don't know.

    [Visual Studio 2010 is dominantly managed. And according to Rico, WPF is not the performance bottleneck. -Raymond]

  78. Stephen Cleary says:

    @Joe: What I mean is that IDisposable *enables* the RAII *design pattern*. You are correct that disposal is not guaranteed; you may be interested in a lengthy article on the subject here: http://www.codeproject.com/…/IDisposable.aspx

    Regarding the whole C++ and C# thing, some people have pointed out smart pointers such as shared_ptr which enable deterministic cleanup via reference counting, and have their own problems (circular references) mitigated by weak_ptr.

    However, I think that C++ has one advantage over C#: a nicer syntax for "using". This syntax, called "stack semantics for reference types" is equivalent to a "using" iff the object implements IDisposable. It would be nice to have something this clean in C#.
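
    The circular-reference problem and its weak_ptr mitigation, mentioned earlier in this comment, can be sketched as follows. `StrongNode`, `WeakNode`, and the live-node counter are hypothetical names used only to make the leak observable.

```cpp
#include <memory>

// Counter of live nodes, used only to make a leak observable.
static int g_live_nodes = 0;

struct StrongNode {
    std::shared_ptr<StrongNode> other;  // strong link: can form a cycle
    StrongNode() { ++g_live_nodes; }
    ~StrongNode() { --g_live_nodes; }
};

struct WeakNode {
    std::weak_ptr<WeakNode> other;  // weak link: does not keep its target alive
    WeakNode() { ++g_live_nodes; }
    ~WeakNode() { --g_live_nodes; }
};

// Builds a two-node cycle, drops the local owners, and returns how many
// nodes are still alive afterwards.
template <typename Node>
int nodes_leaked_by_cycle() {
    const int before = g_live_nodes;
    std::weak_ptr<Node> observer;
    {
        auto a = std::make_shared<Node>();
        auto b = std::make_shared<Node>();
        a->other = b;
        b->other = a;
        observer = a;
    }
    const int leaked = g_live_nodes - before;

    // Break any surviving cycle by hand so the sketch itself does not leak.
    if (auto a = observer.lock()) {
        a->other.reset();
    }
    return leaked;
}
```

    With strong links on both sides, `nodes_leaked_by_cycle<StrongNode>()` reports 2 nodes still alive after their owners are gone; with weak links, `nodes_leaked_by_cycle<WeakNode>()` reports 0.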

  79. Evan says:

    @Stephen Cleary: "What I mean is that IDisposable *enables* the RAII *design pattern*. You are correct that disposal is not guaranteed"

    See, I view guaranteed disposal as an *essential* part of RAII.

  80. Evan says:

    And just to support my assertion:

    * The article *you linked to* contains neither "RAII" nor "acquisition"

    * The Wikipedia article on RAII [1] lists several languages with a using-type statement, including C#, under the heading "Resource management without RAII"

    [1] en.wikipedia.org/…/Resource_Acquisition_Is_Initialization

  81. nobugz says:

    First box, shouldn't RAM be "virtual memory"? Or am I missing something?

    [Use whatever term you like, as long as you understand the underlying point. -Raymond]

  82. Martin says:

    The suggestion for setting the variable to null is probably a habit from programming in languages such as VB classic which used reference counting. Objects are immediately cleared up when their reference count drops to 0 (often as soon as they go out of scope). If there are circular dependencies then setting the pointers to null is the only way to prevent a memory leak. This problem is obviously not the cause of this customer's problem however.

  83. Evan says:

    Personally I like the "set the variable to null" step just to indicate that there's some reason I don't expect that variable to be used any more, and not as any message to the system. (Of course, you don't do this at the end of a function or something like that, but if your function is 30 lines long and an object has outlived its utility on line 10, I think it's helpful to null it then.)

    @POKE53280,0:

    Most of the time, today's GCs don't prevent a program from being too snappy. Raymond mentions VS 2010, but you can also look at something like Eclipse. Sure, they aren't the snappiest things out there, and they each have some tasks that can take a little while (like starting Eclipse…), but both are entirely usable.

  84. Joe says:

    @POKE53280,0: It is difficult and unwise to write a large scale software system in pure C#. That's why you find that all systems that appear to do so have small but significant sections written in native code – either mixed mode C++/CLI, implemented using native code via P/Invoke or by some other method.

  85. Daniel Earwicker says:

    Very thought provoking. I'm not convinced by Raymond's way of describing the goal of GC: "simulating a computer with an infinite amount of memory".

    That is an unachievable goal. Suppose in my class I have a static field of type List<object>. As my program runs, I add objects to this list. I can't be bothered to remove them – why should I have to? I have infinite memory! But no GC algorithm can decide on my behalf to ignore the references represented by my ever-growing list, so I will soon find out that I don't have infinite memory.

    So it is not true to say that "The rest is mechanism". The chosen mechanism of GC allows the program to forget about the problem of "leaked objects", but in return it requires the programmer to avoid "leaked references". It doesn't simulate infinite memory, and nor could any general mechanism. If the problem statement is unachievable, is it really a helpful way of characterising our goal?

    This calls into question your corollary that finalizers are not a way to clear up non-infinite resources acquired by the program. On the contrary, memory is just one kind of exhaustible resource, and finalizers are a way of opening up the GC mechanism so it can be used to reclaim other resources besides memory. It makes GC an extensible system into which you could plug new resource types, effectively converting them into "managed resources" (watch how I casually throw together two hopelessly overloaded terms).

    The problem with finalizers is that they are way too flexible, so there are very many more ways of abusing them than using them correctly. Most of the small number of correct uses for finalizers have since been codified by the SafeHandle class, rendering finalizers largely obsolete as a programming interface.

    [The GC's goal is to simulate a computer with an infinite amount of memory. Sometimes the simulation breaks down. Nobody claimed that it was achievable with 100% success, but that doesn't invalidate the goal. -Raymond]

  86. Joshua says:

    @Evan: I have in my toolset a powerful construct that can be used to roundly trounce modern GC. It is called arena memory management and cannot be used from the managed world. As it trounces the best malloc()/free() money can buy it will trounce GC.
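
    For readers unfamiliar with the term, one common form of arena memory management is a bump allocator whose allocations are all released together. The sketch below is an assumption about what is meant, not the commenter's actual code; the `Arena` class and its members are hypothetical. It suits only objects that need no destructor call, since nothing is destroyed individually.

```cpp
#include <cstddef>
#include <memory>

// Hypothetical fixed-capacity arena: allocating bumps an offset, and every
// allocation is released at once by reset(). No destructors are ever run.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), used_(0) {}

    // Returns storage whose offset is a multiple of `align` (which must be a
    // power of two), or nullptr when the arena is full. This assumes the
    // buffer itself starts at an address aligned for std::max_align_t.
    void* allocate(std::size_t size,
                   std::size_t align = alignof(std::max_align_t)) {
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > capacity_ || size > capacity_ - start) {
            return nullptr;
        }
        used_ = start + size;
        return buffer_.get() + start;
    }

    // Releases every allocation in one step by rewinding the offset.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t used_;
};
```

    Allocation is a few arithmetic operations and freeing is a single assignment, which is where the speed comes from. The trade-off is that individual objects cannot be freed early.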

  87. POKE53280,0 says:

    @Joshua: I don't know if what you mean with "arena memory management" is somewhat equivalent to "pool allocator".

    I found that pool allocators are a powerful and high-performance technique for native C++ development.

    Raymond presented an interesting series on Chinese/English Dictionary, and – during the development of this series – there is a post dedicated to a string pool allocator technique here:

    blogs.msdn.com/…/420038.aspx

  88. Daniel Earwicker says:

    @Raymond – Absolutely, just as any piece of software's goal is to _simulate_ an intelligent friend who carries out your every whim without fail! (And when it fails, you get mad and throw rocks at it, just like you would with a friend.)

    But any rational programmer is going to want to know under what circumstances the simulation will break down. We don't want to "play dice with the universe". We want to know the rules we have to follow to get the behaviour we want. Hence we have to know about the mechanism.

    If we think GC is trying to give us infinite memory, then we don't know enough about GC to develop reliable software with it. If we know that objects are reclaimed when no references exist to them, then we're good to go (at least, we know enough to learn what constitutes a live reference, which you cover in subsequent articles).

    Given this, I don't think it's valid to draw your corollary about finalizers. We have to know more about GC than "infinite memory". We have to use it in such a way that we do not leak references. And if we do that, then finalizers *can* work as advertised, by reverting temporary state changes represented by "resource objects" when there are no further references to them. Finalizers *are* horrible, but they aren't fundamentally flawed in the way you suggest.

    [As with any simulation, you should know something about the mechanism so you don't subvert it, but my point is that in my experience too many people think the mechanism is the goal, when in fact the mechanism is only a mechanism. -Raymond]

Comments are closed.