StreamWriter Buffered Data Lost MDA (or a cute finalizer trick) [Brian Grunkemeyer]

A somewhat common problem when getting started with developing managed code and using our IO package is forgetting to close a Stream or a StreamWriter.  Users who write code like this will be disappointed:

void Foo() {
    StreamWriter sw = new StreamWriter("file.txt");
    sw.WriteLine("Data");
    // Forgot to close the StreamWriter.
}

In this example, the data in the StreamWriter is never written to the underlying stream. 

Background

StreamWriter internally buffers data in an attempt at reducing the aggregate amount of work needed to write out data, and FileStream does the exact same thing.  So for correctness, the StreamWriter (and FileStream) must be closed explicitly by the user.  While we can rely on finalization to ensure that the FileStream's handle is eventually closed, and we can probably ensure that any buffered data in a FileStream has been flushed to disk (even with SafeHandle in Whidbey, using the very weak ordering we added to critical finalization explicitly to solve this problem), we cannot ensure that the StreamWriter's buffer is written to disk.  The reason is that (normal) finalizers aren't ordered - any two objects may be finalized in any order, or at the same time if we add multiple finalizer threads in a future version. 
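The buffering is easy to observe if you swap the file for a MemoryStream. The following is a minimal sketch rather than anything from the framework itself; the BufferingDemo class and its method name are invented for illustration.

```csharp
using System;
using System.IO;

static class BufferingDemo
{
    // Returns the underlying stream's length before and after Flush().
    public static long[] LengthBeforeAndAfterFlush()
    {
        MemoryStream ms = new MemoryStream();
        StreamWriter sw = new StreamWriter(ms);
        try
        {
            sw.WriteLine("Data");      // Held in the StreamWriter's internal buffer.
            long before = ms.Length;   // Nothing has reached the stream yet.
            sw.Flush();                // Encodes the buffered chars and writes them out.
            long after = ms.Length;    // Now the bytes are in the stream.
            return new long[] { before, after };
        }
        finally
        {
            sw.Close();                // Also closes the MemoryStream.
        }
    }
}
```

The first value should be zero and the second non-zero. With a FileStream underneath, that same gap is where the data is lost if nobody ever calls Flush or Close.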

The explanation of this problem has made it into a few different forums, including Jeffrey Richter's Applied Microsoft .NET Framework Programming, on pages 484-485.  (Jeff chose to use BinaryWriter here instead of StreamWriter, but he's discussing the same issue.  However, it's not relevant with BinaryWriter because it doesn't have an internal buffer in our current implementation.  I'll ask Jeff to fix that for his next edition.)

Detecting Data Loss & Notifying the Developer

In any event, users who make the above mistake don't get any data written to their file, and they don't get any indication that they lost data by not closing the StreamWriter.  I'm investigating a change to fix that for Whidbey Beta 2.  We can detect this by adding a finalizer to StreamWriter whose sole purpose is to check for buffered data, and if found, then report an error.  We've added something to the product called Managed Debugging Assistants (MDA's) in this version, and while they're not the easiest thing to turn on right now, they should be well-integrated with Visual Studio sometime before we release the product.

When enabled, this MDA will display some message roughly like this:

A StreamWriter wasn't closed and all buffered data within that StreamWriter wasn't flushed to the underlying stream. (This was detected when the StreamWriter was finalized with data in its buffer.) A portion of your data is lost. Consider one of calling Close(), Flush(), setting the StreamWriter's AutoFlush property to true, or allocating the StreamWriter with a "using" statement to ensure your StreamWriter is properly cleaned up.
Stream type: System.IO.FileStream
File name: C:\Test\IO\StreamWriterBufferLostMDA\junk.tmp
Allocated from:
   at System.IO.StreamWriter.Init(Stream stream, Encoding encoding, Int32 bufferSize)
   at System.IO.StreamWriter..ctor(String path, Boolean append, Encoding encoding, Int32 bufferSize)
   at System.IO.StreamWriter..ctor(String path)
   at LosesData.Main()

Note the addition of a stack trace here, showing you where the StreamWriter was allocated. In a large application that uses several StreamWriters, knowing where the leaked one was allocated is very useful, because it tells you which code needs to be fixed. In the example above, the StreamWriter was allocated from LosesData::Main(), which was my simple test case to demonstrate this problem.

MDA's are interesting because they can be disabled & enabled via settings in a debugger, or in a config file. The exact details of how to enable this one (via an entry in a file called foo.mda.config?) and when it will be enabled (i.e., only when you have a managed debugger attached, or whenever any debugger is attached?) are still being decided, so this may not show up exactly like this in Beta 2 or our final Whidbey bits. But hopefully this gives you an idea of some ways we're trying to help people become more productive by helping them find their problems more quickly, while not seriously penalizing working code.

How to Clean Up a StreamWriter

There are a few ways of fixing this problem in your code, whether you've relied on the MDA to track it down, or you've noticed that your file is missing up to 4K worth of data.

Use the using statement in C# & VB. In managed C++, use a try/finally to call Dispose.

void Foo() {
    using (StreamWriter sw = new StreamWriter("file.txt")) {
        sw.WriteLine("Data");
    }
}

Or you can use the long form, expanding out the using clause:

void Foo() {
    StreamWriter sw = null;
    try {
        sw = new StreamWriter("file.txt");
        sw.WriteLine("Data");
    }
    finally {
        if (sw != null)
            sw.Close();
    }
}

If neither of these solutions can be used (say, if you have a StreamWriter stored in a static variable and thus you cannot easily run code at the end of its lifetime), then calling Flush on the StreamWriter after its last use or setting its AutoFlush property to true before its first use will be sufficient. Here's an example:

internal static class Foo {
    private static StreamWriter _log;

    static Foo() { // Static class constructor
        StreamWriter sw = new StreamWriter("log.txt");
        sw.AutoFlush = true;
        // Now publish the StreamWriter for other threads.
        _log = sw;
    }
}
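To see what AutoFlush changes, here is a small sketch that again uses a MemoryStream as a stand-in for the log file. The AutoFlushDemo class and its method are hypothetical names used only for this illustration.

```csharp
using System;
using System.IO;

static class AutoFlushDemo
{
    // Returns the stream length right after a write, before any Flush() or Close().
    public static long LengthAfterWrite(bool autoFlush)
    {
        MemoryStream ms = new MemoryStream();
        using (StreamWriter sw = new StreamWriter(ms))
        {
            sw.AutoFlush = autoFlush;  // Set before the first write you care about.
            sw.WriteLine("Data");
            return ms.Length;          // Read before the using block disposes the writer.
        }
    }
}
```

With autoFlush set to true the write should already be visible in the stream. With false it stays in the StreamWriter's buffer until something flushes it. The tradeoff is that AutoFlush pushes data to the stream on every write, which gives up much of the benefit of buffering.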

Other finalization tricks

You can play other tricks with finalizers as well. I briefly added code to Object's finalizer that flagged any objects using IDisposable that didn't get disposed (the StreamWriter case here is really a subset of a much broader problem - improper resource cleanup). However, we didn't like adding the finalizer to Object because its existence could hurt performance in retail builds, and it might hinder a few future optimizations we wanted to make. The error detection was also somewhat noisy - it found a lot of issues, but not all of them were really bugs that need to be fixed. But perhaps there's something here of merit that's worth revisiting...

In any case, you could do this in debug builds of your own code by adding the check to a base class or set of base classes for your own types. While you could take the route of defining a MyProjectObject, that's probably not really a good idea. Instead, look at any base classes you might own - they're probably natural places for this type of error tracking in debug builds. If I couldn't change the Object class, Stream might be a good runner-up, for example. And the best part is you can make these changes to your code in Everett - you don't have to wait for us to design an MDA reporting infrastructure, then invent some useful individual MDA's to find interesting problems like this.
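As a rough sketch of that idea, a base class can capture a stack trace at construction and complain from its finalizer if Dispose was never called. Everything here (TrackedDisposable, LeakReporter, TempResource, LeakDemo) is a hypothetical name invented for this example, not a framework type, and this is not how the MDA itself is implemented.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical reporting hook; point it at a log or an assert in your own code.
public delegate void LeakReporter(string message);

public abstract class TrackedDisposable : IDisposable
{
    public static LeakReporter Report = new LeakReporter(DefaultReport);

    // Captured at construction so the report can say where the object came from.
    // This is expensive, which is one reason to compile it only under #if DEBUG.
    private readonly string _allocatedFrom = Environment.StackTrace;
    private bool _disposed;

    private static void DefaultReport(string message)
    {
        Console.Error.WriteLine(message);
    }

    public void Dispose()
    {
        if (_disposed) return;
        _disposed = true;
        Dispose(true);
        GC.SuppressFinalize(this);   // Cleaned up properly, so skip the finalizer.
    }

    protected virtual void Dispose(bool disposing) { }

    ~TrackedDisposable()
    {
        // Only reached when Dispose() was never called on this instance.
        try
        {
            Report(GetType().FullName + " was not disposed. Allocated from:"
                   + Environment.NewLine + _allocatedFrom);
        }
        catch { }                    // An exception escaping a finalizer can take down the process.
        Dispose(false);
    }
}

// Example subclass and a helper that deliberately leaks one instance.
public sealed class TempResource : TrackedDisposable { }

public static class LeakDemo
{
    // Non-inlined so no local in the caller keeps the object alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static void LeakOne() { new TempResource(); }
}
```

Calling LeakDemo.LeakOne() and then forcing a collection should produce one report naming TempResource. Because the finalizer exists on every derived type, this carries the same retail-build cost discussed above, so it is best kept to debug builds.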

One annoyance with the finalizer approach that I ran into was that nothing was reported if the app simply quit. Either the finalizer wasn't running, or it took so long to run that the CLR gave up & exited (we were spinning up a lot of code during the finalizer in this trivial test case, and we want to shut down within ~2 seconds of returning from Main), or MDA's are disabled during process shutdown. I don't know which of these cases I was running into, but it was easy to fix by adding a call to GC.Collect() followed by GC.WaitForPendingFinalizers() to the end of Main.
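That workaround might look something like the sketch below. The ShutdownDemo class, its Probe type, and the method names are all made up for illustration; in a real test case the two GC calls would simply be the last statements of Main.

```csharp
using System;
using System.Runtime.CompilerServices;

static class ShutdownDemo
{
    public static bool FinalizerRan;   // Set by Probe's finalizer.

    sealed class Probe
    {
        ~Probe() { FinalizerRan = true; }
    }

    // Allocate in a separate, non-inlined method so that no local
    // in the caller keeps the object reachable.
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void AllocateAndDrop() { new Probe(); }

    // Call this as the last thing Main does in a test build.
    public static void DrainFinalizers()
    {
        GC.Collect();                    // Finds unreachable objects and queues their finalizers.
        GC.WaitForPendingFinalizers();   // Blocks until the finalizer thread has run them.
    }

    public static bool Run()
    {
        AllocateAndDrop();
        DrainFinalizers();
        return FinalizerRan;
    }
}
```

This is a debugging aid rather than something to leave in shipping code, since forcing a full collection at exit just slows down shutdown.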