It Never Leaks But It Pours



One of the easiest bugs to write is the dreaded memory leak. You allocate some chunk of memory and never release it. Those of us who grew up writing application software might sometimes have a cavalier attitude towards memory leaks. After all, the memory is going to be reclaimed when the process heap is destroyed, right? But those days are long gone. Applications now often run for days, weeks or months on end. Any memory that is leaked will slowly consume all the memory in the system until it dies horribly. Web servers in particular are highly susceptible to leaks, as they run forever and move a LOT of memory around with each page request.

 

The whole point of developing a garbage collected language is to decrease the burden on the developer. Because the underlying infrastructure manages memory for you, you don’t have to worry about introducing leaks. Of course, that puts the burden squarely upon the developer of the underlying infrastructure, i.e., me. As you might imagine, I’ve been called in to debug a LOT of memory leaks over the years. The majority turned out to be in poorly written third-party components, but a few turned out to be in the script engines.

 

I mentioned a while back that ASP uses a technique called “thread pooling” to increase its performance. The idea is that you maintain a pool of idle threads, and when you need work done, you grab one from the pool. This saves on the expense of creating a new thread and destroying it when you’re done with it. On a web server where there may be millions of page requests, the expense of creating a few million threads is non-trivial. (Also, this ensures that you can keep a lid on the number of requests handled by one server: if the server starts getting overloaded, just stop handing out threads to service requests.)
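To make the idea concrete, here is a minimal Python sketch of thread pooling. It is not how ASP implements it; `handle_request` and `serve_with_pool` are hypothetical names, and the standard library's `ThreadPoolExecutor` stands in for the pool. It shows that however many requests arrive, only a bounded set of threads ever does the work.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id):
    # Hypothetical stand-in for rendering a page.
    # Returns the identifier of the worker thread that served the request.
    return threading.get_ident()


def serve_with_pool(num_requests, pool_size):
    # The pool creates at most `pool_size` threads and reuses them,
    # so thread creation cost is paid a handful of times, not per request.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        worker_ids = list(pool.map(handle_request, range(num_requests)))
    # Number of distinct threads that actually served requests.
    return len(set(worker_ids))
```

Calling `serve_with_pool(1000, 4)` serves a thousand requests on no more than four threads. The `max_workers` cap is also the "lid": requests beyond it simply wait in the queue.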

 

I think I also mentioned a while back that JScript has a per-thread garbage collector. That is, if you create two engines on the same thread, they actually share a garbage collector. When one of those engines runs a GC, effectively all the engines on that thread get collected.
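A rough model of that sharing, as a Python sketch rather than a description of the real JScript internals: the `Engine` and `Collector` classes below are hypothetical, and thread-local storage is assumed as the mechanism that ties one collector to each thread.

```python
import threading

# One slot per thread; holds that thread's shared collector (an assumption
# of this sketch, not a documented detail of the real engine).
_thread_state = threading.local()


class Collector:
    """Hypothetical per-thread collector shared by every engine on a thread."""

    def __init__(self):
        self.engines = []

    def collect(self):
        # A collection sweeps every engine registered on this thread.
        for engine in self.engines:
            engine.collections += 1


class Engine:
    """Hypothetical script engine that registers with its thread's collector."""

    def __init__(self):
        if not hasattr(_thread_state, "collector"):
            _thread_state.collector = Collector()
        self.collector = _thread_state.collector
        self.collections = 0
        self.collector.engines.append(self)

    def run_gc(self):
        self.collector.collect()
```

Two engines created on the same thread end up with the same `Collector` object, so `run_gc()` on either one sweeps both. An engine created on a different thread gets its own collector.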

 

What do these things have to do with each other? Well, as it turns out, there is a memory leak that we have just discovered in the JScript engine. A small data structure associated with the garbage collector is never freed when the thread goes away. What incredible irony! The very tool we designed to prevent your memory leaks is leaking memory.

 

It gets worse. As it turns out, this leak has been in the product for years. Why did we never notice it? Because it is a per-thread leak, and ASP uses thread pooling! Sure, the memory leaks, but only once per thread, and ASP creates a small number of threads, so nobody ever noticed the leak.

 

So why am I telling you this? Because for some reason, it never rains but it pours. We are suddenly getting a considerable number of people reporting this leak to us. Apparently, all of a sudden there are third parties developing massively multi-threaded applications that continually create and destroy threads with script engines. Bizarre, but true. They are all leaking a few dozen bytes per thread, and a few hundred thousand threads in, that adds up.
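The difference between the two hosting styles can be simulated with a short Python sketch. Everything here is illustrative: `LeakyRuntime` is a hypothetical stand-in that merely counts per-thread blocks it never releases, and the 48-byte figure is an assumption, since all we know is "a few dozen bytes".

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Assumption: "a few dozen bytes" per thread; 48 is an illustrative figure.
LEAK_BYTES_PER_THREAD = 48


class LeakyRuntime:
    """Hypothetical runtime that sets up per-thread state and never frees it."""

    def __init__(self):
        self._per_thread = threading.local()
        self._lock = threading.Lock()
        self.leaked_blocks = 0  # blocks allocated and never released

    def run_script(self):
        # The first script run on a thread sets up that thread's state.
        if not getattr(self._per_thread, "ready", False):
            self._per_thread.ready = True
            with self._lock:
                self.leaked_blocks += 1  # simulated leak: not undone at thread exit

    def leaked_bytes(self):
        return self.leaked_blocks * LEAK_BYTES_PER_THREAD


def thread_per_request(runtime, requests):
    # Creates and destroys one thread per request: one leaked block each time.
    for _ in range(requests):
        worker = threading.Thread(target=runtime.run_script)
        worker.start()
        worker.join()


def pooled(runtime, requests, pool_size):
    # Reuses a fixed set of threads: at most `pool_size` leaked blocks in total.
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(requests):
            pool.submit(runtime.run_script)
```

With `thread_per_request`, the leak grows linearly with the number of requests. With `pooled`, it is capped at the pool size no matter how long the host runs. Under the 48-byte assumption, 300,000 short-lived threads would leak 14,400,000 bytes, while a four-thread pool would leak at most 192.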

 

I have no idea what has led to this sudden uptick in multithreaded script hosts. But if you’re writing one, let me tell you two things:

 

1) We’re aware of the problem. Top minds on the Sustaining Engineering Team are looking at it, and I hope that we can have a patched version for a future service release.

2) Use thread pooling! Not only will it effectively eliminate this leak, it will make your lives easier in the long run, believe me.

 

This whole thing reminds me that I want to spend some time discussing some of the pitfalls we’ve discovered in performance tuning multi-threaded applications. But that will have to wait for another entry.

 

 

Comments (5)

  1. Dan Shappir says:

    While it’s quite correct that server apps are likely to run for much longer than client apps, it’s not a prerequisite that they need to run forever in order to provide round-the-clock service.

    As I recall, Netscape web servers used to periodically restart themselves, that is, stop their own process and cause it to restart in order to leverage the OS for cleanup purposes.

    In one of our own server products we have a watch-dog process, made as simple as possible, that monitors the main process. If the main process goes down unexpectedly, the watch-dog process restarts it. Likewise, if the main process is about to exhaust system resources (due to a leak) it’s also stopped, and the watch-dog starts it again.

    Obviously, if you aren’t using a multi-server system, such a restart can result in lost client connections, but even then it’s better than a server crash.

  2. Eric Lippert says:

    Indeed, ASP has a similar architecture now. You can configure ASP to periodically recycle itself, and it will come back from the dead should some ill-behaved third party control hork the heap or something.

  3. RJ says:

    GC is great. Unfortunately, if it has a bug, as in .Net 1.0, it’s not so great. Finding out about the large heap bug in the .Net GC after seeing some problems on a customer site was horrendous.

  4. Ben Wilhelm says:

    A year or two ago I was really concerned with memory leaks – oddly enough, my current set of projects has made memory leaks completely unimportant.

    See, I’m working at a game company, and we have two major programs that we’re writing. One of them, obviously, is the game. However, it’s a *console* game. We’ve got 32 MB of RAM total – obviously we can’t afford to leak, but we can’t actually afford to allocate either. We’ve got a "heap" that contains a grand total of 20 unique allocations at most, text labels on all of them, and a button we can push to get a list – leaks are trivially easy to fix because they tend to crash the game after just a few level changes (the only time when allocations or deallocations occur).

    Our other major program is our build tool . . . which is a command-line tool that runs on our Linux server. We don’t care if it leaks. It’ll be quitting in a few minutes anyway, and all the leaks go away magically. I had one case where I needed to deallocate a chunk of memory – unfortunately, some legacy code allocated it with malloc() and some allocated it with new, so there was no way I could do it. So I didn’t. Works fine.

    It’s very strange having dealt with memory leaks for years, and then all of a sudden not have to worry about them at all.

  5. David Butcher says:

    In the post it mentions that a patched version may be released for a future service release. Did this patch release appear?

    I am investigating memory leaks of the kind described above – the version of Jscript.dll on the target machines is 5.6.0.8820.  I know this post is from 5 years ago – however with our deployed systems it will be easier to work with the target machine configuration than with the massively multi-threaded application – so if there was a 5.6.0.x patch release that addressed the problems then that would help a lot.