Debugging a GDI Resource Leak

One of the most common graphics-related problems we see is a “GDI leak” (or simply the use of too many GDI objects), which will eventually cause rendering problems, errors, and/or performance problems. A few things need to be defined and explained before I talk about how we debug these problems.

 

--Definitions--

GDI Objects are resources that are managed by GDI32.DLL on behalf of an application. Some of the most common types of GDI objects are Device Contexts (DCs), Bitmaps, Brushes, Fonts, Metafiles, Pens, and Regions.  GDI Objects are stored in Kernel Memory (specifically the Paged Pool or Session Pool portions of kernel memory – more on this later).

GDI Handles are unique identifiers of a GDI Object.  Each GDI Object can have only one handle.  Each GDI Handle is process-specific (cannot be used by other processes).

The GDI Handle Table is a table (array) of GDI entries.  Each entry contains information about a GDI object, including the handle, the type of the object (e.g. bitmap, DC, pen, font, etc.), the process for which that handle is valid, and a pointer to the actual GDI object in Kernel Memory. This table exists in User Memory.

Note: the fact that it contains the type of the object is very useful to us, and this can be used to help track down what we’re leaking and why – more on this later.

 

--Creating GDI objects (and getting back a handle)--

To create a GDI object, we call the appropriate GDI API function (usually of the form CreateXXX, such as CreateFontIndirect), which will create the object in kernel memory, and if successful, add an entry in the GDI Handle Table, and return a handle to that object. 

But what if we run out of handles?  What if there isn’t enough room left in kernel memory for the object?  In these cases, the GDI function will fail, and GetLastError will usually return ERROR_INVALID_HANDLE.

 

--Limits--

There is a limit of 65,536 (64k) GDI handles per session.  Note though that, particularly on 32-bit (x86) systems, the effective maximum is usually lower than this, due to memory limitations.  For example, you generally won’t fit anywhere near 65,536 large bitmaps on a 32-bit system, no matter how many handles you’re allowed to have.

That 65,536 handles needs to accommodate all processes in the session (your app doesn’t get to use them all).  In fact, to prevent a single process from bringing the entire system to its knees (and making the OS unable even to draw a dialog saying “I…CAN’T…DRAW…ANYMORE”), there is a per-process limit, which by default is 10,000 GDI handles.  Although there is rarely, if ever, any good reason for any process to need more than 10k unique GDI objects in memory at the same time, this limit can be raised via the following registry value…

 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Windows\GDIProcessHandleQuota

If an application is running out of GDI handles (i.e. you hit 10k handles and can’t create any more), then raising the above limit is NOT the suggested way of solving the problem. 

First of all, raising this limit does not raise the available kernel memory to store the actual GDI objects that the handles identify.

Second of all, raising this limit REDUCES the number of GDI objects that the rest of the system (including Windows itself) can use.

Thirdly, and maybe most importantly, the application should be redesigned to not use so many GDI resources.  The right thing to do is to manage resources more efficiently in the application.

 

--Debugging--

Okay, so what do we do if we suspect that we are using too many GDI objects?

 

1 – Are we really leaking (or using too many) handles? If so, how many and how fast?  

  • Check GDI Object usage via Task Manager (note, you need to specify the GDI Objects column – shown below).

 
  

  • Generally, if you see a process with a GDI Object count in the thousands, then there is a potential problem.
  • If you see a process with a GDI Object count of near 9,999, then it is very probably out of GDI resources and cannot create more.
  • When a process (or the system) is out of resources, the creation API (e.g. CreateFont, CreateDC, etc.) will fail and set the last error (GetLastError) to 6 (ERROR_INVALID_HANDLE).
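To put numbers on "how many and how fast", you can sample the count yourself instead of watching Task Manager. The following is a minimal sketch, not a finished tool: it assumes Python with ctypes, reads the current process's count through the documented GetGuiResources function (GR_GDIOBJECTS is 0), and returns None on non-Windows systems. The helper names and the default limit of 10,000 are illustrative choices.

```python
import sys

GR_GDIOBJECTS = 0  # uiFlags value for GetGuiResources: count GDI objects


def current_gdi_object_count():
    """Return this process's GDI object count, or None when not on Windows."""
    if sys.platform != "win32":
        return None
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    user32 = ctypes.WinDLL("user32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    user32.GetGuiResources.argtypes = [wintypes.HANDLE, wintypes.DWORD]
    user32.GetGuiResources.restype = wintypes.DWORD
    return user32.GetGuiResources(kernel32.GetCurrentProcess(), GR_GDIOBJECTS)


def growth_per_second(samples):
    """Average growth (objects/second) between the first and last (time, count) sample."""
    (t0, c0), (t1, c1) = samples[0], samples[-1]
    if t1 == t0:
        return 0.0
    return (c1 - c0) / (t1 - t0)


def seconds_until_limit(samples, limit=10_000):
    """Rough time left before `limit` is reached, or None if the count is not growing."""
    rate = growth_per_second(samples)
    if rate <= 0:
        return None
    return max(0.0, (limit - samples[-1][1]) / rate)
```

Collect (timestamp, count) pairs every few seconds while exercising the suspect feature, then pass the list to growth_per_second. A count that keeps climbing while the app repeats the same work is the signature of a leak; a count that is high but flat points to over-use instead.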

 

2 – What types of GDI objects are these? Fonts? Brushes? Bitmaps? etc.  

  • Find out what types of GDI objects are being leaked (if not already known).  This is harder than it used to be, since most of the tools out there that query this information have fallen out of date with Vista and later.  For reference, and as a starting point if you want to try to get them working…
    • GDIObj from <www.fengyuan.com/download.html>
    • GDILeaks from msdn.microsoft.com/en-us/magazine/cc188782.aspx
    • There are more, or you could write your own and use the above as reference (see below).
  • You can do this yourself, but it involves some tricks.  The most important thing you’ll need is the address of the GDI handle table in User Memory.  Though undocumented, this can be obtained by looking for GdiSharedHandleTable inside the process environment block (the PEB).  This field contains the address of the handle table.
  • Under WinDbg, once you have public symbols set up, run dt ntdll!_PEB.  This will tell you where in the PEB you can find GdiSharedHandleTable.  Note that the location can vary, and will differ between x64 and x86.
  • Once you have this location, you can look through the entries in memory.  It will take some trial and error since the exact layout is undocumented and unsupported.  Hint: You could try to use one of the previously linked (older) tools as a starting point.
  • Once you know what type of objects you’re leaking, look for where you are creating these object types in your code (e.g. search for CreateFont and CreateFontIndirect if you are using thousands of font handles), and make sure you are de-selecting and deleting these objects whenever you are done with them.
    • If you think you really need thousands of objects, ask yourself – are these really all unique objects?  Or am I creating hundreds of instances of a 12 point Arial font?  If the latter, then try caching a single instance for use by the different parts of your code (for example, create a single 12-point Arial font when your app starts up, and save the handle off).  Just use the same handle as needed in your code.  Delete it with DeleteObject when you’re really done with it (maybe not until your app closes).
    • On the other hand, if you really need thousands of unique objects, try to create/select/use/de-select/delete these objects each time you need them.
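The caching advice above can be sketched as a small wrapper that creates each distinct object once and hands back the same handle on every later request. This is only an illustration under stated assumptions: GdiObjectCache is a made-up name, and the create and delete callables are placeholders for whatever wraps CreateFontIndirect and DeleteObject in your code.

```python
class GdiObjectCache:
    """Creates each distinct object once and returns the same handle afterwards."""

    def __init__(self, create, delete):
        self._create = create  # hypothetical wrapper, e.g. around CreateFontIndirect
        self._delete = delete  # hypothetical wrapper, e.g. around DeleteObject
        self._handles = {}

    def get(self, key):
        """Return the cached handle for `key`, creating the object on first use."""
        if key not in self._handles:
            self._handles[key] = self._create(key)
        return self._handles[key]

    def close(self):
        """Delete every cached object; call once, when the app is done with them."""
        for handle in self._handles.values():
            self._delete(handle)
        self._handles.clear()
```

With a key such as ("Arial", 12), a hundred callers asking for the same font cost one GDI handle rather than a hundred. The trade-off is that cached objects live until close is called, so this suits a small set of frequently reused objects, not thousands of unique ones.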

 

3 – If there isn’t a clear handle leak found via 1 and 2…

  • Check overall Paged Pool usage.
  • This can be quickly checked with Task Manager (notice the “Paged” field in the Performance tab of Task Manager shown here)…

 

 

  • Even better, check the Paged Pool usage with Process Explorer.
    • First, you need to set up symbols via Options->Configure Symbols.  Enter a symbol path of SRV**http://msdl.microsoft.com/download/symbols
    • Then, click View->System Information…
  • Note, Process Explorer also shows the limit (when symbols are configured correctly as above) - which is information that Task Manager does not give…

 

 

4 - If #3 shows that you are running out of Paged Pool…

 

  • First, a bit about which “Paged Pool” we care about (there are really two kinds).  For systems that don’t have Terminal Services installed, we’re talking about general paged pool; on all other systems, we mean the per-session session pool.
  • If you are running XP or an earlier version of Windows, you need to explicitly “tell” the system that you want pool allocations to be tagged.  On these older systems, run gflags.exe (like WinDbg, it is included with Debugging Tools for Windows), select the “Enable pool tagging” checkbox, and then reboot (again, XP and earlier only).

 

  • Poolmon will tell you what objects are using up Paged Pool (it will also show you the overall usage like Task Manager and Process Explorer did)…

  • To narrow down what PoolMon shows you, you can start it as follows (for example)…
    • poolmon -iGla*
  • You can also tell it to sort by various fields and turn off highlighting (press ? for help and ESC to get out of help).
  • pooltag.txt ships with the Windows Driver Kit (WDK) and lists all the tags, but we only care about a few…
  • The tags that GDI uses all start with ‘G’, making this usage easier to track (not only in poolmon, but also when debugging - for example, under WinDbg, you could run !poolused 4 Gla: ).
    • Some example tags are…
    • “Gla:” Font type is 0xa.
    • “Gla1” DC type is 0x1.
    • “Gla5” SURF (bitmap) type is 0x5.
  • To give you a rough idea of memory usage seen through experimentation (these are under XP):
    • DCs (Gla1) – 1600 bytes per object
    • Fonts (Gla:) – 656 bytes per object
    • Bitmaps (Gla5) – 392 bytes per object
    • Regions (Gla4) – 176 bytes per object
    • Palettes (Gla8) – 160 bytes per object
    • Brushes (Gla@) – 112 bytes per object
  • Tracking down a paged pool leak is often more art than science, but a few tools to help you figure out what is leaking are:
    • Under WinDbg: !poolused 4
    • Under WinDbg: !vm 1
    • poolmon (see above)
  • Note – to track Session Paged Pool, use poolmon /s, or poolmon /s# (where # is a session ID).  You can toggle between system and session by typing s while poolmon is running.
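The tag examples and per-object sizes above lend themselves to a quick back-of-the-envelope helper. The sketch below makes two assumptions: that the last character of a “Gla?” tag is the ASCII digit ‘0’ plus the object type (a pattern inferred from the examples above, not a documented rule), and that the XP-era sizes are still a usable rough guide. The function names are illustrative.

```python
# Per-object sizes quoted above (measured under XP); treat them as rough guides.
BYTES_PER_OBJECT = {
    "Gla1": 1600,  # DCs
    "Gla:": 656,   # fonts
    "Gla5": 392,   # bitmaps
    "Gla4": 176,   # regions
    "Gla8": 160,   # palettes
    "Gla@": 112,   # brushes
}


def gdi_type_from_tag(tag):
    """Decode the object type from a 'Gla?' tag (assumes last char is '0' + type)."""
    if len(tag) != 4 or not tag.startswith("Gla"):
        raise ValueError(f"not a Gla? tag: {tag!r}")
    return ord(tag[3]) - ord("0")


def approx_object_count(tag, pool_bytes):
    """Estimate how many objects a tag's total byte count represents."""
    return pool_bytes // BYTES_PER_OBJECT[tag]
```

For example, if poolmon reports roughly 2 MB under “Gla:”, dividing by 656 bytes suggests on the order of three thousand fonts, which is a strong hint about where to look in your code.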

 

5 – Once you know what is being leaked…

  • Review your code.  Make sure you are cleaning up after yourself with DeleteObject.  This is something you should do before even going down this road!
  • If you can’t find the culprit, try adding some trace information next to your CreateXXX and DeleteObject calls, and then examine the output after running.
  • Put a breakpoint on the CreateXXX function (where the actual function name depends on the type of object – e.g. CreateBrushIndirect), and try to see who the guilty party is.

 

  • NOTE - Often, if there is a leak of this kind, the function is being hit hundreds or thousands of times, and not every one will be part of the actual leak (often, most of them are not!).  Just because you’re leaking fonts, doesn’t mean *this* CreateFontIndirect call is part of the problem.
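If you go the tracing route, a small script can do the matching for you. The sketch below assumes a made-up trace format that your own logging would have to emit: one line per call, either "CREATE <handle> <call site>" or "DELETE <handle>". It pairs each delete with the most recent create of that handle and reports which call sites are left holding objects.

```python
from collections import Counter


def find_unreleased(trace_lines):
    """Return {handle: call_site} for every CREATE that has no later DELETE."""
    live = {}
    for line in trace_lines:
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "CREATE":
            live[parts[1]] = " ".join(parts[2:])
        elif len(parts) >= 2 and parts[0] == "DELETE":
            # Handle values get reused, so only the most recent CREATE matters.
            live.pop(parts[1], None)
    return live


def leaks_by_call_site(trace_lines):
    """Count unreleased handles per call site, worst offenders first."""
    return Counter(find_unreleased(trace_lines).values()).most_common()
```

Grouping by call site addresses the point in the note above: a creation function may be hit thousands of times, but only the sites that still hold handles at the end of the run are candidates for the leak.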

 

Some common mistakes…

  • In .NET, not explicitly calling Dispose on all System.Drawing objects before they go out of scope (the using keyword is also acceptable).
  • Not calling DeleteObject on each created GDI object in native (non-.NET) code.
  • Creating lots of memory-intensive objects (for example, large bitmaps), particularly on 32-bit systems and in a Terminal Services session, where the available session paged pool is much lower.