Get most duplicated strings from a heap dump using ClrMD


ClrMD is an awesome managed API for inspecting managed processes and dump files. To use it, just add a NuGet reference to Microsoft.Diagnostics.Runtime. When loading a dump, be sure to have the mscordacwks.dll from the machine where the dump was taken, because the DAC has to match the CLR build that the dumped process was running. Also make sure that the program that uses ClrMD targets the same platform (32/64-bit) as the process or dump that you're inspecting.

Here's a sample that gets the most duplicated strings out of a dump. Lots of duplicates are an indication that you might need a string cache somewhere. Remember that when creating a dump of a 32-bit process on a 64-bit OS you need to use the 32-bit Task Manager (%windir%\SysWOW64\taskmgr.exe), otherwise the dump will be useless.

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Diagnostics.Runtime;
 
namespace DumpTools
{
    class DumpHeapStrings
    {
        static void Main(string[] args)
        {
            using (var dataTarget = DataTarget.LoadCrashDump(@"app.dmp"))
            {
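                // Locate the DAC (mscordacwks.dll) that matches the CLR in the dump.
                var dacLocation = dataTarget.ClrVersions[0].TryGetDacLocation();
                if (dacLocation == null)
                {
                    throw new InvalidOperationException("Could not locate a matching mscordacwks.dll.");
                }

                var runtime = dataTarget.CreateRuntime(dacLocation);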
                var heap = runtime.GetHeap();
                var objects = heap.EnumerateObjects();
 
                var stringUsages = new Dictionary<string, long>();
 
                // Only the first 1,000,000 objects are inspected; remove Take(...) to scan the whole heap.
                foreach (var instance in objects.Take(1000000))
                {
                    var type = heap.GetObjectType(instance);
                    if (type != null && type.IsString)
                    {
                        var value = (string)type.GetValue(instance);
                        if (value == null)
                        {
                            // Guard against a value that could not be read; a null key would throw below.
                            continue;
                        }

                        // Count one more occurrence; usages stays 0 when the string is new.
                        long usages;
                        stringUsages.TryGetValue(value, out usages);
                        stringUsages[value] = usages + 1;
                    }
                }
 
                var sorted = stringUsages.OrderByDescending(kvp => kvp.Value).Take(100);
                foreach (var kvp in sorted)
                {
                    Console.WriteLine(kvp.Value + "\t\t" + kvp.Key);
                }
            }
        }
    }
}
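
If the output shows the same string thousands of times, the string cache mentioned above could look roughly like the sketch below. This is not part of ClrMD or of the dump tool; StringCache and its Intern method are hypothetical names made up for illustration. The idea is simply to hand back the first instance seen for each distinct value so that later duplicates can be garbage collected. The sketch is not thread-safe and never evicts entries, so it only suits a bounded set of values.

```csharp
using System;
using System.Collections.Generic;

namespace DumpTools
{
    // Hypothetical helper for illustration; not part of ClrMD or the sample above.
    public sealed class StringCache
    {
        // Maps each distinct value to the first instance that was seen for it.
        private readonly Dictionary<string, string> cache =
            new Dictionary<string, string>(StringComparer.Ordinal);

        // Number of distinct strings currently held by the cache.
        public int Count
        {
            get { return cache.Count; }
        }

        // Returns the cached instance equal to value, storing value on first sight.
        public string Intern(string value)
        {
            if (value == null)
            {
                return null;
            }

            string existing;
            if (cache.TryGetValue(value, out existing))
            {
                return existing;
            }

            cache[value] = value;
            return value;
        }
    }
}
```

Calling cache.Intern(...) wherever the duplicated strings are created (for example, when parsing input) means that two equal values end up sharing one instance. The framework's own string.Intern does something similar, but interned strings tend to stay alive for the lifetime of the process, which is why a cache you control may be the better fit.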

Comments (1)

  1. Mark says:

    I just wrote basically this exact same code. I came across your post trying to figure out how to locate the appropriate Dac, which was giving me problems trying to analyze dumps from different machines, so thanks for that.
    It is laughable how much memory this application is wasting on duplicate strings. Opportunity for improvement I guess!
