I recently bought a new house, and along the way our agent recommended we get a sewer inspection. Those things are so cool! The guy arrived with a roll of tubing attached to a gizmo that looked like a prop from City of Lost Children, unwound this into the sewer, and got a realtime video feed back to his laptop.
The bad news is he found a cave-in just past the south wall (but don’t worry, it won’t cost too much to fix). In fact the inspector suspected this even before we saw it on the video. He had a feeling something was wrong because he noticed a depression in the lawn.
But here’s the thing: even though he was able to diagnose the problem by applying a mixture of experience and intuition, we still sent down the camera to confirm his guess before shelling out the $$$ to buy the house. Guesses are no substitute for hard data, especially when large sums of money are involved!
Software optimization works the same way. Intuition can be a useful starting point, but if you really care to understand what’s going on, there’s nothing quite like looking inside the sewer and seeing for yourself. And yet, I’ve lost count of how many times I’ve seen variants on the following conversation:
Programmer: Can someone tell me how to speed up my Frobnicate method?
Expert: How do you know Frobnicate is the bottleneck? Have you profiled this program?
Programmer: Well no, but I’m pretty sure; I mean it’s got to be, right?
Expert: Do you even have a profiler?
Programmer: Well, not exactly…
Before you can hope to speed up a program, you have to know what part of the code is making it slow. The problem with guessing is that if you guess wrong, you can waste a lot of time trying to speed up something that wasn’t slow in the first place.
People are often tempted to add timing instructions to their code, using Stopwatch to record how long various sections take to run. There is indeed a time and place for this technique (I will blog more about it later), but it is a blunt instrument and prone to error. Common problems:
- Can take a long time to home in on the slowest pieces of code
- The results are only as good as the places you think to add instrumentation
- If you don’t think to add a timer around the slowest method, you may never even notice it
- No good for spotting those "duh" things that ought to be fast, but are actually slow because of a silly mistake
- The timing code itself takes some time to run, which may throw off the results
- Easy to make mistakes in your timing code, in which case the results are worse than useless
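To make the manual approach concrete, here is a minimal sketch of Stopwatch-based instrumentation. The `SectionTimer` class and its member names are hypothetical, invented for this illustration; only `System.Diagnostics.Stopwatch` comes from the framework. It accumulates elapsed ticks per named section so that many calls can be averaged, rather than trusting a single noisy measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper (not part of any library): accumulates Stopwatch
// ticks per named section so repeated measurements can be averaged.
public sealed class SectionTimer
{
    private sealed class Entry
    {
        public long Ticks;
        public int Calls;
    }

    private readonly Dictionary<string, Entry> entries =
        new Dictionary<string, Entry>();

    // Runs the action and adds its elapsed time to the named section.
    public void Measure(string section, Action action)
    {
        long start = Stopwatch.GetTimestamp();
        try
        {
            action();
        }
        finally
        {
            // Read the clock first, so bookkeeping is not counted.
            long elapsed = Stopwatch.GetTimestamp() - start;

            Entry entry;
            if (!entries.TryGetValue(section, out entry))
            {
                entry = new Entry();
                entries.Add(section, entry);
            }
            entry.Ticks += elapsed;
            entry.Calls++;
        }
    }

    // Number of times the named section has been measured.
    public int CallCount(string section)
    {
        Entry entry;
        return entries.TryGetValue(section, out entry) ? entry.Calls : 0;
    }

    // Mean time per call in milliseconds, or 0 if never measured.
    public double AverageMilliseconds(string section)
    {
        Entry entry;
        if (!entries.TryGetValue(section, out entry) || entry.Calls == 0)
        {
            return 0.0;
        }
        return entry.Ticks * 1000.0 / Stopwatch.Frequency / entry.Calls;
    }
}
```

Usage would look something like `timer.Measure("Frobnicate", () => Frobnicate());` followed by a periodic printout of `timer.AverageMilliseconds("Frobnicate")`. Notice that even this small helper shows the problems listed above: the delegate call and dictionary lookup add overhead of their own, the lambda may allocate, and it only measures the sections you thought to wrap.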
A better solution is to use a profiling tool. Profilers are to game programmers what spirit levels are to carpenters. Sure, you can do good work without one, but at some point you get fed up of eyeballing shelf after shelf ("do you mind checking if this looks straight while I hold it in place? Ok, down a bit on the left, sorry, I meant the other way, ok, that looks good; wait, no, it seems wonky now I’m standing over here…") and you realize it is worth a trip to Home Depot to spend $15 for the proper tool.
Ok, now the bad news: there is no single tool that will tell you everything in one place. To truly understand performance, you must combine data from several tools. Here are the techniques I use most often, starting with the most important:
- Use a sampling profiler to see where your CPU time is being spent. There are many to choose from: ANTS, NProf, VSTS, VTune, and more that I forget right now. Never leave home without one! Unfortunately there are no such profilers for Xbox or Zune, but you can still get useful information by profiling a Windows version of your game, as the hotspots are usually similar on all platforms.
- Make sure your profiler isn’t being tricked by indirect garbage collection costs. Garbage collection is sneaky because it is a "play now, pay later" feature. Memory allocations are very fast in .NET, but if you do too many of them the garbage collector will kick in at some later time, and that can be expensive. A sampling profiler is no use for diagnosing this, because it can only tell you where the CPU is spending its time, and does not understand enough to know what earlier code is causing collections to occur. The solution is to use Remote Performance Monitor to see how long GC is taking on Xbox, then if it turns out to be a problem, use CLR Profiler on Windows to see exactly where the garbage is coming from, and optimize as necessary. Even if the Windows version of your game already runs fast enough, using CLR Profiler on the Windows version is the best way to understand what is causing GC on Xbox.
- Make sure your profiler isn’t being tricked by graphics rendering costs. Graphics is another "play now, pay later" feature which can lead to surprising profiler results. You need to understand how the CPU and GPU run asynchronously from each other, then work out whether you are CPU or GPU bound. If GPU bound, narrow down what part of the GPU is your bottleneck and try some performance experiments. Also watch out for pipeline stalls.
- Add a framerate counter. Use a special profiling project configuration that selects variable timestep and disables vsync, and launch via Ctrl+F5 to keep the debugger out of the way. Now you can check the framerate whenever you add new code, so you will notice if something slows down more than you were expecting. The sooner you spot a problem, the easier it is to fix, as you can fire up the profiler while the new code is still fresh in your mind.
- If you are trying to choose between design alternatives and want to understand the performance characteristics of various options ("which is faster: a virtual method or an interface?"), consider microbenchmarking specific language or runtime features in isolation. I will write more on this later.
- People are often surprised when I list .NET Reflector as a performance tool, but it is invaluable when you want to understand the performance characteristics of library code that you rely on. Pop quiz: does List<T>.Sort generate garbage?
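As an illustration of the framerate counter suggested in the list above, here is a minimal sketch. The `FrameRateCounter` class and its member names are hypothetical and deliberately independent of any game framework. The assumption is that you would call `Update` from your game's update method with the elapsed time, call `CountFrame` from your draw method, and display `FramesPerSecond` however you like.

```csharp
using System;

// Hypothetical helper, independent of any game framework: counts drawn
// frames and publishes a frames-per-second value once per second.
public sealed class FrameRateCounter
{
    private static readonly TimeSpan Interval = TimeSpan.FromSeconds(1);

    private TimeSpan elapsed = TimeSpan.Zero;
    private int framesThisInterval;

    // Frames drawn during the most recently completed one-second interval.
    public int FramesPerSecond { get; private set; }

    // Call once per update with the time elapsed since the previous update.
    public void Update(TimeSpan elapsedTime)
    {
        elapsed += elapsedTime;
        if (elapsed >= Interval)
        {
            // Carry the remainder over so intervals do not drift.
            elapsed -= Interval;
            FramesPerSecond = framesThisInterval;
            framesThisInterval = 0;
        }
    }

    // Call once per drawn frame.
    public void CountFrame()
    {
        framesThisInterval++;
    }
}
```

In an XNA game, the profiling configuration described above would typically mean setting `IsFixedTimeStep = false` on the `Game` and `SynchronizeWithVerticalRetrace = false` on the `GraphicsDeviceManager`. Without those settings the counter just reports the refresh rate and hides how much headroom you really have.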