Managed Code Performance on Xbox 360 for XNA: Part 2 - GC and Tools

...a continuation of Part 1, which can be found here

 

Memory and Garbage Collection

One common concern for game developers is the garbage collector. By design, GCs trade off determinism for convenience. Luckily, keeping the GC predictable is fairly straightforward. Two variables to pay attention to are:

  1. How long a GC takes (a.k.a. GC latency)
  2. When and how often a GC happens

You’ll want to strike a balance between “how long” and “how often” to get smooth gameplay. Even if your GCs are few and far between, so that your average framerate is unaffected, a single GC that takes long enough will cause a perceptible skip. That’s pretty obvious… so how long is too long? We observed a small skip at around 25-30ms of latency on top of the running frame rate, under the assumption that a GC is happening just about once per second. That leads into the second factor: lots of frequent GCs will kill your average framerate with unnecessary overhead. One thought that occurs to people is: “Why not call GC.Collect() every frame so it’s deterministic!” The wasted overhead you pay for “over collecting” typically doesn’t make sense.
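A more pragmatic middle ground than collecting every frame is to force a full collection only at moments where a hitch is invisible, such as behind a loading screen. A minimal sketch follows; the level-loading context is hypothetical, and while GC.Collect(), GC.WaitForPendingFinalizers(), and GC.CollectionCount() are standard .NET 2.0 APIs, their availability on NetCF/Xbox should be verified:

```csharp
using System;

class GcControl
{
    // Force a full collection at a moment when a hitch is invisible
    // (loading screen, pause menu), so it won't happen mid-gameplay.
    public static int ForceFullCollection()
    {
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect(); // reclaim objects freed by finalizers
        return GC.CollectionCount(0);
    }

    static void Main()
    {
        int before = GC.CollectionCount(0);

        // ... hypothetical level loading happens here, generating
        // lots of temporary garbage ...

        int after = ForceFullCollection();
        Console.WriteLine(after > before); // True
    }
}
```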

 

At 60fps each frame takes about 16.67ms (that’s 1000ms/60frames). Isolated benchmarks show that a GC of 100,000 live objects takes about 14ms (though that’s a relatively large number of objects compared to Rocket Commander by Benjamin Nitschke, which has just under 50,000 live objects at peak). Would you really want to sacrifice over 80% of your frame budget forcing GCs (~14ms/16.67ms)? Don’t think so. One disclaimer: the referenced 14ms benchmark is “isolated” and doesn’t account for the cost of heap compaction, which is an additional real-world cost depending on heap fragmentation and general object churn.

So how does one control GC latency? Like NetCF for devices, the Xbox GC is non-generational. That means every collection is a full collection of the managed heap. Thus, we find that GC latency is approximately linear in the number of live objects… then add the cost of heap compaction on top of that. Our benchmarks show that the difference between deep object hierarchies vs. shallow ones is negligible, so it’s mostly the number of objects that matters. Small objects also tend to be somewhat cheaper to deal with than big objects.

 

The triggers that cause collection are unchanged from v2.0 as well. A collection is triggered for every 1MB allocated, or when an allocation fails because memory is exhausted.

 

Games typically have lots of small objects that represent game state. The obvious optimization here is to reduce the live object count. You can do that by defining those data structures as structs, which are value types (to use more general terminology). Value types stay off the GC heap... of course, that assumes your structs don’t get boxed into objects, which can often happen unknowingly in your code.
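As a sketch of the difference, using a hypothetical Particle type: an array of structs is a single allocation with the elements stored inline, whereas had Particle been a class, the same array would put every element on the GC heap as its own object:

```csharp
using System;

// A small piece of game state defined as a struct (value type).
struct Particle
{
    public float X, Y;
    public float VelX, VelY;
}

class StructDemo
{
    static void Main()
    {
        // One allocation total: the array itself. If Particle were a
        // class, this would be 1 array + 10,000 separate live objects
        // for the GC to walk on every collection.
        Particle[] particles = new Particle[10000];

        particles[0].X = 1.5f;             // writes in place, no boxing
        Console.WriteLine(particles[0].X); // 1.5
    }
}
```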

 

A concise blog posting about value type boxing can be found here: https://blogs.msdn.com/scottholden/archive/2005/01/27/362084.aspx. An obvious boxing scenario is casting a value type to an object to call an object method. An insidious boxing scenario is passing a value type to a method that takes an object parameter (e.g. ArrayList.Add(object)). Generics help you avoid this latter scenario, but can also lead to other non-obvious boxing scenarios. You can use the XNA Remote Performance Monitor tool discussed below to watch out for boxing. Read more about the generics implementation in NetCF here: https://blogs.msdn.com/romanbat/archive/2005/01/06/348114.aspx.
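A short sketch of the obvious case, the insidious case, and the generic fix (the score variable is just an example):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        int score = 42;

        // Obvious boxing: casting a value type to object allocates a
        // new heap object to hold the int.
        object boxed = (object)score;

        // Insidious boxing: ArrayList.Add takes object, so every int
        // added gets boxed behind your back.
        ArrayList untyped = new ArrayList();
        untyped.Add(score); // boxes

        // Generics avoid it: List<int> stores its ints unboxed.
        List<int> typed = new List<int>();
        typed.Add(score);   // no box

        Console.WriteLine((int)boxed + typed[0]); // 84
    }
}
```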

One more caveat on structs, touched on in the floating point section: avoid passing and returning large value types by value; use “ref” or “out” instead. I’ll emphasize again that passing and returning large value types like Matrix by value is particularly costly in NetCF. Of course, make sure you understand the semantics of ref/out here: https://msdn2.microsoft.com/en-us/library/8f1hz171.aspx
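A sketch of the pattern, using a hypothetical BigMatrix struct as a short stand-in for a large value type like XNA’s Matrix (which carries sixteen floats); the ref/ref/out shape mirrors the style of XNA’s Matrix.Multiply overloads:

```csharp
using System;

// Hypothetical stand-in for a large value type. Four fields keep the
// sketch short; imagine sixteen, as in a real 4x4 matrix.
struct BigMatrix
{
    public float M11, M12, M21, M22;
}

class RefDemo
{
    // By-value version: both arguments and the return value are copied
    // in full on every call -- costly for large structs in NetCF.
    public static BigMatrix AddByValue(BigMatrix a, BigMatrix b)
    {
        BigMatrix r;
        r.M11 = a.M11 + b.M11; r.M12 = a.M12 + b.M12;
        r.M21 = a.M21 + b.M21; r.M22 = a.M22 + b.M22;
        return r;
    }

    // ref/out version: the callee works on the caller's storage
    // directly, so no struct copies are made.
    public static void Add(ref BigMatrix a, ref BigMatrix b, out BigMatrix r)
    {
        r.M11 = a.M11 + b.M11; r.M12 = a.M12 + b.M12;
        r.M21 = a.M21 + b.M21; r.M22 = a.M22 + b.M22;
    }

    static void Main()
    {
        BigMatrix a; a.M11 = 1; a.M12 = 0; a.M21 = 0; a.M22 = 0;
        BigMatrix b; b.M11 = 2; b.M12 = 0; b.M21 = 0; b.M22 = 0;
        BigMatrix r;
        Add(ref a, ref b, out r);
        Console.WriteLine(r.M11); // 3
    }
}
```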

 

The second obvious optimization is to reduce the number of allocations over the lifetime of your game. You can do that by pooling data structures and reusing them, rather than repeatedly “newing” up fresh instances. There are many benefits here, ranging from saving the cost of “new” to reducing the need to compact the heap, because it isn’t churning all the time.
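A minimal pooling sketch, assuming a hypothetical Bullet type and a fixed-capacity pool: all allocation happens once up front, and steady-state play recycles instances instead of allocating:

```csharp
using System;

// Hypothetical game object to be pooled.
class Bullet
{
    public float X, Y;
    public bool Active;
}

class BulletPool
{
    readonly Bullet[] pool;

    public BulletPool(int capacity)
    {
        pool = new Bullet[capacity];
        for (int i = 0; i < capacity; i++)
            pool[i] = new Bullet(); // all allocation happens here, once
    }

    // Reuse an inactive bullet instead of newing one up per shot.
    public Bullet Spawn(float x, float y)
    {
        for (int i = 0; i < pool.Length; i++)
        {
            if (!pool[i].Active)
            {
                pool[i].Active = true;
                pool[i].X = x;
                pool[i].Y = y;
                return pool[i];
            }
        }
        return null; // pool exhausted; caller drops the shot
    }

    public void Despawn(Bullet b) { b.Active = false; }
}

class PoolDemo
{
    static void Main()
    {
        BulletPool pool = new BulletPool(2);
        Bullet a = pool.Spawn(0, 0);
        pool.Spawn(1, 1);
        Console.WriteLine(pool.Spawn(2, 2) == null); // True: exhausted
        pool.Despawn(a);
        Console.WriteLine(pool.Spawn(3, 3) == a);    // True: recycled
    }
}
```

Returning null on exhaustion is one policy choice; a game might instead grow the pool during a loading screen, or overwrite the oldest active instance.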

 

Combine both tactics: recycle structs in constant-size array pools while avoiding boxing, and that should mitigate most GC perf worries! We had a reasonably complex game (well, visually complex at least) do all of the above, and its GC latency averaged 6ms, so it was never an issue.

Tools

The Remote Performance Monitor from NetCF 2.0 SP1 is back, with a slightly upgraded UI, as the XNA Remote Performance Monitor for Xbox 360! It’s easy to set up and connect, and will give you basic but valuable performance counter data. Leading in from the last section, the GC counters will probably be the most valuable. Notice the fields: “Garbage Collections”, “GC Compactions”, “Boxed Value Types”, “GC Latency Time”.


Watch the numbers tick and try to keep them from ticking up based on the tips above. Yep. That’s all.

 

Oh yeah, and make sure to measure and profile your code before you start optimizing... System.Diagnostics.Stopwatch is your friend. (You can’t have a blog post about perf without mentioning that)
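For completeness, a minimal Stopwatch sketch; the loop is just a stand-in for whatever code you’re timing:

```csharp
using System;
using System.Diagnostics;

class MeasureDemo
{
    static void Main()
    {
        Stopwatch sw = Stopwatch.StartNew();

        // ... the code under test; a stand-in workload here ...
        long sum = 0;
        for (int i = 0; i < 1000000; i++) sum += i;

        sw.Stop();
        Console.WriteLine("Elapsed: " + sw.ElapsedMilliseconds + "ms (sum=" + sum + ")");
    }
}
```

Measure a representative chunk of work, not a single call, and time it over many frames before drawing conclusions.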

 

Look forward to seeing games that push the limits of the system so we know where to optimize next!