(Charles – n) + (George + n) != Charles + George

Or to rephrase the title, in the land of parallel processing you can rob Peter, pay Paul, and have everybody end up richer.

I once did some consulting for a game that was having performance problems. It used a sophisticated visibility system which split the environment up into many small sectors, then tested each piece against the view frustum and used some clever error tolerance heuristics to choose a dynamic level of detail per sector. This resulted in the minimum possible number of triangles being sent to the GPU each frame.

Trouble is, this game was CPU bound!

I recommended they remove the visibility system, merge the entire environment into a single huge mesh, and always just draw the whole thing. This improved the framerate.

These were smart programmers. They had profiled their game and seen that most of their time was spent in the environment drawing code, so they concluded it was a good thing they’d already optimized that code, and wondered how much worse things would have been if they hadn’t had such a good visibility system.

Their mistake was not understanding parallelism. The visibility system was saving GPU cycles (by drawing fewer triangles) at the cost of increased CPU cycles (first to compute the visibility, and then to draw many small sectors as opposed to one big mesh, which caused lots of driver translation work). They had optimized for the wrong processor, and made their game slower as a result.
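To put rough numbers on that, here is a minimal sketch of the idea in Python. It assumes a toy model in which CPU and GPU work overlap completely, so the slower processor sets the frame time. The millisecond figures are invented purely for illustration and are not measurements from the game.

```python
def frame_time_ms(cpu_ms: float, gpu_ms: float) -> float:
    """Toy model: CPU and GPU run in parallel, so the slower one sets the frame time."""
    return max(cpu_ms, gpu_ms)


def bottleneck(cpu_ms: float, gpu_ms: float) -> str:
    """Name the processor that the frame is waiting on."""
    return "CPU" if cpu_ms >= gpu_ms else "GPU"


# Hypothetical per-frame costs (made-up numbers, for illustration only).
# With culling: lots of CPU work (visibility tests, many small draw calls), very little GPU work.
with_culling = {"cpu_ms": 25.0, "gpu_ms": 5.0}
# One big mesh: far less CPU work, far more triangles for the GPU.
one_big_mesh = {"cpu_ms": 8.0, "gpu_ms": 24.0}

for name, cost in (("with culling", with_culling), ("one big mesh", one_big_mesh)):
    total = cost["cpu_ms"] + cost["gpu_ms"]
    print(f"{name}: total work {total} ms, "
          f"frame time {frame_time_ms(**cost)} ms, "
          f"{bottleneck(**cost)} bound")
```

In this made-up example the single mesh does more total work (32 ms against 30 ms) yet finishes each frame sooner (24 ms against 25 ms), because the work has moved off the processor that was the bottleneck.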

(OK, I admit it: this was my game and my mistake. I did figure it out in the end, though 🙂)

Comments (5)

  1. Ultrahead says:

    Hey! Why not rob Paul to pay Peter 🙂

  2. AndrewVos says:

    What game are you talking about, if you don’t mind me asking?

    I would like some idea of the size of the mesh you’re talking about.

  3. > Hey! Why not rob Paul to pay Peter 🙂

    That would work too, as long as your game was Peter-bound 🙂

    For instance it is usually faster to do character skinning in the vertex shader, but if your game is GPU bound and bottlenecked by vertex processing, it might turn out faster to move that work back onto the CPU.

  4. > What game are you talking about, if you don’t mind me asking?

    It was an internal thing, so I’m not sure how much I can talk about the details in public.

    > I would like some idea of the size of the mesh you’re talking about.

    I had maybe 100,000 triangles in the environment.

    Of course, I’m not saying you should never bother with environment culling or visibility. My point is just that things can be pretty counterintuitive depending on the specifics of your app and hardware setup, and what seems like a surefire optimization might not turn out to be one, depending on where your bottleneck is!

  5. AndrewVos says:

    Thanks for your response. By environment do you mean just terrain, or all "non-movable" objects?

    P.S. Why doesn’t msdn blogs notify me of the reply? It took me twenty minutes to find this post 🙁
