Or to rephrase the title, in the land of parallel processing you can rob Peter, pay Paul, and have everybody end up richer.
I once did some consulting for a game that was having performance problems. It used a sophisticated visibility system which split the environment up into many small sectors, then tested each sector against the view frustum and used some clever error tolerance heuristics to choose a dynamic level of detail per sector. This resulted in the minimum possible number of triangles being sent to the GPU each frame.
Trouble is, this game was CPU bound!
I recommended they remove the visibility system, merge the entire environment into a single huge mesh, and always just draw the whole thing. This improved the framerate.
These were smart programmers. They had profiled their game, and seen most of their time spent in the environment drawing code, so they concluded it was a good thing they'd already optimized that code, and wondered how much worse things could have been if they didn't have such a good visibility system.
Their mistake was not understanding parallelism. The visibility system was saving GPU cycles (by drawing fewer triangles) at the cost of increased CPU cycles (first to compute the visibility, and then to draw many small sectors as opposed to one big mesh, which caused lots of driver translation work). They had optimized for the wrong processor, and made their game slower as a result.
(ok, I admit it: this was my game and my mistake. I did figure it out in the end though 🙂)