Back of the envelope


Everyone who writes software, and most who use it, should be familiar with "back of the envelope" calculations. I first read about this style of thinking about problems in Jon Bentley's Programming Pearls, Chapter 7.


Part of the work we are doing in core is around rendering and performance. We are working to update our rendering abstraction and how we view rendering in general, as well as how we think about and measure performance.


As part of this we have started doing some calculations related to bus bandwidth and how much of the bus each subsystem can and should use. This is really our first attempt at these “back of the envelope” calculations and we should have done them long ago.


Anyway, enough with the recriminations and on to the data. First, here are the bus bandwidth rates we are using:


Bus            Expected rate (GB/s)   Expected rate (MB/frame at 60 FPS)   Date introduced
PCIe 16x 2.0   4                      68                                   2007
PCIe 16x 1.0   2                      34                                   2004
PCIe 8x 1.0    1                      17                                   2004
AGP 8x 1.0     1                      17                                   2002
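The per-frame column is the per-second rate divided by the frame rate. Here is a minimal sketch of that conversion, assuming 1 GB = 1024 MB (which reproduces the rounded table values); the function name `mb_per_frame` is made up for illustration.

```python
def mb_per_frame(gb_per_second: float, fps: int = 60) -> float:
    """Bus budget per rendered frame in MB, assuming 1 GB = 1024 MB."""
    return gb_per_second * 1024 / fps


# Expected rates from the table, in GB/s.
buses = {
    "PCIe 16x 2.0": 4,
    "PCIe 16x 1.0": 2,
    "PCIe 8x 1.0": 1,
    "AGP 8x 1.0": 1,
}

for name, rate in buses.items():
    print(f"{name}: {mb_per_frame(rate):.1f} MB/frame at 60 FPS")
```

Halving the frame rate doubles the per-frame budget: `mb_per_frame(2, fps=30)` gives the same figure as `mb_per_frame(4)`.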


 


OK, so let’s think about this for just the Autogen tree subsystem. In FSX, most auto generated trees have 12 vertices, each containing 32 bytes of data[1], giving the model a size of (12 * 32 bytes) = 384 bytes per instance.


When using batching on a PCIe 16x 1.0 bus (34 MB per frame), and assuming 20% of the bus bandwidth is available for auto generated trees, it is possible to transfer (0.20 * 34 MB / 384 bytes per instance) = 18,568 batched trees per frame at 60 Hz.
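The batching estimate can be checked in a few lines. This is a sketch of the arithmetic only, assuming 1 MB = 2**20 bytes (the convention that reproduces the figure in the text); the helper name `instances_per_frame` is hypothetical.

```python
MB = 1024 * 1024          # assumption: 1 MB = 2**20 bytes
FRAME_BUDGET_MB = 34      # PCIe 16x 1.0 at 60 FPS, from the bus table
VERTICES_PER_TREE = 12
BYTES_PER_VERTEX = 32


def instances_per_frame(share: float, bytes_per_instance: int) -> int:
    """Whole instances that fit in the given share of one frame's bus budget."""
    budget_bytes = share * FRAME_BUDGET_MB * MB
    return int(budget_bytes // bytes_per_instance)


tree_bytes = VERTICES_PER_TREE * BYTES_PER_VERTEX
batched_trees = instances_per_frame(0.20, tree_bytes)
print(tree_bytes, batched_trees)
```

With these inputs the sketch gives 384 bytes per tree and 18,568 batched trees per frame.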


Now, a typical scene has 50 cells of 1 km x 1 km each. Autogen trees are pegged at 4500 max per cell (when the slider is all the way to the right). You can set the max up to 6000, so let’s call it 5000 for easier math. 5000 * 50 = 250,000 trees.


Holy Autogen Batman! Yes, this is why autogen brings the system to its knees. Crysis doesn’t try to render 250k trees. If you are having a major perf problem with Autogen, try using the “max in cell” tweak to reduce the max to 500. Then your max is 500*50=25,000.
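To put a number on the gap, here is a rough sketch that compares the requested tree count with the batched budget of 18,568 trees per frame. It takes the per-frame transfer estimate at face value; the function name `overcommit` is made up for illustration.

```python
CELLS_IN_SCENE = 50
BATCHED_BUDGET = 18568    # trees per frame, from the batching estimate


def overcommit(max_trees_per_cell: int) -> float:
    """How many times the requested tree count exceeds the batched budget."""
    return max_trees_per_cell * CELLS_IN_SCENE / BATCHED_BUDGET


for per_cell in (5000, 500):
    total = per_cell * CELLS_IN_SCENE
    print(f"{per_cell} per cell -> {total} trees, "
          f"{overcommit(per_cell):.1f}x the budget")
```

Under these assumptions the default asks for more than 13 times the budget, while the 500-per-cell tweak still lands a little above it.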


It should be obvious the same holds true for Autogen buildings.


Part of our performance work is to turn the engine from a batching engine to an instancing engine to help bridge this gap. When using instancing and assuming 10% of the bus bandwidth is available for non-animated instance data, (0.10 * 34 MB / 48 bytes per instance) = 74,274 instances per frame at 60 Hz can be sent across a PCIe 16x 1.0 bus. If we give Autogen trees 20%, then we get about 150,000 trees, or 3000 trees per cell across 50 cells. So that change alone gets us close to "within bounds".
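The instancing numbers work the same way, just with 48 bytes per instance instead of 384. A minimal sketch, again assuming 1 MB = 2**20 bytes and using a hypothetical helper name:

```python
MB = 1024 * 1024          # assumption: 1 MB = 2**20 bytes
FRAME_BUDGET_MB = 34      # PCIe 16x 1.0 at 60 FPS
BYTES_PER_INSTANCE = 48   # per-instance data when instancing
BYTES_PER_BATCHED_TREE = 384
CELLS_IN_SCENE = 50


def instances_per_frame(share: float) -> int:
    """Whole instances that fit in the given share of one frame's bus budget."""
    return int(share * FRAME_BUDGET_MB * MB // BYTES_PER_INSTANCE)


at_10_percent = instances_per_frame(0.10)
at_20_percent = instances_per_frame(0.20)
per_cell = at_20_percent // CELLS_IN_SCENE
gain = BYTES_PER_BATCHED_TREE // BYTES_PER_INSTANCE

print(at_10_percent, at_20_percent, per_cell, gain)
```

The 20% case comes to 148,548 instances, which the text rounds to 150,000, or roughly 3000 per cell. The gain over batching is simply the ratio of the two per-tree sizes, 384 / 48 = 8.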


Given there are sliders and config entries, users can adjust their settings to their hardware and the style of flying they do. That is why this gap isn’t “tragic”. Still, it is a rude surprise to most people that FSX tries to do so much; that is a good part of the root of the problem, and of why the FSX engine is so different from other engines.


And there are some other things we are doing, so the engine isn’t so overcommitted. But that is another post. 🙂


PS.


http://support.microsoft.com/kb/555739 lists the Autogen max per cell tweak. 


PPS


The slider stops correspond to the following percentages:


0, 10, 20, 45, 70, 100
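As a rough illustration, the sketch below turns the slider percentages into tree counts. It assumes each percentage scales the 4500-per-cell maximum and that the scene has 50 cells; the post does not spell out exactly what the percentages apply to, so treat the mapping as a guess.

```python
SLIDER_PERCENTAGES = (0, 10, 20, 45, 70, 100)
MAX_TREES_PER_CELL = 4500   # assumption: the percentages scale this maximum
CELLS_IN_SCENE = 50

# Integer math keeps the per-cell counts exact.
trees_per_cell = [MAX_TREES_PER_CELL * pct // 100 for pct in SLIDER_PERCENTAGES]
trees_per_scene = [count * CELLS_IN_SCENE for count in trees_per_cell]

for pct, cell, scene in zip(SLIDER_PERCENTAGES, trees_per_cell, trees_per_scene):
    print(f"{pct:3d}% -> {cell} trees per cell, {scene} per scene")
```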








[1] 3 floating point numbers for position, 3 for normal, and 2 for texture coordinates: ((3 + 3 + 2) * 4 bytes) = 32 bytes.
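The footnote's byte count can be confirmed with Python's `struct` module. The packed layout string here is a hypothetical stand-in for the real vertex format, which the post does not show.

```python
import struct

# Hypothetical packed vertex layout matching the footnote:
# position (3 floats), normal (3 floats), texture coordinates (2 floats).
VERTEX_FORMAT = "<3f3f2f"

vertex_bytes = struct.calcsize(VERTEX_FORMAT)
tree_bytes = 12 * vertex_bytes   # 12 vertices per autogen tree

print(vertex_bytes, tree_bytes)
```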


Comments (14)
  1. NickN says:

    Great post Phil and one of the reasons we rework AG in GEX the way we do. We look at the objects and add up the weight and then try and maintain a percentage which will play nice with the rest of a typical render.

    It’s really great to see you guys are looking at all this now. Very uplifting to know things are being considered based on REALITY instead of just FEATURES. I knew you were the man for this job!!!

    I look forward to your next installment on changes.

    Also, don’t forget to test it in combat. the theory is great, but it must always be tested on different systems to be sure the design is in fact on the right track

    🙂

  2. anthonyvos says:

    Hi Phil! I concur with Nick, great post and insight indeed. Especially reading the last part; going from batching to (semi) instancing, got me very excited!

    Thanks! 🙂

  3. froemmelm says:

    Hi Phil,

    thanks for all the work that you’ve put into FSX. Talking of autogen and its impact on performance, there is one thing I never quite understood. When rendering a forest, you render thousands of individual trees. Wouldn’t it be quite a lot easier to render a forest? I’m thinking of a rectangular or triangular object of let’s say 500 by 500 meters with a forest texture? This should save a considerable amount of performance.

    Best,

    Matthias

  4. ccaughie says:

    In response to Matthias, I’ve also thought that rendering a forest would make more sense; not only would it hugely reduce the polygon count it would probably look more realistic as well. In real life the only time you can see individual trees in a forest of reasonable density is when you’re actually in it.

    Colin

  5. Phil Taylor says:

    Imposter systems can be tricky too.

    And we’d have to be re-writing that for Trains to get the 10 foot experience so in the long run this is the right way to author the feature.

  6. steveyb says:

    Hi Phil, I was under the impression, from reading your post back at the end of last year, that you had delivered and fulfilled your promise on release of SP1 + 2 and that you were moving on to more pressing things. Was I wrong? So what are we talking of, reading the above posts... SP3? I am suddenly excited!

  7. Ted Striker says:

    Glad to hear you are focusing on the core, Phil. I’m a non computer engineer trying to grasp all of this. In this example you used an expected performance rate of 2 GB/sec for the PCIe 1.0. I’m assuming that this is the video bus bandwidth. However, in your discussion on video cards below you had a bandwidth of 33.6 GB/sec for the 8800GT. I can’t imagine that the video bus is that much of a bottleneck? What am I missing?

    I was also curious why you used an FPS of 60 in your example. I always thought that running it this high, say versus 30, doubled your bandwidth demand without any noticeable performance gain. Am I missing something here also?

    Looking forward to hearing about the other things you are working on.

    Ted

  8. Te_Vigo says:

    Knocking the AutoGen numbers to levels suggested in KB555739, certainly assists in being free of the DX10 landing light (bug ?) causing the screen to flash/ calc errors, etc.

    Froemmelm… rendering the forest, like il2?

  9. Phil Taylor says:

    SteveyB:

    this discussion is about progress made on TrainSim2, not about a SP3 for FSX, there is no plan for an additional service pack.

  10. Phil Taylor says:

    Ted:

    I used 60 just as a point of discussion, yes if you cut the FPS down to 30 that effectively doubles the amount you can push.

  11. vssg72 says:

    Phil,

    Thanks for this post.  I’m glad to see this sort of thing being done.  Knowing the practical limitations of what you are trying to engineer is a very valuable thing!  Alas, I’m no engine programmer or 3D guru at all, so can you tell me what batching versus instancing are, or at least tell me where I can learn?

    Keep up the good work and informative posts.

    Mark

  12. Phil Taylor says:

    VSS:

    batching vs instancing can be read up on in the DX SDK or a search with "Direct3D batching instancing".

  13. Te_Vigo says:

    A couple of other things I’ve found that might help, or might not, as the case may be…..

    Some mobo’s have a (in BIOS) memory (DDR3 at least) setting tRFC, which in some cases finds itself perhaps a bit tight. I reset my settings to default, to find the tRFC went from 48 to 60. What it’s worth, I don’t know, though it did smooth things out a touch… could some one else clarify perhaps?

    Another which has me scratching my noggin…

    I’m running Vista HP and FSX in DX10 Preview mode to find the landing lights, when activated, would cause the screen to flicker and flash when aircraft were viewed from certain angles (usually @ 12 o’clock high; 3 ; 6; or 9 o’clock level), yet I accepted the last DX9 (9.2xxxx) bi-monthly update to see what appeared to be no flickering or flashing of the screen whatsoever.

    How does that work?

  14. JimmiG says:

    "Crysis doesn’t try to render 250k trees."

    Nope, but it does render individual, animated leaves, animated vegetation, has dynamic realtime soft shadows on everything, full HDR rendering etc. 🙂 A tree in Crysis and a tree in FSX are completely different things.

    But I get the point, you can’t use a graphics engine intended for shooters in a flight sim, but the opposite holds true too.

    Also with the low draw distance of autogen, I very much doubt you see that many trees. Forests in FSX look fairly dense close up, but the trees quickly thin out farther from the viewer.

Comments are closed.
