FSX SP1: Performance work


Finally, we have shipped!


 


It’s been solid work since Nov from the Graphics and Terrain team on SP1 performance.


 


Here is a work-list on performance so you can understand what we did:


 


General Performance Work


1.         performed more work on the LOD system,


2.         optimized the UI rendering to reduce overhead when the UI is visible,


3.         front-end texture loader no longer loads full mips when it doesn’t need to


4.         removed redundant elevation queries when scenery complexity is low


5.         avoid rasterizing water into the DEM for textures that are all land


6.         fixed redundant vertex issue with key Autogen BGLs


7.         updated the XToMdl tool with the same redundant-vertex fix, resulting in a model vertex reduction on the order of 25-40% for 3rd party developers.


 


D3D Performance Work


8.         enabled skinning for more animated objects, which reduces Draw calls,


9.         batched Autogen objects to reduce Draw calls,


10.       optimized tree rendering to reduce SetTexture calls


11.       coalesced shader state to reduce uploads to the card


12.       fixed the Draw call counts of 3 FS8 AI aircraft: the MD-80, Dash-8, and Cherokee. This is a >10x reduction. They make up 25% of the worldwide air traffic DB, so this can be significant.


 


Multi-core Performance Work


13.       moved DEM loading to threads,


14.       moved terrain texture synthesis ( the process itself is documented in Adams’ “Global Terrain Technology for Flight sim” paper at http://fsinsider.com/Community/Developers-Corner/Global+Terrain+for+Flight+Simulator.htm, see the bit about the layers and texture synthesis ) to threads,


15.       moved Autogen batch rebuilds to threads


 


That’s 15 different work items. We were busy!


 


How it went together


 


Now, don’t think that because we are calling it a “patch” we are doing binary patches. We are not.


 


Yes, we use a delta-patching technology in the installer, for scenery bgls and the like. Even there, some files have changed so much that the delta-patcher cannot handle them and we have to ship a “full file” in the patch. The bgl for the Japanese traffic data is an example: we reversed the traffic vectors for the entire country, and almost every byte changed. We tried to put it through the delta-patcher, but it gave up and errored out during setup.


 


We *are* rebuilding the binaries from scratch. That’s not trying to patch the old binaries; it’s replacing them with new files, many of which have quite a bit of new code. The multi-core work, for instance, went through the terrain code stack from top to bottom. That’s one reason why SP1 took so long. The multi-core infrastructure is solid, will use up to 256 cores if available, and will continue to be used as we migrate systems to it where it makes sense. Terrain and Autogen are it for now; we’ll be evaluating when to do more.


 


General Performance Work


 


The general performance work reduced the amount of work we try to do in various scenarios. I want to call out the redundant vertex issue; that’s a key thing we fixed in the Autogen bgls (autogen.bgl, vegetation.bgl, and roofs.bgl) as well as in the SDK tool, to pass the savings on to all 3rd party developers when they reauthor with the SP1 SDK.
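
The post does not say how the redundant vertices were removed. As a rough illustration only, here is a minimal sketch of one common technique, vertex welding, where identical vertices are stored once and triangles refer to them by index. The function name and the sample data are hypothetical, not taken from FSX or XToMdl.

```python
from typing import Hashable, List, Sequence, Tuple


def weld_vertices(vertices: Sequence[Hashable]) -> Tuple[List[Hashable], List[int]]:
    """Collapse exact duplicate vertices; return (unique_vertices, indices).

    Assumption: two vertices are redundant only if every attribute matches.
    """
    unique: List[Hashable] = []
    index_of = {}
    indices: List[int] = []
    for v in vertices:
        if v not in index_of:
            # First time we see this vertex: store it and remember its slot.
            index_of[v] = len(unique)
            unique.append(v)
        indices.append(index_of[v])
    return unique, indices


# Two triangles sharing an edge, stored as 6 unindexed vertices (x, y, z, u, v).
quad = [
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0, 1.0, 1.0),
    (0.0, 0.0, 0.0, 0.0, 0.0),
    (1.0, 1.0, 0.0, 1.0, 1.0),
    (0.0, 1.0, 0.0, 0.0, 1.0),
]

unique, indices = weld_vertices(quad)
print(len(quad), "->", len(unique), "vertices, indices:", indices)
```

In this toy case 6 stored vertices become 4, a one-third reduction. The real saving depends on how much geometry a model shares.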


 


D3D Performance Work


 


The D3D API usage work was aimed at reducing our Draw and SetTexture API calls. So what are Draw and SetTexture calls? These are the D3D9 API calls that the engine issues to push textures and draw triangles down to the card, which is the bulk of the work in rendering. We were issuing way too many Draw and SetTexture calls; SP1 is a 35-40% reduction in both. Those optimizations are aimed at enabling the app to scale better on GPUs. We took some optimizations on shader state too, which is a nice win. And the 3 FS8 AI aircraft were just horrendous in RTM, so that’s another nice fix.
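
To make the idea of call reduction concrete, here is a small sketch that counts how many SetTexture and Draw calls a draw list would cost before and after merging objects that share a texture. It is a toy model under stated assumptions: the renderer is assumed to skip SetTexture when the texture is already bound, and the function names and the scene are hypothetical, not FSX code.

```python
def count_api_calls(draw_list):
    """Return (set_texture_calls, draw_calls) for an ordered list of (texture, mesh).

    Assumption: SetTexture is only issued when the bound texture changes.
    """
    set_texture_calls = 0
    draw_calls = 0
    bound = None
    for texture, _mesh in draw_list:
        if texture != bound:
            set_texture_calls += 1
            bound = texture
        draw_calls += 1
    return set_texture_calls, draw_calls


def batch_by_texture(draw_list):
    """Merge all meshes sharing a texture into a single batch (one Draw each)."""
    batches = {}
    for texture, mesh in draw_list:
        batches.setdefault(texture, []).append(mesh)
    return [(texture, tuple(meshes)) for texture, meshes in batches.items()]


scene = [
    ("oak", "tree1"), ("roof", "house1"), ("oak", "tree2"),
    ("roof", "house2"), ("oak", "tree3"), ("pine", "tree4"),
]

naive = count_api_calls(scene)
batched = count_api_calls(batch_by_texture(scene))
print("naive:", naive, "batched:", batched)
```

For this made-up scene the count drops from 6 SetTexture and 6 Draw calls to 3 of each. The numbers are illustrative only and say nothing about the real 35-40% figure.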


 


Multi-core Performance Work


 


Intel is using FSX as one of their prime examples at IDF, and we had a lot of engineering time from one of their threading guys. Intel doesn’t do that lightly, and we put the time to good use.


 


During loading, we run the DEM loader on threads. You’ll see well-balanced usage across all cores, as well as load times that are about 1/3 faster on average.
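
As a loose illustration of spreading tile loading across worker threads, here is a minimal sketch using a thread pool. Everything in it is an assumption for the example: load_dem_tile is a hypothetical stand-in for reading one elevation tile, and the worker count is arbitrary. It shows the general pattern, not how the FSX loader is built.

```python
from concurrent.futures import ThreadPoolExecutor


def load_dem_tile(tile_id):
    """Hypothetical stand-in for reading and decoding one elevation tile."""
    samples = [tile_id * 10 + i for i in range(4)]
    return tile_id, samples


def load_all_tiles(tile_ids, workers=4):
    """Load tiles on a pool of worker threads; results are keyed by tile id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands tiles to idle workers and yields results in input order.
        return dict(pool.map(load_dem_tile, tile_ids))


tiles = load_all_tiles(range(8))
print(len(tiles), "tiles loaded; tile 3 ->", tiles[3])
```

Because each tile is independent, there is no shared state to lock here, which is what makes this kind of work easy to balance across cores.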


 


During flight we spawn threads for Autogen batch rebuilds as well as for the terrain texture synthesis. The terrain texture work tends to be a bit bursty; once an area has been generated the load drops, true. But as you fly forward, as you bank, and as the terrain is relit ( once a minute ), threads are spawned. The terrain grid system is radial around the current viewpoint and, depending on the level-of-detail radius, can extend up to 4.5 tiles in any direction, something like 64 tiles. So there is plenty of work to go around. Autogen is more constant, with a 2km extent being batched.
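
One way to read the "4.5 tiles, something like 64 tiles" figure is as the area of a circle with a radius of 4.5 tiles. That reading is an assumption, but the arithmetic does line up, as this short check shows:

```python
import math


def tiles_in_radius(radius_tiles: float) -> float:
    """Area, in tiles, of a circular region with the given radius in tiles."""
    return math.pi * radius_tiles ** 2


area = tiles_in_radius(4.5)
print(f"{area:.1f} tiles")  # 63.6 tiles
```

For comparison, a full 9 x 9 square around the viewpoint would be 81 tiles, so the radial layout covers roughly a fifth fewer tiles than a square of the same reach.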


 


Even given the bursty nature of the core usage when flying, when there is load, it’s pretty balanced across the cores. And we got rid of as many of the stutters as we could by going to a lock-free synchronization style. It’s solid work that we are deservedly proud of.


 


As far as practical limits on the number of usable cores go, SetThreadAffinityMask currently only allows explicit scheduling of threads on 32 cores ( the mask is a dword ) on Win32. So that’s our effective limit on the number of cores. But as soon as there is a way to explicitly schedule more of them, we can handle 256 cores.
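
The 32-core limit falls straight out of the mask width: an affinity mask has one bit per core, and in a 32-bit process the mask argument (declared as a DWORD_PTR in the Win32 headers) is 32 bits wide. The sketch below only builds the bit mask; it does not call the Win32 API, and the helper name is hypothetical.

```python
MASK_BITS = 32  # width of the affinity mask in a 32-bit process


def affinity_mask(core_indices, mask_bits=MASK_BITS):
    """Build an affinity bit mask with one bit set per requested core.

    Raises ValueError for a core index that has no bit in the mask.
    """
    mask = 0
    for core in core_indices:
        if not 0 <= core < mask_bits:
            raise ValueError(f"core {core} does not fit in a {mask_bits}-bit mask")
        mask |= 1 << core
    return mask


print(hex(affinity_mask([0])))        # 0x1
print(hex(affinity_mask([0, 2])))     # 0x5
print(hex(affinity_mask(range(32))))  # 0xffffffff
```

Asking for core 32 or higher raises an error in this sketch, which mirrors the point in the post: there is simply no bit left to name that core.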



 


Conclusion


 


With all that said, the Draw and SetTexture API call reductions and Autogen size reductions are probably as important for FPS improvements; the multi-core work really shines for load balancing and reducing stutters and blurries. And both are critical for better scaling as CPUs and GPUs get better.


 


We think SP1 is going to deliver the goods for most users, and will reward users with better hw the most. We expect that, except on very, very low-end hw, all users should see a 20% gain. Some scenarios will see 40%, and some will see a bit more. It’s really going to depend on a lot of variables. We hope this enables users to either fly at the same settings with greater FPS, or to bump the sliders up 1 or 2 ticks and still get the same FPS they had.


 


It’s going to take time to see if that holds true, but we had good results in our perf lab and with the beta testers.


 


Note 1:


 


The vastly improved batching of Autogen was one of the major performance items in SP1 and helps to reach our target reduction of 35-40% for Draw and SetTexture calls. However, it does have an implication that you should be aware of when it is coupled with a feature we lost from FS9.


 


FSX does not “alpha-fade” Autogen in the distance. This makes for a discernible “pop” of Autogen objects. SP1 batches objects in a 2 km boundary. This, when coupled with the lack of “alpha-fade”, does make the Autogen “pop” a little more obvious than in RTM. We think it’s a fair tradeoff, though, for the performance gain.
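
To picture why batching makes the "pop" more visible, here is a minimal sketch that groups objects into 2 km grid cells, so that each cell is drawn (and therefore appears) as one unit. Treating the 2 km boundary as a square grid is an assumption for the example, and the names and coordinates are hypothetical.

```python
from collections import defaultdict

CELL_SIZE_M = 2000.0  # 2 km batching extent from the post


def cell_for(x_m, y_m, cell_size=CELL_SIZE_M):
    """Return the integer grid cell containing a position given in meters."""
    return (int(x_m // cell_size), int(y_m // cell_size))


def batch_by_cell(objects):
    """Group (name, x_m, y_m) objects into one batch per grid cell."""
    batches = defaultdict(list)
    for name, x_m, y_m in objects:
        batches[cell_for(x_m, y_m)].append(name)
    return dict(batches)


objects = [
    ("tree_a", 100.0, 250.0),
    ("house_a", 1900.0, 1999.0),
    ("tree_b", 2100.0, 50.0),
    ("tree_c", -10.0, 5.0),
]

batches = batch_by_cell(objects)
print(batches)
```

Here four objects collapse into three batches. Everything in a cell shows up together, so without a fade the whole cell pops in at once rather than object by object.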


 


For DX10 we will look at bringing the “alpha-fade” back.


 


Note 2:


 


We changed our bucketing code in SP1, so if you use “Restore Defaults” from the UI, you may see different default settings. What did we do? Well, RTM only detected up to 512 MB of graphics memory and used that as the “Ultra High” setting. With the 768 MB 8800 card out, there was no way to stratify it above the 512 MB 7950s. So we now detect 640 MB of graphics memory and treat 640 MB and greater as “Ultra High”.
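
The change can be summarized as moving one threshold. The sketch below models only the top bucket described above; the lower buckets are not covered in the post, so they are left out. The function and constant names are hypothetical.

```python
RTM_DETECT_CAP_MB = 512   # RTM could not see more graphics memory than this
SP1_ULTRA_HIGH_MB = 640   # SP1 threshold for the "Ultra High" bucket


def rtm_is_ultra_high(video_memory_mb):
    """RTM behaviour as described: detection is capped at 512 MB."""
    detected = min(video_memory_mb, RTM_DETECT_CAP_MB)
    return detected >= RTM_DETECT_CAP_MB


def sp1_is_ultra_high(video_memory_mb):
    """SP1 behaviour as described: 640 MB and greater is "Ultra High"."""
    return video_memory_mb >= SP1_ULTRA_HIGH_MB


for card_mb in (512, 768):
    print(card_mb, "MB -> RTM:", rtm_is_ultra_high(card_mb),
          "SP1:", sp1_is_ultra_high(card_mb))
```

Under RTM both the 512 MB and 768 MB cards land in the same bucket; under SP1 only the 768 MB card does, which is the separation the change was after.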


 


There is an issue on Vista where some cards can report a “shared” memory value larger than the physical value, and that confuses our bucketing code. If you don’t have a DX10 card and you are getting bucketed as “Ultra High”, for instance, turn your settings down. We’ll take a look at this again in DX10 to adjust the Vista bucketing.

Comments (10)

  1. panos95 says:

    Hey now I can’t reactivate ‘cos I have already installed it at 2 PCs!!

    Why you did this???? I am furious!!

  2. Bikedude says:

    You mentioned that you do your own thread scheduling. Does this give you much of a boost compared to simply handing that task to the OS?

    One of your previous comments also seem to indicate that FSX can struggle with a 2GB address space. How about a 64-bit version? 😉 You also mentioned that DX has issues with the /3GB switch; But it should work nicely under 64-bit Windows, right? (which would grant FSX a 4GB address space in case the FSX executable has been tagged as large memory aware)

  3. Phil Taylor says:

    Bikedude:

    we found that unless we explicitly used SetThreadAffinityMask, the OS would move threads and generate extra collisions.

    if we could get more address space, we would use it; we are trying to do too much, more than struggling. and all 32-bit processes have a 2G process address space. if there was a D3D9 fix we would work with the /3G switch, but it squeezes the kernel so much it can cause problems elsewhere. without that fix on D3D9 we cannot go above 2G.

  4. nikez2k4 says:

    I just wanted to say THANK YOU!

    You mentioned in your post that you expected a 20-40% performance increase with the 40% probably on higher end PCs. I am more of a web dev guy so I don’t really understand technically what you have done to improve performance, but my god did it work?!

    I was previously running FSX with everything on medium-high and it was running at a nice (for me!) 15-25 FPS. After the SP1 update I have noticed a 50-100% (that’s not a typo!!!!) increase in performance at the same settings! I am now running FSX at 30-50 FPS and it is amazing. Everything just seems so much smoother now.

    I have also cranked everything up I can find in the settings panel and the game still runs at 5-10 FPS…..although it is playable I think I am willing to sacrifice a little quality in order to get a lovely 30 FPS.

    Thank you again for this update it has really made a huge improvement to the game.

    If you are interested I am running an Intel Core 2 Duo E6400 with 2GB RAM and a BFG GeForce 8800 GTX OC. It’s also a clean install of Windows Vista so I guess that will make a little difference – you know how much Windows can get bogged down!

    Thanks again for your stellar work guys!

    James

  5. panos95 says:

    OK. Prob solved.

    You guys have done an EXCEllent JOB!

    Thanks a lot

  6. JCM_GDL says:

    Maybe this is the right place to post it:

    Hi Phil!

    I just installed SP1 in my FSX installation and I am very glad about the increase in performance. But I found something annoying in this update: in the RTM version, the mesh at medium and far distances was rendered really nicely (I am using an FsGenesis LOD 10, 38m mesh). Now, with SP1, only the closer terrain (4 to 5 nautical miles from the point of view) is properly rendered in full detail; farther than that, the mesh looks like the FSX default (maybe LOD 6 or so), with weird artifacts at the transition points. I tried to solve this by raising the LOD_RADIUS setting in the fsx.cfg file to a higher value (9), but the performance gained with SP1 went away and the scenery load time tripled.  Maybe this is one of the tradeoffs SP1 makes in order to achieve more performance, but the loss of detailed terrain rendering is IMHO a big cost to pay for better performance.  I’m using my own photorealistic scenery created with the FSX SDK.

    What can I do to arrive a balanced setting with good performance and a nice mesh rendering at medium distance (up to 50 nautical miles, as before SP1)?

    Thanks in advance

  7. Phil Taylor says:

    JCM:

    does the default mesh act the same way?

    if not, thats a clue its something with FSGenesis.

    there is nothing that stands out in my mind that could cause this from the SP1 work, btw.

  8. JCM_GDL says:

    Thanks Phil.

    That’s a good observation. I’ve been running some tests with a 76m third party scenery, and the mesh resolution seems to appear a little farther out than with the FsGenesis 38m mesh I use for the Mexico area. With the default scenery I tested the Phoenix area, and it looks perfect and doesn’t seem to have this issue. Maybe SP1 has introduced something in its code that has some kind of incompatibility with meshes developed with previous versions of the SDK packages.  I’m going to check with FsGenesis whether they have addressed this issue by now.

    Thanks again.

  9. c152flyboy says:

    though this may seem obvious to some, i am confused on one thing.  i have seen other postings where someone had mentioned changing the affinity setting on FSX to give it a boost in performance.  didn’t SP1 already take care of that? if not, how did it change the way multi cores work, and should both things be done?