The Last Word on Dual Core


Well, my last word, anyway. It seems there’s quite a bit of anxiety on the FS boards about whether or not FSX will “take advantage of” multi-core CPUs. I’ve tried to provide some explanation but it’s quickly drowned out by the waves of claims, opinions and speculation. So I thought I’d try one last time to explain the real deal. After this, I’m done.

 

First off, let’s just all own up to the fact that multi-core CPU technology is no silver-bullet magic voodoo that automatically gives you twice (or four times) the performance of a single-core CPU on all applications. Despite the hopes of many 20th century authors and futurists, we do not yet have “thinking” computers that can look at an application, figure out what the programmer meant for it to do, and automatically optimize it to make it work faster. The programmers gotta do the work. And that work ain’t easy for a game where you need to spit convincing images to the screen dozens of times each second.

 

So what, really, needs to be done? Simply put, the programmer needs to divide up computational tasks in such a way that the operating system can “farm out” the work to multiple CPUs. In computer parlance these separately scheduled streams of work are known as “threads”. The notion of so-called “multithreaded” applications is nothing new, since multiple-CPU PCs have been around for quite some time. It’s only recently, with the advent of multi-core CPUs bringing multi-CPU computing to the masses, that the topic has garnered much interest from gamers.

 

What makes programming a multithreaded application, especially a game, difficult is the interdependencies between tasks. For example, take AI aircraft in FS. I can’t count the number of times I’ve read “why can’t FS just use the second CPU for AI traffic?” Well, what’s involved in rendering AI? For one thing, the AI occasionally needs to know “am I on the ground?” For that, some process has to be able to figure out where the ground is–i.e., the terrain system. There’s one interdependency right there. Another interdependency is the AI’s use of ATC. ATC needs to track the AI planes as well as the user’s aircraft. And the list goes on and on.

 

The net result is that, no matter how many threads a program creates in an attempt to “take advantage” of multiple CPUs, at some point the work on one thread is going to have to stop and wait for something else, likely on another thread, to happen. This is especially true for a game where the application needs to update the screen. Server applications that can treat requests independently from one another are less affected. Suppose, for example, that AI rendering was delegated to a thread that depended on the terrain system, on yet another thread, to provide it with information. What happens when it comes time to render the AI’s position? Do you wait indefinitely for the terrain system? If you wait too long you’ll see a stutter. Do you simply not render the AI but render everything else? Not unless you want the AI to appear and disappear and jump around the screen.
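To make the dilemma concrete, here is a minimal Python sketch of an AI update that needs an answer from a terrain query running on another thread. Everything in it is hypothetical and for illustration only: the names `terrain_height` and `place_ai`, the 50 ms frame budget and the made-up elevation are assumptions, not anything from the FSX code.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

FRAME_BUDGET = 0.05  # seconds we can afford to wait in one frame (assumed value)

def terrain_height(x, y, delay):
    """Hypothetical stand-in for a terrain query that runs on another thread."""
    time.sleep(delay)   # simulates the terrain system being busy
    return 100.0        # made-up ground elevation

def place_ai(pool, delay):
    """Ask the terrain thread where the ground is, waiting at most one frame budget."""
    future = pool.submit(terrain_height, 0.0, 0.0, delay)
    try:
        return ("drawn", future.result(timeout=FRAME_BUDGET))
    except TimeoutError:
        # The answer is late. Waiting longer means a stutter; giving up
        # means the aircraft is missing from this frame.
        return ("skipped", None)

with ThreadPoolExecutor(max_workers=1) as pool:
    print(place_ai(pool, delay=0.0))  # terrain answers in time
    print(place_ai(pool, delay=0.3))  # terrain is late
```

Neither branch of the `try` is a good outcome for a game, which is the point: adding the second thread did not remove the wait, it only moved it.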

 

Hopefully you can see that a multithreaded game like FSX consists of numerous start, wait, and complete sequences. The big problem here is that when you get too many of these, nothing gets done because everything is waiting on everything else. So where can you use multiple threads? You use them where the interdependencies are loose and indeterminate wait times aren’t readily noticeable. In FSX we use multiple threads for texture decompression and certain types of file I/O. Consider terrain textures that must be loaded and decompressed as you fly along. Normally new textures are needed for the area at the edge of the visual scene. Using low-resolution versions for the initial display and then loading higher resolutions in the background works because texture swapping in the distance is not very noticeable. In other words, it doesn’t matter if a texture is available immediately or several frames after it’s requested because you likely won’t notice the delay.
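The low-resolution-first idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FSX implementation: the `TextureCache` class, its method names and the "low"/"high" placeholders are all invented for the example, and a real loader would read and decompress files where the comment says so.

```python
import queue
import threading

class TextureCache:
    """Hand out a low-res texture at once; a worker thread swaps in high-res later."""

    def __init__(self):
        self._textures = {}                 # tile name -> "low" or "high"
        self._lock = threading.Lock()
        self._requests = queue.Queue()
        self._worker = threading.Thread(target=self._load_loop, daemon=True)
        self._worker.start()

    def _load_loop(self):
        while True:
            tile = self._requests.get()
            if tile is None:                # sentinel: shut the worker down
                break
            # Real code would do the file I/O and decompression here.
            with self._lock:
                self._textures[tile] = "high"
            self._requests.task_done()

    def get(self, tile):
        """Called from the render loop; never waits for the high-res load."""
        with self._lock:
            if tile not in self._textures:
                self._textures[tile] = "low"
                self._requests.put(tile)
            return self._textures[tile]

    def wait_until_loaded(self):
        """Force outstanding requests to complete, e.g. while a flight first loads."""
        self._requests.join()

    def close(self):
        self._requests.put(None)
        self._worker.join()

cache = TextureCache()
first = cache.get("tile_42")    # placeholder, available immediately
cache.wait_until_loaded()
later = cache.get("tile_42")    # the worker has swapped in the better version
cache.close()
print(first, later)
```

The render loop only ever calls `get`, so it does not care whether the worker finishes this frame or ten frames from now. That loose coupling is what makes this kind of work a good candidate for a separate thread.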

 

The good news is that these threads can be “farmed out” by the OS scheduler to multiple CPUs or cores. The bad news is that requests are made with varying frequency so the overall CPU utilization will also vary. In other words, those of you running the FSX demo and looking for 100% utilization on all your cores can just stop–you’re not going to see it. You’ll see a lot of utilization when you first load a flight (and we force requests to complete) and less as you fly along. As we continue to evolve the code base we’ll continue to look for areas where thread offloading makes sense, but changes in this area can have unexpected results, so it will take time to decide what works best. And, oh, when you find a game that does use all that horsepower all the time, please let me know.


Comments (32)

  1. Jeff Greth says:

    Well said, Mike.

    I’m a programmer as well so I can appreciate the pains you go through to explain these matters to the end users that just assume everything is easy to program because… why not?  I mean if a CPU is just sitting there, give it some work!  😛

    Seriously though, I hope this calms people down a bit.  On the various forums this has been a huge topic of debate ( though not much of a debate, more like a whine-fest ).  

    Good job bringing it all into perspective.

    Can’t wait til October.

  2. Topboy says:

    Nice work TDragger

    Thanks for the information

    Topboy

  3. Larry N. says:

    Nicely explained, Mike.

    Larry N.

  4. John Penn says:

    Very well put, and an interesting read. Boy, I’m glad I never took up programming, my hair is grey enough as it is :).

    John

  5. Randall says:

    I don’t have a multi-core CPU.  Won’t be getting one anytime soon.  So I don’t care.  Still, it was a great read.  Thanks.

  6. Kittie says:

    Wow, okay that explains it clearly.  As I said before, the biggest problem out there is all ‘the experts’ that know everything.  

    Just one question about dual core if I may.  Will other programs take advantage of the second CPU?  For example, Active Sky and Squawk box while FSX is running, or is that a Vista thing?

    Erica

  7. EdrickV says:

    Using other CPUs for the 3rd party stuff, IMHO, is the way to go. And I believe you can customize that to your liking even using Hyperthreading under XP Pro. (Not sure about Home, only system I have XP Home on doesn’t have hyperthreading.)  Just bring up Task Manager, (Ctrl+Alt+Delete) right click an individual process on the Processes tab, and choose "Set Affinity."

  8. Curt says:

    I appreciate the fact that you are making strides in the Flight Simulator world for multi-core, but I am still a little hesitant to accept some of the reasoning.  To help rationalize it for everyone, though: it is very difficult to take a single-threaded model and spread it out across multiple CPUs.  Unfortunately, when you do this conversion, you create many critical sections (a critical section is just that: only one thread, and so only one CPU, can run through that section at a time).  These are often much larger than they should be, to make sure you don’t do something wrong accidentally, hence all the waiting for one another.

    Good news here, we haven’t seen any deadlocks.  For the non-programmers, that’s when you have two threads that are waiting for each other.  Kind of like a 4 way stop: everyone has to stop, but the other person is waiting for that guy on the other side to go, and he’s waiting for you.  And even worse, computers are so smart they just wait there for each other until the end of time.

    I’ll be interested to see how things progress in this.  I have to lock my framerates lower because FlightSim isn’t doing much of the loading and unloading of terrain and textures (which, you would normally need your rendering thread to push those into the GPU, but that can be done later normally).  

    Also, the talk about doing "Set Affinity" probably isn’t worth it.  Affinity is just a fancy term for "You are allowed to work on the following CPUs."  For gaming, the XP kernel will make sure you are not messing things up too much.  If SquawkBox is slowing something down, just set FlightSim’s priority higher than SquawkBox’s, and it will get the CPU it needs.  But both can share the CPUs fairly effectively.  Affinity is really more for larger systems, when you have 16 CPUs and only want SQL Server to use 14 of them because you need 2 of them for a middle tier or other critical applications which need dedicated CPU time.
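The four-way-stop deadlock described in the comment above can be reproduced safely in Python. This is only a sketch under assumptions: the function name and the 0.2 second timeout are invented, and the timeout is there so the demonstration gives up instead of hanging "until the end of time" as a real deadlock would.

```python
import threading

def opposite_order_attempt(timeout=0.2):
    """Two threads each hold one lock and then ask for the other one."""
    lock_a, lock_b = threading.Lock(), threading.Lock()
    holding_first = threading.Barrier(2)   # both threads now hold their first lock
    attempts_done = threading.Barrier(2)   # both threads have tried for the second
    got_second = {}

    def worker(name, first, second):
        with first:
            holding_first.wait()
            # A plain second.acquire() here would block forever: the other
            # thread holds this lock and is waiting for ours.
            acquired = second.acquire(timeout=timeout)
            got_second[name] = acquired
            attempts_done.wait()
            if acquired:
                second.release()

    threads = [
        threading.Thread(target=worker, args=("t1", lock_a, lock_b)),
        threading.Thread(target=worker, args=("t2", lock_b, lock_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return got_second

print(opposite_order_attempt())
```

Neither thread gets its second lock. The usual cure is to make every thread take the locks in the same order, which is easy to state and hard to guarantee across a large existing code base.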

  9. patk says:

    How about offloading audio processing or ATC to the 2nd core?

  10. whatever123 says:

    TDragger, in answer to your "show me one game" question: BF2. At least from what I see, the CPU bars for both cores are usually active all the time and within 1-2% of each other, and system memory is at 80% of 2 GB. I am seeing this info on my Logitech G15 LCD.

    I think you guys did an excellent job on the product, at least by looking at the demo. Thanks again for all your hard work and the late hours you spend acting as a customer service rep for your company.

  11. Mike Dimmick says:

    Kittie: yes. Windows NT has always been multiprocessor-capable, and Windows NT Workstation, Windows 2000 Professional and Windows XP Professional have always supported 2 processor sockets. Windows XP Home only supports one processor socket – if you have more than this, any processors plugged into additional sockets will simply not be used.

    Dual-core processors only occupy one socket, so XP Home should still make use of all logical processors on a one-socket HT or dual-core processor.

    Edrick: Set Affinity makes the problem worse, not better. The OS generally works better if it is free to choose which processor any given process or thread runs on. Don’t worry about it moving a thread to a different processor – if possible, it will keep a thread on the same processor it ran on last time. If a core becomes free (because a thread has to wait for something to happen), Windows will find a thread to run on it. If a thread has an affinity mask that doesn’t include that processor, and that thread is runnable (not waiting for anything), the thread will have to keep waiting and in the worst case the core will go idle when there was work it could have been doing.

  12. Cristian Gonzalez says:

    Although I agree with your points I would add there are many other ways to use additional CPUs, especially if you are starting from an existing, linear flow.

    Imagine that at some point during every render cycle of our favorite "game" you need to perform some operation on a relatively large number of elements and your code at that point says:

    for (n = 0 to aBigNumber)

     doSomeNiftyStuff(element[n]);

    Assuming that those operations don’t depend on each other, you could instead use 2 processors and do in parallel:

    for (n = 0 to halfABigNumber)    

     doSomeNiftyStuff(element[n]);

    and

    for (n= halfABigNumber+1 to aBigNumber)

     doSomeNiftyStuff(element[n]);

    The program flow does not change but you roughly halved the time it took to process all your elements.

    You’ll almost certainly not halve the execution time of your render loop by doing this through your entire cycle, but you might get, say, a nice 5% shorter time. Not bad for a "cheap" multitasking approach that would require (relatively) little work 🙂
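The split-the-loop idea from the comment above might look like this in Python. It is a structural sketch only: `do_some_nifty_stuff` is a hypothetical stand-in that squares a number, and in the standard CPython interpreter threads do not run pure-Python computation in parallel, so do not expect this particular code to run faster. What it shows is the shape of the change: same program flow, two half-ranges, results joined at the end.

```python
from concurrent.futures import ThreadPoolExecutor

def do_some_nifty_stuff(element):
    """Hypothetical per-element work; must not depend on other elements."""
    return element * element

def process_range(elements, start, stop):
    """Process elements[start:stop] in order."""
    return [do_some_nifty_stuff(e) for e in elements[start:stop]]

def process_in_two_halves(elements):
    """Hand each half of the list to its own worker, then join the results."""
    half = len(elements) // 2
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(process_range, elements, 0, half)
        second = pool.submit(process_range, elements, half, len(elements))
        # Both halves must finish before the caller can continue.
        return first.result() + second.result()

print(process_in_two_halves(list(range(10))))
```

Note that the caller still waits for the slower of the two halves, and that the whole approach is only valid under the stated assumption that the operations do not depend on each other.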

  13. David Jones, London England says:

    Hi Mike, I agree it would be difficult to program Flight Simulator for multiple threading; however, I do believe the concept is a valid one. If you were to start from the bottom up, it would be possible to create gaming subsystems which would be completely independent of each other. Each subsystem would update the gaming world in its own time.

    The best example of what I’m trying to say is how multiplayer games work: each game station (i.e. CPU) would represent a section of the FS universe and would be responsible for assembling its own objects’ positions, much like a fellow online player is represented in an MMO. The display engine would then examine the gaming world and render the game as a whole as it sees it at that point in time. There don’t need to be any dependencies on this engine as it would just use the data available.

    The technique would be difficult for FS to achieve in its current incarnation, but the proof that MMO gaming worlds work would be enough for me to realise that going forward would mean significant benefits.

    Taking a sidestep for a moment, I can see similar technology being utilised (if not eventually with FSX) in which a company will host an online environment which can be accessed and contributed to, much like open source, so you would download a thin client and then connect to the FS world, which would feature the latest customisations streamed to your PC every time you connect. OK, somebody is going to say this happens already, but in reality the software and hardware are still not quite ready to really make the most of this concept, but we’re getting close.

    Sorry to ramble on, gosh I should start my own blog…

  14. Max says:

    Just to know: what about FS9? Does it take any advantage (even marginal) of running on a dual core CPU?

  15. Grant McLaughlin says:

    Hi

    Great discussion on Multi threading!

    To follow up on the comment from David Jones on building multi-threading apps from scratch, I would like to share some info on a new game available from LEGO. Yes! Lego.

    Lego has just released their Mindstorms robot ‘game’. This consists of a stored-program CPU that controls the actions of the robot and about 600 Lego building blocks specially designed for the robot actions.  The interesting part of this ‘game’ is the programming language that is used.  It is called LabVIEW and has been around for a while as a laboratory data processor and analyser.  

    The programming language is completely graphical, using programming objects similar to the shapes in drawing programs.  

    The cool part of the programming language is that it is specifically designed for multi-threading.  Just draw multiple, independent threads including these programming objects like you would draw an org chart or a flow diagram.  The independent threads can be asked to communicate with each other just by drawing ‘wires’ between any two of the objects within the threads.  These ‘wires’ can connect from/to various classes of receptors in the objects.  These receptors can be for data or could be boolean operators, etc.

    I am just starting to use these features and am excited about the possibilities.  As you can see from my poor explanation of the language, I am not a graphical programmer.  However, LEGO promises that this game is designed for persons age 10 or older, so I think I may be able to get it working  🙂

    Grant

  16. André says:

    Check & understand OpenMP for C++, makes any code fly multi-core in no time.

  17. Pedro Almeida says:

    "And, oh, when you find a game that does use all that horsepower all the time, please let me know."

    Falcon 4.0 🙂

    Regards.

  18. Manny says:

    Since it’s established that multi-core does not benefit FSX much, and most of the advancement in CPU technology seems to be in the realm of multiple cores, we are left behind with what single-core speed can do.

    And in that regard, we have reached a plateau. It may not be the fault of the developers… nevertheless a dead end… sort of.

    🙂

  19. gerry howard says:

    Re: Cristian Gonzalez’s Comment

    I’ve seen a Microsoft paper on parallel processing that showed that if "aBigNumber" was small, the effective processing speed dropped to 20% because of the overhead involved in organising the threads. It reached a maximum of 180% when the number was very large; the overhead still accounted for the difference between the theoretical 200% and what was actually achieved.
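A back-of-the-envelope model shows how overhead can produce figures like the ones quoted in the comment above. To be clear about the assumptions: the formula is a deliberately simple one (two threads each do half the work, plus a fixed cost to organise them), and the numbers below were chosen to reproduce the 20% and 180% figures. They are not taken from the paper.

```python
def two_thread_speed(work, overhead):
    """Speed relative to one thread (1.0 means no change).

    Assumed model: parallel time = half the work + a fixed organising cost.
    """
    return work / (work / 2 + overhead)

# With an assumed fixed overhead of 4.5 units:
for work in (1, 9, 81, 1000):
    print(work, round(two_thread_speed(work, overhead=4.5), 2))
```

In this model a tiny job is slower on two threads than on one, the break-even point is where the overhead equals half the work, and larger jobs creep toward (but never reach) the theoretical 2.0. A ceiling that stays at 180% even for very large jobs would suggest some additional cost that grows with the amount of work, which this simple model leaves out.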

  20. Geoff says:

    Hardware is generally leaps ahead of software.

    It is also leaps ahead of Software development tools, and the experience of programmers to program multi-core systems.

    As the software development tools improve for writing multi-core applications, we will see more.

    With multicore applications comes additional complexity, which introduces additional bugs.  Be careful what you wish for.

    Geoff

  21. Sebastien De Coster says:

    Just a note on BF2 using 100% of a dual core… BF2 is made up of 2 processes: the game and another that checks the "license". This second process uses up to 100% of a CPU. When you have 2 cores, each process takes one.

    This should make no difference when you play BF2…

    As for FSX… It looks nice but I still have to tune it. 🙂 Hope you can maximize the use of multicores, and thanks for the development.

  22. Tom Skwara says:

    The LEGO system is based on LabVIEW, which has a dataflow model as opposed to the imperative model used in C, BASIC, etc.

    Dataflow naturally encourages multi-threading with one processor or amongst many.

  23. Uli Romahn says:

    I think, I have to respond to this post, although it is kind of late…

    I am not surprised that FSX is NOT using the added benefit of a multi-cpu/multi-core cpu machine because FSX is most likely inheriting its main code-base from FS2004 which in turn was inheriting its main code-base from FS2000 which in turn was inheriting … I guess you get the picture.

    One of the major issues here is that much of this code-base, which most likely provides the core engine of the simulation, was originally written for a single-cpu machine (the good ol’ PC) on an operating system that didn’t even know how to spell ‘multi-threading’ (MS DOS).

    So, I have to agree with Mike that converting a program that used to be completely single-threaded into a multi-threaded application that is effectively utilizing parallel computing capabilities as offered by multi-cpu (and multi-core) machines is an incredibly complex task. And, since FSX is already a pretty complex program – who would disagree on that – I don’t think the development team wanted to make things even more complex. I have to take my hat off to the incredible work the team has accomplished with FSX.

    However, I have to strongly disagree with some of Mike’s explanations why simulations – and a lot of the modern games are simulations – should not benefit from multi-core cpus. I know of a few simulation programs  – one of which happens to be a commercial flight simulator – that have been designed and built from the ground up utilizing parallel computing capabilities. In fact, when development of the mentioned flight-simulator program was started – a long time ago – there were no such things as multi-core cpu’s and dual-cpu machines were the non-plus-ultra. But these dual-cpu machines had less computing power than a current "middle-class" PC such as a "normal" 3GHz Pentium 4.

    So in order to build such an incredible simulator, the program was divided up into different logical units distributed across multiple individual computers, and communication (and necessary synchronization) was accomplished using a high-speed data link between the computers.

    I don’t want to go deeper here since I would risk revealing too much (classified) information, but all I wanted to point out is that if FSX had been designed and built in a different way, similar to the simulator mentioned above, there would be absolutely no problem benefiting from dual or even quad-core cpus.

    But, apparently FSX has NOT been designed and built that way, so we have to live with the fact that there is no way it will effectively scale with the advancement in current hardware technology – unfortunately.

    BTW, on another note: having a cpu (or core) reporting 100% utilization in a multi-threaded environment is pretty much meaningless and says not much about the performance of a particular application. But, that’s something for another posting on another message board – maybe…

  24. Jinnah says:

    Same as someone mentioned above, I see the dual bars very much active for Americas Army and only one bar active of cpu usage for fsx.

    I am using an AMD 4800 X2.

  25. Lawrence Ballard says:

    Why did Intel not make it so that dual core processors can also work together, processing one application where need be?

    I.e., if you have 2 x 2.2GHz processors and you are using an application like FSX that does not support multi-core processing, then why can’t the 2 processors work together to create a 4.4GHz processor?