C++ AMP apps on Optimus notebooks


As more powerful discrete GPUs make their way into laptops and notebooks, we are seeing new technologies which help balance the battery life and performance of these devices. In this blog post I would like to address problems that some of you may have had running C++ AMP apps on notebooks with one such technology from NVIDIA called Optimus.

[Update: As of January 2013, with NVIDIA driver version 310.90, C++ AMP is fully and correctly supported on Optimus systems by default. The discussion below is left for historical reference, but the issues mentioned are mostly not valid any more.]

What is Optimus?

Optimus is a new technology that automatically determines the best way to deliver graphics performance and maintain long battery life for your notebook. Typically, Optimus-enabled notebooks come with an integrated graphics processor, which is good for battery life, and a discrete graphics processor, which delivers great graphics performance. The Optimus routing layer seamlessly switches from the integrated to the discrete graphics processor when it detects that an application can benefit from the additional performance. Some apps are recognized by Optimus and launch with the discrete graphics processor. For all other apps, the discrete graphics processor may not even be active if it is not needed.

C++ AMP apps on Optimus-enabled hardware

How does this affect C++ AMP apps? Let us understand this by looking at one such notebook, a DELL XPS 502X running Windows 7. This notebook has the two display adapters shown below. The NVIDIA GeForce GT 540M is the C++ AMP capable accelerator.

optimus1

Let us start by running the utility that lists all the accelerators that support C++ AMP. When I ran this utility on the notebook, it reported that no C++ AMP capable devices were found!

optimus2

This is because Optimus launches the utility above with the integrated graphics processor by default. The integrated graphics processor on this notebook is not DirectX 11 capable and hence does not qualify as a C++ AMP accelerator. The NVIDIA GeForce GT 540M is inactive at this time, so it is not detected as an available accelerator either.

You might see other symptoms in your application. For example, if you use the default accelerator, the application falls back to the software reference accelerator, since no better accelerator can be detected. If you specifically choose a hardware accelerator, the application will fail, since no such accelerator can be detected.
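To make these two symptoms concrete, here is a minimal sketch of the check an app could run before choosing an accelerator. It is not taken from the utility above. `AcceleratorInfo` and `pick_hardware_accelerator` are hypothetical names, and the struct is only a stand-in that mirrors three members of `concurrency::accelerator` (`description`, `is_emulated`, `dedicated_memory`) so that the logic compiles without `amp.h`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the few concurrency::accelerator members used
// here; a real app would read them from accelerator::get_all().
struct AcceleratorInfo {
    std::wstring description;         // e.g. the adapter name
    bool is_emulated;                 // true for software accelerators
    std::size_t dedicated_memory_kb;  // dedicated memory, in kilobytes
};

// Returns the non-emulated accelerator with the most dedicated memory,
// or nullptr when only emulated accelerators are visible (the situation
// described above, where the discrete GPU is hidden from the app).
const AcceleratorInfo* pick_hardware_accelerator(
    const std::vector<AcceleratorInfo>& all) {
    const AcceleratorInfo* best = nullptr;
    for (const AcceleratorInfo& acc : all) {
        if (acc.is_emulated) {
            continue;  // skip software fallbacks
        }
        if (best == nullptr ||
            acc.dedicated_memory_kb > best->dedicated_memory_kb) {
            best = &acc;
        }
    }
    return best;
}
```

A null result here corresponds to the case where the app would otherwise silently run on the reference accelerator, so it is a reasonable point to warn the user or exit with a clear message. In a real C++ AMP app the vector would be filled by copying those three members from each entry returned by `concurrency::accelerator::get_all()`.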

To fix this, you can guide the Optimus layer using one of the methods described in the sections below.

Pick the preferred graphics processor at launch time

Optimus notebooks come with a new option to pick the preferred graphics processor when launching an application. Right-click the executable and you’ll see a context menu similar to this:

optimus3

This means you can manually choose the DirectX 11 capable graphics processor to launch C++ AMP apps. When I launch the utility that lists all the accelerators that support C++ AMP with the “High-Performance NVIDIA processor” menu option, it now correctly reports that the notebook does have a C++ AMP capable device.

optimus4

Change 3D settings for C++ AMP apps using the NVIDIA Control Panel

The second method to run your app against the discrete GPU instead of the integrated one does not require you to pick a graphics processor at every launch. Instead, you can set this up once by changing the 3D settings in the NVIDIA Control Panel, as shown in the following screenshot:

optimus5

The NVIDIA Control Panel allows you to choose the preferred graphics processor for all apps (global settings) or for specific apps (program-specific settings). To set up Optimus to pick the discrete graphics processor for all apps, change the global setting from Auto-Select to High-Performance NVIDIA processor.

Note that changing the global settings to use the discrete graphics processor all the time may shorten your battery life. If you have a fixed set of C++ AMP apps, you can instead change the Program Settings for only those apps. That way you keep the benefits of Optimus for everything else and still get the best compute performance for your C++ AMP apps.

optimus6

Turn off Optimus in the BIOS

On some Optimus systems, for example the Lenovo W520 with a Quadro 2000M, you will find a setting in the BIOS to turn off Optimus. Remember to first select your discrete card as the default card in the BIOS, and then turn off the Optimus feature in the BIOS. Again, this may decrease your battery life, but it is another workaround.

Driver update from NVIDIA

There is no programmatic way on Optimus systems to affect what accelerator your EXE will run against, and the approaches outlined earlier (right click and execute, OR add the EXE to the whitelist, OR turn off Optimus in the BIOS) are not ideal. So, we have reported this as a bug to NVIDIA. NVIDIA may release a driver update to address this issue, so that all C++ AMP apps will automatically execute under the powerful GPU with no user intervention required. So if you are on such a system please check for a driver update from NVIDIA. The tests above have been performed on Windows 7 with driver version: 301.42.

Also note that we have tried an interim updated driver on Windows 8 (driver 302.80). It shows better results, but they are still not ideal: the single NVIDIA card was listed more than once in the list of available accelerators. We hope a further updated driver will appear for both Windows 8 and Windows 7.

You should now have a better understanding of how to work with Optimus for your C++ AMP apps in the interim while we wait for a driver update from NVIDIA, and if you have any questions please post them below.

[Update: As mentioned above, the issues are addressed in the latest drivers (verified on version 310.90).]

Comments (9)

  1. I am trying to use an Intel HD 4000 GPU and an NVIDIA K1000M GPU for my sample code. When I use both of these accelerators together, the data processed on the Intel GPU gives incorrect results. I think this is because of Optimus. Do we have any ETA from NVIDIA for when they are going to fix this issue for Windows 7? I have driver 306.97 but the problem still persists.

  2. Pooja Nagpal says:

    Hi Chirag, to help us better understand the cause of the incorrect results generated by the Intel GPU, can you provide the following details:

    1. Can you tell us more about how you use both accelerators together? Are you chunking the work manually across the two devices?

    2. Can you check whether using only one GPU (either the Intel HD 4000 or the NVIDIA K1000M) produces correct results? As described in this blog post, you can force the use of a single GPU by changing the 3D profile settings.

    3. Last but not least, what is the version of the Intel driver on your system?

    If your sample works correctly on each accelerator individually but fails when using them together, it may be related to the Optimus layer. As always, we are looking for feedback, so please do respond with more information so we can figure out why you are getting incorrect results.

  3. 1. ok, I tested in these scenarios:

    i) verify using only Intel HD 4000 GPU ( incorrect results)

    ii) verify using only NVIDIA K1000M GPU ( correct results)

    iii) verify using both GPUs at equal load (partially incorrect results – where Intel GPU is involved)

    iv) verify using both GPUs with load balancing (partially incorrect results – where Intel GPU is involved)

    2. As mentioned above. I also forced the application to run with the Intel GPU by selecting it from the right-click context menu. Interestingly, even if I register my application to run with the Intel GPU in the 3D profile settings, the context menu still shows NVIDIA as the default, although the application runs with the Intel GPU. This is probably because my application is ISV/partner certified with NVIDIA drivers.

    3. Intel Driver version : 9.17.10.2867 , NVIDIA driver version : 9.18.13.697

  4. Another observation that I must mention here: when I run my program with the NVIDIA card as the default card, the results are correct and identical whether I use a single GPU or multiple GPUs. Only when I run my program with the Intel card as the default card are the results from multiple accelerators incorrect. It looks like the Intel GPU is completely bypassed or wrapped by the NVIDIA card, as I also see two NVIDIA card entries when I list the devices using the VerifyAmpDevices.exe program.

  5. Pooja Nagpal says:

    Thanks for the extensive testing, Chirag. This is very helpful.

    From your observations, it sounds like this may be a driver bug. Is there any chance you can provide us with a small repro so we can try it in our lab?

  6. Can you email me? I can send you a repro.

  7. Nenad says:

    I also have an issue with AMP on the HD 4000, on a system with Optimus and an NVIDIA K5000M.

    1) If I use an external monitor instead of the notebook display, the accelerator list shows the K5000M first, then the HD4000 (but with the same memory reported as for the K5000M!), then the emulator, and last the CPU.

    1a) When I select the first (K5000M) accelerator, everything works correctly.

    1b) When I select the second (HD4000) accelerator, it runs without error but returns wrong results (it seems that copying to/from the GPU works, but parallel_for_each does not execute).

  8. Nenad says:

    Related to the previous HD4000 issue:

    2) If I use the notebook display, I get the K5000M twice in the accelerator list, without the HD4000.

    2a) If I use the first K5000M accelerator in the list, it works correctly.

    2b) If I use the second K5000M accelerator in the list, it also works, but takes a DIFFERENT TIME to execute.

    I tried the above with the latest NVIDIA 310.90 driver and the latest HD4000 driver.

  9. Hi Nenad,

    Thanks for reporting this issue. Unfortunately, I couldn’t reproduce either of your atypical results:

    1. The NVIDIA card shows up twice.

    2. The compute results on the integrated card (HD4000) are wrong.

    I tried to repro using a Windows 8 laptop with an NVIDIA K1000M and an Intel HD4000. The latest drivers I had for this system were NVIDIA 9.18.13.1100 and Intel 9.17.10.2843. I used a simple matrix multiplication sample like the one presented in the blog post “Matrix Multiplication with C++ AMP”.

    Note that the order in which devices are enumerated by VerifyAmpDevices or Concurrency::accelerator::get_all() does not indicate which device the C++ AMP runtime will use as the default device. To read more about default accelerator selection, refer to the “Default accelerator in C++ AMP” blog post.

    Do you mind providing us with more information about your environment to repro the two issues observed?

    1. What operating system are you using? If it is not Windows 8, would you consider running your application on it?

    2. What are the complete device driver versions for each graphics processor? You can find them in Device Manager (devmgmt.msc).

    3. Does the integrated device give incorrect results with any of the published samples? For example, does matrix multiplication give correct results?

    4. Do the wrong results here mean zero values? How do you verify the correctness of the results?

    5. In code, do you catch Concurrency::runtime_exception for your C++ AMP kernel?

    6. How did you select a specific graphics processor for your application run?