Point sprites in XNA Game Studio 4.0


As of version 4.0, XNA Game Studio no longer supports point sprites.

 

Why?

One big reason, plus two small ones.

The big reason is that DirectX 10 and 11 do not support point sprites, so if we kept them in our API, we would be unable to someday move that API to future DirectX versions. This would violate our "break it good" goal of taking all the pain now to avoid future compatibility breaks.

It is theoretically possible to emulate point sprites using geometry shaders, but after some investigation we concluded this would be:

  • Complex. Not always a bad thing, but complexity can be an early warning of designs that turn out to be buggy or slow, so it makes me nervous if I see too much of it!

  • Not fast enough to be useful on several popular GPUs.

A smaller reason is that point sprites behave differently on Windows and Xbox, and are not even consistent from one Windows GPU to another (the max size varies, as does the result of TEXCOORD interpolators).

The final reason is that point sprites are slower than triangles on some common graphics chipsets. Hardware manufacturers are sure to keep point sprites around for DirectX 9 compatibility, but they are unlikely to spend much time optimizing what is now a legacy feature, so this performance delta will only grow over time.

 

What can I do instead?

If you are worried about the idea of life without point sprites, the first thing you should know is that this worried me at first, too. But I’m not worried any more. By the end of this article, hopefully you won’t be either.

When we looked at why people use point sprites, we found two main purposes:

  • As a simple way to draw 2D dots, such as in our Primitives sample. These scenarios can easily (and with great performance) be changed to use SpriteBatch with a 1×1 white texture, so point sprites aren’t necessary for them.

  • As a high performance way to draw particles, such as in our Particle 3D sample. This scenario was more concerning. Anything you can do with point sprites, you can also do with triangles, so our worry was about performance rather than functionality. It would be a bad thing if the 4.0 particle sample could not run as fast as previous versions!

If you’ve been reading my blog for long, you have probably already heard me mention that the first step in performance work is to measure. Using Game Studio 3.1, I changed the particle sample to draw triangles instead of point sprites. In fact I made three versions:

  • The original point sprite implementation. This requires one vertex per particle. There are two versions of the pixel shader: a cheap one for particles that do not rotate, plus a more expensive shader for rotating particles.

  • Using triangle lists, with one triangle and three vertices per particle. To fit all of a square texture onto a single triangle, the triangle must be expanded somewhat larger than the final particle size. Because it can rotate in the vertex shader, this implementation is able to use the cheaper pixel shader for rotating as well as non-rotating particles.

  • Using indexed triangle lists, with two triangles and four vertices per particle. Although it requires one more vertex than the previous implementation, this can fit the entire texture without needing to expand the triangle, so fewer pixels need to be shaded.
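
To make the indexed layout concrete, here is a minimal Python sketch of how the index data for such a particle buffer could be generated, along with the kind of 2D corner rotation a vertex shader might apply. It is an illustration only, not code from the Particle 3D sample: the names build_quad_indices and rotated_corner, the corner ordering, and the winding are all assumptions made for this example.

```python
import math

# Corner offsets for one quad, in units of the particle's half-size.
# The ordering (and therefore the winding) is an assumption for this sketch.
CORNERS = [(-1.0, -1.0), (1.0, -1.0), (1.0, 1.0), (-1.0, 1.0)]


def build_quad_indices(particle_count):
    """Return indices for an indexed triangle list: 4 vertices, 6 indices per particle."""
    indices = []
    for i in range(particle_count):
        base = i * 4  # first of this particle's four vertices
        # Two triangles that share the diagonal base -> base + 2.
        indices += [base, base + 1, base + 2, base, base + 2, base + 3]
    return indices


def rotated_corner(corner, angle):
    """Rotate a corner offset about the particle centre by angle (radians)."""
    x, y = corner
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


if __name__ == "__main__":
    print(build_quad_indices(2))
    print([rotated_corner(corner, math.pi / 4) for corner in CORNERS])
```

Each particle contributes four vertices and six indices. Because the rotation is applied to the corner positions, the texture coordinates can stay fixed, which is what lets the rotating case reuse the cheaper pixel shader.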

All three versions used insignificantly tiny amounts of CPU time. The difference was in GPU performance:

  • Triangle lists took 129% as long as indexed triangles (turns out that having to shade fewer pixels more than makes up for the cost of that extra vertex).

  • For particles that do not rotate, indexed triangles took 180% as long as point sprites (ouch! those extra three vertices per particle are really costing us. But it isn’t 4x slower, like we might naively expect). 

  • For particles that rotate, indexed triangles took 70% as long as point sprites (whoa! This got FASTER. Sure, we have to shade an extra three vertices per particle, but the cheaper pixel shader more than makes up for that extra vertex work).
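
The first of these figures is easier to interpret with a little geometry. The sketch below assumes the single triangle is a right isosceles triangle with the particle square tucked into its right-angle corner. That is one possible layout, not necessarily the one used in these tests, and the function names are made up for this example.

```python
def quad_area(size):
    """Area covered by a two-triangle quad that exactly fits the particle."""
    return size * size


def covering_triangle_area(size):
    """Area of a right isosceles triangle whose corner holds a size x size square.

    Assumed layout: triangle vertices at (0, 0), (2 * size, 0) and (0, 2 * size).
    """
    leg = 2.0 * size
    return 0.5 * leg * leg


def square_fits(size):
    """Check that every corner of the square lies inside the assumed triangle."""
    leg = 2.0 * size
    corners = [(0.0, 0.0), (size, 0.0), (0.0, size), (size, size)]
    return all(x >= 0 and y >= 0 and x + y <= leg for x, y in corners)


if __name__ == "__main__":
    size = 1.0
    print(square_fits(size))
    print(covering_triangle_area(size) / quad_area(size))
```

Under that assumption the triangle covers twice the area of the quad, so roughly twice as many pixels are rasterized. The measured cost was 129% rather than 200%, so pixel count is evidently not the only factor in the timing.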

A 30% gain from not using point sprites is pretty sweet, but an 80% penalty is painful indeed. I see more people using rotating particles than otherwise, but still… ouch!

One of the best things about my job is the chance to work with amazingly smart people. When I mailed around these performance figures, I was lucky enough that one of the graphics gurus who created the Xbox GPU driver happened to see them, and was sufficiently intrigued to spend some time looking at PIX captures of my test app. Jason discovered that my indexed triangle implementation was bottlenecked by rasterizer performance, specifically by an obscure hardware penalty which occurs if you use both center and centroid mode interpolators with a shader that is fast enough not to be bottlenecked by anything else (an extremely rare situation).

Once identified, it was trivial to remove this bottleneck, which left my indexed triangles bottlenecked by texture fetches, just like the point sprite original. Revised timing data:

  • For particles that do not rotate, indexed triangles take 104% as long as point sprites.

  • For particles that rotate, indexed triangles take just 40% as long as point sprites.
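
Figures quoted as "X% as long" can be turned into a relative speed by taking the reciprocal. A small sketch of that arithmetic, using only the percentages quoted in this article (speedup_factor is a name invented for this example):

```python
def speedup_factor(percent_as_long):
    """Convert 'takes X% as long' into 'runs this many times as fast'."""
    return 100.0 / percent_as_long


if __name__ == "__main__":
    # Indexed triangles relative to point sprites, before and after the fix.
    for label, percent in [
        ("non-rotating, before", 180),
        ("non-rotating, after", 104),
        ("rotating, before", 70),
        ("rotating, after", 40),
    ]:
        print(f"{label}: {speedup_factor(percent):.2f}x")
```

So the rotating case went from about 1.43 times as fast as point sprites to 2.5 times as fast, while the non-rotating case went from about 0.56 times as fast to about 0.96 times as fast.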

Armed with these figures, removing point sprites didn’t seem so painful any more.

I can’t promise exactly when, but we are working on an updated version of the Particle 3D sample which shows how to use indexed triangle lists, and includes Jason’s magic optimization. We will be sure to get this out by the time 4.0 ships, if not before.


Comments (25)

  1. Pete says:

    Are the final figures for the WP7, XBox360 or the PC platform? If the latter, for what specs?

  2. ShawnHargreaves says:

    These figures are from Xbox 360.

    PC performance obviously varies wildly from one chipset to another, but if you average across many different GPUs, the median result is similar to what we see on Xbox.

  3. Sharky says:

    40% YEEEES!

    Music to my ears Shawn. You guys ROCK!

    I can’t wait to get my hands on the new Particle 3D sample. I’m going to need it for game #2.

  4. David Black says:

    Hmm, What about cache effects?

    4x the number of vertices could be quite a bit of memory… How does the performance compare when other stuff is running in the background thrashing the CPU cache(even on the same core)?

    (I guess smaller vertex formats would help either one… Plus balancing the cache size against the batch overhead)

  5. Adam Miles says:

    "Once identified, it was trivial to remove this bottleneck."

    Was this the use of ‘nointerpolation’ on the Color  and Rotation vertex attributes? Since all vertices have the same Color and Rotation there’s no need to interpolate these over the polygon(s)?

  6. Forgive my noobness, but I use point sprites to render a space dust field. When you say "particles that rotate" versus "particles that do not rotate", what exactly does that mean?

    Particles that rotate are the ones that are constantly rotating to face the camera right?

    and particles that do not rotate are the ones that have their orientation fixed?

    Or is it the other way around?

  7. Pete says:

    Is this shader part of XNA 4.0 for WP7? If not, are you guys going to include it?

  8. Is PIX for Xbox 360 something we can download now with GS 4?

  9. I’m disappointed to hear that there is no geometry shader solution. I am using a much more complex vertex shader than the Particle sample (but a very simple pixel shader), and running it three or four times is a real problem. I will have to try and come up with a two-pass solution, rendering particle state to a texture, which sounds horribly complex.

    Eduardo; Shawn means particle sprites that are rotated around their centrepoint, in 2D (screen) space. Both types always face the camera.

  10. ShawnHargreaves says:

    > 4x the number of vertices could be quite a bit of memory… How does the performance compare when other stuff is running in the background thrashing the CPU cache(even on the same core)?

    The CPU usage (and CPU memory bandwidth) for this style of particle system is vanishingly small either way. Even when drawing enough particles to entirely saturate the GPU, the CPU load is insignificant.

  11. ShawnHargreaves says:

    > Was this the use of ‘nointerpolation’ on the Color and Rotation vertex attributes?

    The optimization is to use either all centroid or all center mode interpolators, rather than one center and one centroid.

  12. ShawnHargreaves says:

    > Is this shader part of XNA 4.0 for WP7? If not, are you guys going to include it?

    No, and no. This kind of GPU intensive particle system isn’t really a good fit for mobile GPU hardware, so it wouldn’t perform well no matter how it was implemented.

    The best way to do particles on the phone is to animate on the CPU rather than GPU. In 4.0, the resulting data can be efficiently drawn using SpriteBatch even for 3D scenes (I’ll write more about that later, but it’s a way down the priority list 🙂)

  13. ShawnHargreaves says:

    > Is PIX for Xbox 360 something we can download now with GS 4?

    PIX does not work on retail Xbox 360 consoles. It requires a devkit.

    If you are a registered Xbox 360 developer who has devkit access, you can use PIX with any version of Game Studio.

  14. David Black says:

    >> 4x the number of vertices could be quite a bit of memory… How does the performance compare when other stuff is running in the background thrashing the CPU cache(even on the same core)?

    >The CPU usage (and CPU memory bandwidth) for this style of particle system is vanishingly small either way. Even when drawing enough particles to entirely saturate the GPU, the CPU load is insignificant.

    I wasn’t concerned about the CPU usage, rather the memory bandwidth from L2 => Main Memory.

    Bandwidth is a scarce resource on 360, I would think if it [lack of point sprites] is going to be a problem, it would show up when there is lots of other stuff going on, not just a synthetic benchmark. (ie it would be interesting to know your results with the other cores walking through memory, so they miss the L2 constantly)

  15. Nice sample, Shawn. I use the two-triangle billboard, or sometimes four triangles arranged like an X to give a 3D feel and look, always for my particle system.

    any chance of seeing the hidef profile in a beta before summer time

  16. David Black says:

    Should have read more carefully.. but it is very surprising that bandwidth use is low, I would have thought it would be the bottleneck (but then this is managed and has other costs.)

    Conventional wisdom is that nothing else in the system can keep up with the GPU on 360…

  17. ShawnHargreaves says:

    > Conventional wisdom is that nothing else in the system can keep up with the GPU on 360…

    If the CPU was involved in updating particles and feeding them to the GPU every frame, that would most likely be true. This is why our particle system is designed to do all the work on the GPU: avoiding the CPU entirely lets it draw way more stuff with better performance.

  18. David Black says:

    🙂 It’s good when you can keep it entirely on the GPU. I can just imagine the test program, lots of near screen size cats… 🙂

  19. Hey Shawn, I was just wondering if you know when you will get time to do another one of these (this post was great). Are we talking a few days or over a week? Are there any other people posting things like this that you know of? I checked all the links from your site (most others only update their blogs weekly or monthly) 🙁

    Anyway great post.

  20. ShawnHargreaves says:

    > i was just wondering if you know when you will get time to do another one of these

    As soon as I can get them finished, really! I’m aiming for at least a couple a week until I run out of topics (currently have a list of 20+ things to write about 🙂) but it’ll probably end up being more some weeks, less others, depending on what else is going on!

    > are there any other people posting things like this that you know of?

    Michael Klucher has an active blog and writes about many Game Studio 4.0 and Windows Phone topics, too. And Ashu keeps threatening to blog about the new audio features – he hasn’t actually written anything yet though 🙂 I’ll be sure to link it here if/when he gets that going.

  21. There are some who call me .... Tim? says:

    "I can't promise exactly when, but we are working on an updated version of the Particle 3D sample which shows how to use indexed triangle lists, and includes Jason's magic optimization. We will be sure to get this out by the time 4.0 ships, if not before."

    Shawn, any update on when/if this might happen?  Thanks!

  22. deadlydog says:

    I see that the XNA 4.0 version of the Particle3D sample is out, but I can't pin-point the Jason's magic optimization fix that you mention.  Can you point it out for us?  I assume it's a change in the shader code, correct?

  23. ShawnHargreaves says:

    > I can't pin-point the Jason's magic optimization fix that you mention.

    It is changing the two pixel shader input interpolators to be of the same type (both COLOR or both TEXCOORD), rather than one of each. This almost never makes a difference as the GPU is pretty much never bottlenecked by interpolation in any case, but for this particular shader, that was the perf limit!

  24. deadlydog says:

    Thanks for responding, Shawn. I understand that you were following DirectX's lead with removing point sprite support and the PointList primitive type, but I wish it would've been kept around. I created and maintain DPSF (XNAParticles.com), which is for CPU-based particle systems. Even with the shader magic you mention, the same indexed triangle particle systems in XNA 4 typically run almost 2x slower than their old point sprite XNA 3.1 versions (non-rotating particles). Also, because we have to use 4 vertices per particle instead of 1, I quickly reach the max memory allowed in a single DrawUserPrimitives() call on the Xbox 360 (around 5000 particles), so we have to break it up into multiple Draw calls, which I still haven't been able to get working properly (forums.create.msdn.com/…/70300.aspx). Overall though I am very happy with XNA 4. Keep up the good work 🙂

  25. Garold says:

    Hi Shawn,

    I am using the Particles3D example as the template for my particle effects. It works great! However, I am trying to modify it to draw bullet holes that are aligned with the object surface hit. Is this the right way to go or do you recommend a different method? You can see the problem here: http://youtu.be/z4sNfcOxp1I

    Thanks