DrawIndirect performance

A few days ago I stumbled upon a strange behavior of the drawIndirect function, and I’m curious to know whether it only happens on my PC or whether it’s a more general issue.

Currently my engine draws a lot of objects with DrawInstanced(). The number of objects is always the same, but most of the time they are not all visible on screen, so I wanted to try culling them on the GPU.

The idea is simple: a compute shader does the culling and builds two buffers, one with the objects that are on screen and the other with the parameters for a drawIndirect call.
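To make the idea concrete, here is a minimal CPU-side sketch in C++ of what such a culling pass would produce. It is not the compute shader from my engine: the bounding-sphere test, the `Sphere`, `Plane` and `CullResult` types and the `cullObjects` name are all assumptions made up for illustration. Only the order of the four draw arguments follows what DrawInstancedIndirect reads.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical bounding sphere for one object (illustration only).
struct Sphere {
    float x, y, z, radius;
};

// Hypothetical frustum plane: a point p is on the inner side when
// nx*p.x + ny*p.y + nz*p.z + d >= 0.
struct Plane {
    float nx, ny, nz, d;
};

// The two buffers described in the text.
struct CullResult {
    std::vector<std::uint32_t> visibleIndices;  // objects that are on screen
    // {vertexCountPerInstance, instanceCount,
    //  startVertexLocation, startInstanceLocation}
    std::array<std::uint32_t, 4> drawArgs;
};

// A sphere is kept unless it lies entirely behind at least one plane.
bool isVisible(const Sphere& s, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        const float signedDistance = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (signedDistance < -s.radius) {
            return false;
        }
    }
    return true;
}

// Sequential stand-in for the compute shader: one loop iteration per object.
CullResult cullObjects(const std::vector<Sphere>& objects,
                       const std::vector<Plane>& planes,
                       std::uint32_t vertexCountPerInstance) {
    assert(objects.size() <= UINT32_MAX);
    CullResult result;
    result.drawArgs = {vertexCountPerInstance, 0u, 0u, 0u};
    for (std::uint32_t i = 0; i < objects.size(); ++i) {
        if (isVisible(objects[i], planes)) {
            result.visibleIndices.push_back(i);
            ++result.drawArgs[1];  // one more instance to draw
        }
    }
    return result;
}
```

On the GPU the append and the counter increment would have to be done atomically, since many threads run at once; the sequential loop above hides that detail.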

I usually prefer to progress step by step, so I started with a compute shader that only writes the drawIndirect buffer, with the constant number of objects, and passes it to the DrawInstancedIndirect() function.
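For reference, the argument buffer for this first step is tiny: DrawInstancedIndirect reads four 32-bit unsigned values, in the same order as the parameters of DrawInstanced. The sketch below models that layout in plain C++ with no Direct3D headers; the struct and the `makeConstantArgs` and `writeArgs` helpers are hypothetical names for illustration, not the code from my project.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Mirrors the four values DrawInstancedIndirect reads from its argument
// buffer, in order. Hypothetical struct name, used for illustration.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawInstancedIndirectArgs) == 16,
              "four tightly packed 32-bit values are expected");

// First step from the text: no culling yet, the instance count is constant,
// so the result matches DrawInstanced(vertexCount, instanceCount, 0, 0).
DrawInstancedIndirectArgs makeConstantArgs(std::uint32_t vertexCountPerInstance,
                                           std::uint32_t instanceCount) {
    return {vertexCountPerInstance, instanceCount, 0u, 0u};
}

// Copies the arguments into a raw byte range, for example the initial data
// of the argument buffer.
void writeArgs(const DrawInstancedIndirectArgs& args,
               unsigned char* dst, std::size_t dstSize) {
    assert(dstSize >= sizeof(args));
    std::memcpy(dst, &args, sizeof(args));
}
```

With arguments built this way, the indirect call is asked to draw exactly what the direct call draws, which is why I expected identical timings.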

So this wasn’t supposed to change anything: I just swapped the DrawInstanced() call for a DrawInstancedIndirect() call with the same parameters. But I noticed that DrawInstancedIndirect() was almost two times slower than DrawInstanced() (1.1 ms versus 0.6 ms).

I spent some time trying to see what was wrong in my code, and after a while I tried it on my laptop with an NVIDIA GT 630m. There, even though everything was much slower, the timings were the same for both functions.

So I decided to try this in a smaller project.

I just took the DirectX SDK tutorial that shows how to draw a triangle and changed the draw call. You can download it on GitHub.

You can comment out line 426 or line 427 to use one draw command or the other, and you can change the number of vertices to draw by editing line 44.

Here are the results on my AMD R9 290, using the 14.9 drivers:

For 900 000 vertices:

  • DrawInstanced: 0.31ms
  • DrawInstancedIndirect: 0.42ms

For 9 000 000 vertices:

  • DrawInstanced: 2.45ms
  • DrawInstancedIndirect: 4.56ms

On the NVIDIA GT 630m:

For 900 000 vertices:

  • DrawInstanced: 2.72ms
  • DrawInstancedIndirect: 2.72ms

For 9 000 000 vertices:

  • DrawInstanced: 26.87ms
  • DrawInstancedIndirect: 26.87ms

I was also able to test it on a GTX780, and there is no difference between the two functions.
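To put the R9 290 numbers above in relative terms, here is a small C++ helper that computes the slowdown factor and the extra time per draw. The `Timing` struct and the function names are made up for this example; the values are the measurements quoted above.

```cpp
#include <cassert>

// One pair of measurements, in milliseconds.
struct Timing {
    double drawInstancedMs;
    double drawInstancedIndirectMs;
};

// How many times slower the indirect call is than the direct one.
double slowdownFactor(const Timing& t) {
    assert(t.drawInstancedMs > 0.0);
    return t.drawInstancedIndirectMs / t.drawInstancedMs;
}

// Extra time spent by the indirect call, in milliseconds.
double extraMilliseconds(const Timing& t) {
    return t.drawInstancedIndirectMs - t.drawInstancedMs;
}

// R9 290 measurements from the lists above.
const Timing kR9290At900k = {0.31, 0.42};  // 900 000 vertices
const Timing kR9290At9M = {2.45, 4.56};    // 9 000 000 vertices
```

This gives a factor of about 1.35 at 900 000 vertices and about 1.86 at 9 000 000 vertices, with roughly 0.11 ms and 2.11 ms of extra time. The gap grows with the vertex count, so on this card it does not look like a fixed per-call overhead.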

I gathered timings for several vertex counts and used all my Word skills to summarize the results in a graph:

Timing in ms for the DrawInstanced and DrawInstancedIndirect functions for various numbers of vertices on an R9 290.

If somebody has a clue as to why the drawIndirect function is slower on a (my?) R9 290, I would be happy to hear it. Maybe there is something wrong in my code, but that still would not explain why it only happens on the AMD card.

I wasn’t able to find another AMD card to test, so maybe it’s just something wrong with my PC. But maybe it’s a driver issue; I’m curious to see whether it happens on other AMD cards as well.

So if you have any suggestions or information, I’d be glad to hear them!

2 thoughts on “DrawIndirect performance”

  1. I know this is an old post, but I’ve compiled this program myself and I don’t see a huge difference on my AMD GPU – R7 260x – between the two methods, both at 900 000 and 9 000 000 verts. Prolly got fixed in some driver update? I’ve got 14.12, Windows 8.1 x64.

    1. Hi! Thanks for testing.
      On my side there is still the same difference, even with the latest drivers. Maybe it’s because these two cards have a different architecture, I don’t know.

  2. This doesn’t surprise me too much and in fact I’d expect a DrawIndirect that replaces a single Draw to slow things down. The key thing for DrawIndirect is whether or not a DrawIndirect that replaces 100 draw calls (for instance, drawing 100 different meshes) is a lot faster.

    When using DrawIndirect to draw 100 meshes, instead of making a call into the d3d runtime for each of them, you’re just going to be doing a series of *(outBuffer++) = blah operations with no calls to the d3d runtime. I wish I could find some benchmark measuring this type of performance 🙁
