DrawIndirect performance

A few days ago I stumbled upon a strange behavior of the drawIndirect function, and I’m curious to know if it only happens on my PC or if it’s a more general issue.

Currently in my engine I have a lot of objects drawn with DrawInstanced(). It’s always the same number of objects, but most of the time they are not all visible on screen, so I wanted to try culling them on the GPU.

The idea is simple: a compute shader does the culling and builds two buffers, one with the objects that are on screen, and the other with the parameters for a drawIndirect function.
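To make the idea concrete, here is a minimal CPU-side sketch of what such a culling pass produces. It is not the compute shader from my engine: the bounding-sphere test, the `Sphere`, `Plane` and `CullResult` types and the `cullObjects` function are all hypothetical names chosen for illustration. The only part taken from the API is the field order of the argument struct, which matches the four unsigned integers that DrawInstancedIndirect reads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same field order as the arguments of DrawInstanced, which is also what
// DrawInstancedIndirect expects to find in its argument buffer.
struct DrawInstancedIndirectArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

// Hypothetical object bounds: a bounding sphere.
struct Sphere { float x, y, z, radius; };

// Hypothetical culling plane: a point p is inside when n.p + d >= 0.
struct Plane { float nx, ny, nz, d; };

// The two outputs of the culling pass.
struct CullResult {
    std::vector<std::uint32_t> visibleIndices;  // buffer 1: objects on screen
    DrawInstancedIndirectArgs args;             // buffer 2: draw parameters
};

// A sphere is rejected as soon as it lies entirely behind one plane.
inline bool isVisible(const Sphere& s, const std::vector<Plane>& planes) {
    for (const Plane& p : planes) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// Assumption: every object is drawn as one instance of the same mesh.
inline CullResult cullObjects(const std::vector<Sphere>& objects,
                              const std::vector<Plane>& planes,
                              std::uint32_t vertexCountPerObject) {
    assert(vertexCountPerObject > 0);
    CullResult result;
    result.args = DrawInstancedIndirectArgs{vertexCountPerObject, 0, 0, 0};
    for (std::size_t i = 0; i < objects.size(); ++i) {
        if (isVisible(objects[i], planes)) {
            result.visibleIndices.push_back(static_cast<std::uint32_t>(i));
            ++result.args.instanceCount;  // one more instance to draw
        }
    }
    return result;
}
```

On the GPU the loop body would run once per thread and the instance count would be incremented atomically, but the two outputs are the same.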

I usually prefer to progress step by step, so I started with a compute shader that only writes the drawIndirect buffer, with the constant number of objects, and sends that to the DrawInstancedIndirect() function.

So it wasn’t supposed to change anything: I just swapped the DrawInstanced() for a DrawInstancedIndirect() with the same parameters. But I noticed that the DrawInstancedIndirect() was almost two times slower than the DrawInstanced() (1.1ms versus 0.6ms).
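For reference, "the same parameters" means that the four arguments of the direct call end up as four consecutive 32-bit unsigned integers in the argument buffer. The sketch below only shows that mapping on the CPU; `DirectDrawParams`, `IndirectArgs` and `packIndirectArgs` are hypothetical names, and in my test the values are written by a compute shader rather than packed like this. In Direct3D 11 the buffer also has to be created with the D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS flag.

```cpp
#include <cassert>
#include <array>
#include <cstdint>

// Parameters of a direct DrawInstanced call, in the API's argument order.
struct DirectDrawParams {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};

// Raw content of the argument buffer read by DrawInstancedIndirect:
// the same four values, stored back to back.
using IndirectArgs = std::array<std::uint32_t, 4>;

// Size of one argument record in bytes (4 values of 4 bytes each).
constexpr std::uint32_t kIndirectArgsSizeInBytes =
    static_cast<std::uint32_t>(4 * sizeof(std::uint32_t));

// Packs the direct-call parameters in the order the indirect call expects.
inline IndirectArgs packIndirectArgs(const DirectDrawParams& p) {
    return IndirectArgs{{p.vertexCountPerInstance,
                         p.instanceCount,
                         p.startVertexLocation,
                         p.startInstanceLocation}};
}
```

With identical values in the buffer, the GPU is asked to do exactly the same work in both cases, which is why I expected identical timings.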

I spent some time trying to see what was wrong in my code, and after a while I tried it on my laptop with an NVIDIA GT 630m. There, even though everything was much slower, the timings were the same for both functions.

So I decided to try this in a smaller project.

I just took the tutorial showing how to draw a triangle from the DirectX SDK and changed the draw call. You can download it on github.

You can comment out line 426 or 427 to use one draw command or the other, and you can change the number of vertices to draw by editing line 44.

Here are the results on my AMD R9 290, using the 14.9 drivers:

For 900 000 vertices:

  • DrawInstanced: 0.31ms
  • DrawInstancedIndirect: 0.42ms

For 9 000 000 vertices:

  • DrawInstanced: 2.45ms
  • DrawInstancedIndirect: 4.56ms

On the NVIDIA GT 630m:

For 900 000 vertices:

  • DrawInstanced: 2.72ms
  • DrawInstancedIndirect: 2.72ms

For 9 000 000 vertices:

  • DrawInstanced: 26.87ms
  • DrawInstancedIndirect: 26.87ms
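To compare the two cards more easily, the timings above can be turned into a slowdown factor (indirect time divided by direct time). This is just a small helper for the arithmetic; `Timing`, `kTimings`, `slowdown` and `printSlowdowns` are names I made up, and the numbers are the ones listed above.

```cpp
#include <cassert>
#include <cstdio>

struct Timing {
    const char* gpu;
    long vertices;
    double directMs;    // DrawInstanced
    double indirectMs;  // DrawInstancedIndirect
};

// Measurements copied from the lists above.
const Timing kTimings[] = {
    {"AMD R9 290",     900000L,  0.31,  0.42},
    {"AMD R9 290",     9000000L, 2.45,  4.56},
    {"NVIDIA GT 630m", 900000L,  2.72,  2.72},
    {"NVIDIA GT 630m", 9000000L, 26.87, 26.87},
};

// Ratio of indirect to direct time; 1.0 means no difference.
inline double slowdown(const Timing& t) {
    assert(t.directMs > 0.0);
    return t.indirectMs / t.directMs;
}

// Prints one line per measurement with its slowdown factor.
inline void printSlowdowns() {
    for (const Timing& t : kTimings) {
        std::printf("%-15s %8ld vertices: x%.2f\n",
                    t.gpu, t.vertices, slowdown(t));
    }
}
```

On these numbers the R9 290 is roughly 1.35 times slower with the indirect call at 900 000 vertices and roughly 1.86 times slower at 9 000 000, while the GT 630m stays at exactly 1.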

I was also able to test it on a GTX 780, and there is no difference between the two functions there either.

I gathered timings for several vertex counts and used all my Word skills to summarize the results in a graph:

Timing in ms for the DrawInstanced and DrawInstancedIndirect functions for various numbers of vertices on an R9 290.

If somebody has a clue as to why the drawIndirect function is slower on a (my?) R9 290, I would be happy to hear it. Maybe there is something wrong in my code, but that still does not explain why it only happens on the AMD card.

I wasn’t able to find another AMD card to test, so maybe it’s just something wrong on my PC. But maybe it’s a driver issue, and I’m curious to see if it happens on other AMD cards as well.

So if you have any suggestions or information, I’d be glad to hear them!