A few weeks ago I decided to start using PBR in my engine. I used the same equations as in my PBR viewer, so the implementation was not really complicated. The part that took me the most time was creating the Base Color, Normal, Roughness and Metallic textures for each material in Sponza, so I thought it might be worth sharing if it can save someone some time. I used the provided textures as a base and created the missing ones with Substance Designer. That’s why I should write “PBR” with quotes: the textures are far from being calibrated or scanned; they were mainly made to look OK and to let me test a quick PBR environment. And I’m not an artist, so it may be better to consider this “programmer art”.
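The post doesn’t spell out the equations borrowed from the PBR viewer. The sketch below only illustrates the widely used metallic/roughness convention that these four textures imply:

- Dielectrics get a constant F0 of 0.04.
- Metals take their F0 from the base color and lose the diffuse term.
- The GGX width uses alpha = roughness².

Those constants and the remapping are assumptions here, not necessarily what the engine does. `Rgb` and `SurfaceParams` are hypothetical helper types.

```cpp
#include <algorithm>

// Linear RGB triple (hypothetical helper type for this sketch).
struct Rgb {
    float r, g, b;
};

// Inputs of a specular microfacet BRDF derived from one metallic/roughness texel.
struct SurfaceParams {
    Rgb diffuseColor;  // albedo used by the diffuse (Lambert) term
    Rgb f0;            // Fresnel reflectance at normal incidence
    float alpha;       // GGX width, using the common alpha = roughness^2 remap
};

// Metals have no diffuse term and tint their reflectance with the base color;
// dielectrics keep the base color as albedo and use a constant F0 of 0.04.
inline SurfaceParams decodeMaterial(Rgb baseColor, float metallic, float roughness) {
    metallic = std::min(std::max(metallic, 0.0f), 1.0f);
    roughness = std::min(std::max(roughness, 0.0f), 1.0f);
    const float dielectricF0 = 0.04f;
    SurfaceParams p;
    p.diffuseColor = { baseColor.r * (1.0f - metallic),
                       baseColor.g * (1.0f - metallic),
                       baseColor.b * (1.0f - metallic) };
    // Linear interpolation between the dielectric F0 and the base color.
    p.f0 = { dielectricF0 + (baseColor.r - dielectricF0) * metallic,
             dielectricF0 + (baseColor.g - dielectricF0) * metallic,
             dielectricF0 + (baseColor.b - dielectricF0) * metallic };
    p.alpha = roughness * roughness;
    return p;
}

// GGX (Trowbridge-Reitz) normal distribution function D(h).
inline float ggxDistribution(float nDotH, float alpha) {
    const float kPi = 3.14159265358979f;
    const float a2 = alpha * alpha;
    const float d = nDotH * nDotH * (a2 - 1.0f) + 1.0f;
    return a2 / (kPi * d * d);
}
```

With this convention a gray “plastic” texel (metallic 0) keeps its base color as albedo with a 4% F0, while a fully metallic texel has no diffuse contribution at all.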
A few days ago I stumbled upon a strange behavior of the drawIndirect function, and I’m curious to know whether it only happens on my PC or if it’s a more general issue.
Currently my engine draws a lot of objects with DrawInstanced(). It’s always the same number of objects, but most of the time they are not all visible on screen, so I wanted to try culling them on the GPU.
The idea is simple: a compute shader does the culling and builds two buffers, one with the objects that are on screen, and the other with the parameters for a drawIndirect call.
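As a CPU reference for what such a compute shader produces, here is a sketch of sphere/frustum culling that compacts the visible object indices and fills the indirect arguments. The argument struct mirrors the four values DrawInstancedIndirect() reads from its buffer, in the same order as the DrawInstanced() parameters. `Sphere`, `Plane` and the inward-facing plane convention are assumptions of this sketch.

```cpp
#include <cstdint>
#include <vector>

// Mirrors the four 32-bit values DrawInstancedIndirect() reads from its
// argument buffer (same order as the DrawInstanced() parameters).
struct DrawInstancedArgs {
    std::uint32_t vertexCountPerInstance;
    std::uint32_t instanceCount;
    std::uint32_t startVertexLocation;
    std::uint32_t startInstanceLocation;
};
static_assert(sizeof(DrawInstancedArgs) == 16, "args must be four tightly packed uints");

struct Sphere { float x, y, z, radius; };  // object bounds (hypothetical layout)
struct Plane  { float nx, ny, nz, d; };    // frustum plane, normal pointing inward

// True unless the sphere lies entirely behind one of the frustum planes.
inline bool sphereInFrustum(const Sphere& s, const std::vector<Plane>& frustum) {
    for (const Plane& p : frustum) {
        const float dist = p.nx * s.x + p.ny * s.y + p.nz * s.z + p.d;
        if (dist < -s.radius) return false;
    }
    return true;
}

// CPU reference of the compute pass: compact the visible object indices into
// one buffer and write the visible count into the indirect-args buffer.
inline DrawInstancedArgs cullAndBuildArgs(const std::vector<Sphere>& objects,
                                          const std::vector<Plane>& frustum,
                                          std::uint32_t verticesPerObject,
                                          std::vector<std::uint32_t>& visibleIndices) {
    visibleIndices.clear();
    for (std::uint32_t i = 0; i < objects.size(); ++i) {
        if (sphereInFrustum(objects[i], frustum)) visibleIndices.push_back(i);
    }
    return { verticesPerObject, static_cast<std::uint32_t>(visibleIndices.size()), 0u, 0u };
}
```

On the GPU the `push_back` would typically become an append buffer or an atomic counter, so the order of the visible list is not guaranteed there. On the D3D11 side, the buffer holding the arguments must be created with the `D3D11_RESOURCE_MISC_DRAWINDIRECT_ARGS` misc flag.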
I usually prefer to progress step by step, so I started with a compute shader that only writes the drawIndirect buffer, with the constant number of objects, and passed that buffer to DrawInstancedIndirect().
So it wasn’t supposed to change anything: I just replaced DrawInstanced() with DrawInstancedIndirect() using the same parameters. But I noticed that DrawInstancedIndirect() was almost two times slower than DrawInstanced() (1.1ms versus 0.6ms).
I spent some time trying to find what was wrong in my code. After a while I tried it on my laptop with an NVIDIA GT 630M, and even though it was much slower, the timings were the same for both functions.
So I decided to try this in a smaller project.
I just took the DirectX SDK tutorial that draws a triangle and changed the draw call. You can download it on GitHub.
You can comment out line 426 or 427 to switch between the two draw calls, and you can change the number of vertices to draw by editing line 44.
Here is the result on my AMD R9 290, using the 14.9 drivers:
For 900 000 vertices:
- DrawInstanced: 0.31ms
- DrawInstancedIndirect: 0.42ms
For 9 000 000 vertices:
- DrawInstanced: 2.45ms
- DrawInstancedIndirect: 4.56ms
On the NVIDIA GT 630M (both functions gave the same timings):
For 900 000 vertices:
- DrawInstancedIndirect: 2.72ms
For 9 000 000 vertices:
- DrawInstancedIndirect: 26.87ms
I was also able to test it on a GTX 780, and there is no difference between the two functions.
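To quantify the gap in the R9 290 numbers above, this small program computes the ratio and the absolute difference for each vertex count.

- At 900 000 vertices the indirect call costs about 1.35 times as much (+0.11 ms).
- At 9 000 000 vertices it costs about 1.86 times as much (+2.11 ms).

So in these two measurements the extra cost grows with the vertex count rather than behaving like a fixed per-call overhead. The values are copied from the lists above; nothing else is assumed.

```cpp
#include <cstdio>

// One row of the R9 290 measurements (times in milliseconds).
struct Measurement {
    const char* label;
    double drawInstancedMs;
    double drawInstancedIndirectMs;
};

// Relative cost of the indirect call compared with the direct one.
inline double indirectSlowdown(const Measurement& m) {
    return m.drawInstancedIndirectMs / m.drawInstancedMs;
}

// Print ratio and absolute difference for the measured vertex counts.
inline void printSlowdowns() {
    const Measurement rows[] = {
        {"900 000 vertices", 0.31, 0.42},
        {"9 000 000 vertices", 2.45, 4.56},
    };
    for (const Measurement& m : rows) {
        std::printf("%s: x%.2f (+%.2f ms)\n", m.label, indirectSlowdown(m),
                    m.drawInstancedIndirectMs - m.drawInstancedMs);
    }
}
```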
I gathered timings for several vertex counts and used all my Word skills to summarize the results in a graph:
If somebody has a clue why the drawIndirect function is slower on a (my?) R9 290, I would be happy to hear it. Maybe there is something wrong in my code, but that still doesn’t explain why it only happens on the AMD card.
I wasn’t able to find another AMD card to test, so maybe something is just wrong on my PC. But it may also be a driver issue, and I’m curious to see whether it happens on other AMD cards as well.
So if you have any suggestions or information, I’d be glad to hear them!
This is extremely useful. More than once I launched a debugging session from Visual Studio, found a weird bug, and had to launch the program again from RenderDoc and try to reproduce it.
So I looked at the source code uploaded on GitHub by Temaran, removed the UE4-related code and kept only a single class that can load RenderDoc and trigger a capture directly from my engine.
You can find the code here: https://github.com/oks2024/RenderDoc-Manager
In the end it’s just two header files and one .cpp file. You just have to provide a few paths, such as where you want to store the captures and where your RenderDoc folder is. In my case I use the portable version and store it in Perforce. Keep in mind that your build target must match the RenderDoc version: you can’t mix x86 and x64.
You can either bind a key to RenderDoc that will trigger a capture or use the StartFrameCapture()/
It’s working great in my small engine. I’ve been using it for a couple of days and haven’t noticed any issues. I know it can slow down resource creation, so in a bigger engine it’s probably not something you want to have always enabled.
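When captures are triggered from code rather than with a key, one way to make sure every started capture is also ended, even on an early return, is an RAII guard.

- `CaptureBackend` and its method names are hypothetical. In the manager, they would forward to RenderDoc’s in-app start/end capture calls.
- `CountingBackend` is a mock, so the logic can be checked without RenderDoc installed.

```cpp
// Hypothetical interface standing in for the manager's start/end capture calls.
class CaptureBackend {
public:
    virtual ~CaptureBackend() = default;
    virtual void startFrameCapture() = 0;
    virtual bool endFrameCapture() = 0;  // true if a capture was written
};

// RAII guard: starts a capture on construction and always ends it on scope exit.
class ScopedFrameCapture {
public:
    explicit ScopedFrameCapture(CaptureBackend& backend) : backend_(backend) {
        backend_.startFrameCapture();
    }
    ~ScopedFrameCapture() { backend_.endFrameCapture(); }

    ScopedFrameCapture(const ScopedFrameCapture&) = delete;
    ScopedFrameCapture& operator=(const ScopedFrameCapture&) = delete;

private:
    CaptureBackend& backend_;
};

// Mock backend that only counts calls, for testing without RenderDoc.
class CountingBackend : public CaptureBackend {
public:
    int starts = 0;
    int ends = 0;
    void startFrameCapture() override { ++starts; }
    bool endFrameCapture() override { ++ends; return true; }
};
```

In a frame loop, the guard would wrap the rendering of the frame you want to inspect, for example only when a debug flag is set.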
As you can see, it’s a very basic code skeleton, and most of the code comes from Temaran’s plugin, but I found it really useful and thought it was worth sharing.
I think I will add functions as I need them (at least one to set capture options without having to recompile), and I will try to keep the GitHub repository up to date. If you have any suggestions, it’s on GitHub, so feel free to leave a comment or add your modifications :).
Apart from debugging, it may also be useful for creating automated tests. With the appropriate script you can load a level, move the camera through several positions and take captures. After that, it should be possible to extract the images (and maybe even timings) from the captures and compare them, to make sure your last submit did not break the rendering.
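For the comparison step, a simple per-pixel check against a reference image is often enough to flag a broken submit. This sketch assumes both frames have already been extracted as tightly packed RGBA8 buffers of the same size; how to get them out of the captures is not covered here. The tolerance and the allowed fraction of changed pixels are arbitrary parameters.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Result of comparing two RGBA8 images of the same size.
struct ImageDiff {
    int maxChannelError = 0;    // largest absolute difference on any channel
    std::size_t badPixels = 0;  // pixels with at least one channel above tolerance
};

// Compare a new capture against a reference image, channel by channel.
inline ImageDiff compareRgba8(const std::vector<std::uint8_t>& reference,
                              const std::vector<std::uint8_t>& current,
                              int tolerance) {
    assert(reference.size() == current.size() && reference.size() % 4 == 0);
    ImageDiff diff;
    for (std::size_t px = 0; px < reference.size(); px += 4) {
        bool bad = false;
        for (std::size_t c = 0; c < 4; ++c) {
            const int err = std::abs(int(reference[px + c]) - int(current[px + c]));
            if (err > diff.maxChannelError) diff.maxChannelError = err;
            if (err > tolerance) bad = true;
        }
        if (bad) ++diff.badPixels;
    }
    return diff;
}

// A capture "passes" if only a small fraction of its pixels changed noticeably.
inline bool capturePasses(const ImageDiff& d, std::size_t pixelCount, double maxBadFraction) {
    return pixelCount == 0 || double(d.badPixels) / double(pixelCount) <= maxBadFraction;
}
```

Allowing a small fraction of changed pixels keeps the test from failing on harmless noise, for example from temporal effects that never converge to exactly the same image.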
Baldur Karlsson is doing amazing work on RenderDoc, with regular updates and new features. It was already one of my favorite tools, and that’s not going to change!
To be able to implement volumetric lights I had to start with shadows for the sun light. As it wasn’t my primary focus I went for a straightforward shadow map implementation on a 2k texture. It’s easy and quick to code, but yeah, the results are ugly.
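This is a CPU sketch of the single-tap comparison that a straightforward shadow map performs. Each shaded point picks the nearest texel and compares depths, so one texel of the 2k map can cover many screen pixels, which produces the blocky edges. The conventions are assumptions:

- The sun uses an orthographic projection.
- u and v lie in [0, 1] across the map.
- Smaller depth means closer to the light.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Shadow map rendered from the sun, stored row-major (layout is an assumption).
struct ShadowMap {
    int size;                  // e.g. 2048 for a "2k" map
    std::vector<float> depth;  // size * size light-space depths in [0, 1]

    // Nearest texel, clamped to the map borders.
    float texel(int x, int y) const {
        x = std::min(std::max(x, 0), size - 1);
        y = std::min(std::max(y, 0), size - 1);
        return depth[std::size_t(y) * size + x];
    }
};

// Single-tap shadow test: 1 = lit, 0 = shadowed. The bias shifts the
// comparison to avoid self-shadowing ("shadow acne").
inline float shadowFactor(const ShadowMap& sm, float u, float v, float receiverDepth, float bias) {
    const int x = int(u * sm.size);
    const int y = int(v * sm.size);
    return (receiverDepth - bias <= sm.texel(x, y)) ? 1.0f : 0.0f;
}
```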
I didn’t plan to implement a more advanced shadow map technique anytime soon, but I also didn’t want to stay with those ultra-pixelated shadows, so I tried to apply my two current favorite techniques: dithering and temporal supersampling :).
Let’s see what it can do!
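The post doesn’t describe which patterns it uses, so this is only one possible way to combine the two ideas. Each pixel offsets its shadow map lookup by a sub-texel amount that mixes a 4x4 Bayer dither (per pixel) with an 8-frame Halton jitter (per frame). The results are then blended over time with an exponential moving average.

The pattern choices and the blend factor are assumptions. A moving camera would also need history reprojection, which is left out here.

```cpp
#include <cmath>

// 4x4 Bayer matrix, values 0..15; divided by 16 it gives an ordered-dither
// threshold in [0, 1) that changes from pixel to pixel.
inline float bayer4x4(int x, int y) {
    static const int kBayer[4][4] = {
        { 0,  8,  2, 10},
        {12,  4, 14,  6},
        { 3, 11,  1,  9},
        {15,  7, 13,  5},
    };
    return kBayer[y & 3][x & 3] / 16.0f;
}

// Radical inverse in the given base: element `index` of a Halton sequence.
inline float halton(int index, int base) {
    float result = 0.0f;
    float f = 1.0f / base;
    for (int i = index; i > 0; i /= base) {
        result += f * (i % base);
        f /= base;
    }
    return result;
}

// Sub-texel offset in [-0.5, 0.5) texels for the shadow map lookup:
// per-frame Halton jitter plus per-pixel dither, wrapped into one texel.
inline void shadowLookupOffset(int px, int py, int frame, float& du, float& dv) {
    const float jx = halton(frame % 8 + 1, 2);
    const float jy = halton(frame % 8 + 1, 3);
    const float d = bayer4x4(px, py);
    du = std::fmod(jx + d, 1.0f) - 0.5f;
    dv = std::fmod(jy + d, 1.0f) - 0.5f;
}

// Exponential moving average of the shadow term over frames (the temporal
// part). A lower alpha gives smoother results but reacts more slowly.
inline float accumulate(float history, float current, float alpha) {
    return history + (current - history) * alpha;
}
```

Because neighboring pixels and successive frames sample different positions inside a texel, the accumulated result approximates a filtered shadow edge instead of the hard texel steps.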
While waiting for a new computer that will make my experiments with voxels more comfortable (even a 64x64x64 grid is slow on my laptop), I decided to try some less expensive effects, starting with volumetric lights as described in GPU Pro 5 by Nathan Vos from Guerrilla Games.
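The core of this kind of effect is a ray march along each view ray. At every step the shadow map tells whether the sample is lit, and lit samples add in-scattered light. This is a generic single-scattering sketch with a homogeneous medium, not the exact formulation from the GPU Pro 5 chapter:

- The phase function is folded into the `scattering` constant.
- `visibility` stands in for the shadow map lookup.
- `jitter` is the per-pixel start offset that dithering would provide.

```cpp
#include <cmath>
#include <functional>

// Ray-marched in-scattering along a view ray of length rayLength.
// visibility(t) returns 1 when the point at distance t is lit by the sun and
// 0 when it is in shadow. jitter in [0, 1) offsets every sample position.
inline float marchInScattering(float rayLength, int steps, float jitter,
                               float scattering, float extinction,
                               const std::function<float(float)>& visibility) {
    const float stepSize = rayLength / steps;
    float transmittance = 1.0f;  // fraction of light surviving back to the camera
    float inScattered = 0.0f;
    for (int i = 0; i < steps; ++i) {
        const float t = (i + jitter) * stepSize;
        // Light scattered towards the camera at this step, attenuated by the
        // medium between the sample and the camera (Beer-Lambert).
        inScattered += visibility(t) * scattering * stepSize * transmittance;
        transmittance *= std::exp(-extinction * stepSize);
    }
    return inScattered;
}
```

With few steps, the jitter turns banding into noise, which a blur or a temporal filter can then remove.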
A while ago, I started experimenting with voxels. More precisely, my idea was to test what could be possible if the scene were fully voxelized. Dynamic shadows are one of those tests.
For my tests I implemented a tiled deferred renderer, and one of the difficulties with tiled deferred is shadows. All the lights are rendered in a single shader, which means that the shadow maps of every light source must be bound to this compute shader.
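The single-shader structure is easier to see with the per-tile light lists. This sketch bins lights into square screen tiles using a screen-space bounding rectangle per light. `LightRect` is a hypothetical type, and the rectangle test is a simplification: tiled deferred implementations commonly test lights against per-tile frustums and min/max depth instead.

The lighting compute shader then loops over each tile’s list. Since any light can end up in any tile, every light’s shadow map has to be reachable from that one shader.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Screen-space bounds of a light's influence, in pixels (inclusive).
struct LightRect { int minX, minY, maxX, maxY; };

// Build, for every tile, the list of light indices that may affect it.
// Tiles are tileSize x tileSize pixels, stored row-major.
inline std::vector<std::vector<std::uint32_t>> binLightsToTiles(
        const std::vector<LightRect>& lights, int width, int height, int tileSize) {
    const int tilesX = (width + tileSize - 1) / tileSize;
    const int tilesY = (height + tileSize - 1) / tileSize;
    std::vector<std::vector<std::uint32_t>> tiles(std::size_t(tilesX) * tilesY);
    for (std::uint32_t li = 0; li < lights.size(); ++li) {
        const LightRect& r = lights[li];
        // Skip lights whose rectangle does not touch the screen at all.
        if (r.maxX < 0 || r.maxY < 0 || r.minX >= width || r.minY >= height) continue;
        // Clamp the rectangle to the screen, then convert to tile coordinates.
        const int x0 = std::max(r.minX, 0) / tileSize;
        const int y0 = std::max(r.minY, 0) / tileSize;
        const int x1 = std::min(r.maxX, width - 1) / tileSize;
        const int y1 = std::min(r.maxY, height - 1) / tileSize;
        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                tiles[std::size_t(ty) * tilesX + tx].push_back(li);
    }
    return tiles;
}
```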
Recent years have seen a lot of techniques increasing the number of simultaneous dynamic light sources (deferred, clustered, tiled deferred, forward+), but they always leave shadows aside. Voxels can help add dynamic shadows to many light sources by replacing the shadow maps, but I wondered whether the precision would be acceptable.
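Replacing per-light shadow maps with a voxel grid means the tiled shader only needs that one grid bound: a light is visible if a ray from the shaded point to the light crosses no occupied voxel. The sketch makes these assumptions:

- The grid is a binary occupancy grid, and positions are in voxel units.
- The ray advances in fixed half-voxel steps. A DDA traversal would visit every crossed voxel exactly, while fixed steps can clip corners.
- `startOffset` acts as a shadow bias.

Precision is bounded by the voxel size, which is exactly the concern raised above.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Position in voxel units: voxel (x, y, z) covers [x, x + 1) on each axis.
struct Float3 {
    float x, y, z;
};

// Binary occupancy grid, one byte per voxel for readability (a real grid
// would more likely pack bits or use a 3D texture).
struct VoxelGrid {
    int n;                               // resolution per axis
    std::vector<std::uint8_t> occupied;  // n * n * n flags

    explicit VoxelGrid(int res) : n(res), occupied(std::size_t(res) * res * res, 0) {}

    std::size_t index(int x, int y, int z) const {
        return std::size_t(x) + std::size_t(n) * (std::size_t(y) + std::size_t(n) * std::size_t(z));
    }
    bool inside(int x, int y, int z) const {
        return x >= 0 && y >= 0 && z >= 0 && x < n && y < n && z < n;
    }
    bool isOccupied(int x, int y, int z) const {
        return inside(x, y, z) && occupied[index(x, y, z)] != 0;
    }
    void set(int x, int y, int z) {
        assert(inside(x, y, z));
        occupied[index(x, y, z)] = 1;
    }
};

// Shadow ray against the voxel grid: march from the shaded point towards the
// light in half-voxel steps and stop at the first occupied voxel.
inline bool lightVisible(const VoxelGrid& grid, Float3 p, Float3 light, float startOffset) {
    const Float3 d{light.x - p.x, light.y - p.y, light.z - p.z};
    const float dist = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
    const float step = 0.5f;
    for (float t = startOffset; t < dist; t += step) {
        const float s = t / dist;
        const int x = int(std::floor(p.x + d.x * s));
        const int y = int(std::floor(p.y + d.y * s));
        const int z = int(std::floor(p.z + d.z * s));
        if (grid.isOccupied(x, y, z)) return false;
    }
    return true;
}
```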
I described in a previous blog post the technique I used to dynamically voxelize a scene. I think there are ways to optimize this process, but that will be for another blog post!
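Whatever method fills the grid, each voxelized sample ends up mapped from world space to a grid cell. Below is a minimal mapping for a grid spanning an axis-aligned box, plus the memory arithmetic for the grid size used in the screenshots. A 256x256x256 grid has 16 777 216 voxels, which is 2 MiB at one bit per voxel or 16 MiB at one byte per voxel. `GridBounds` and the one-bit storage are assumptions, not necessarily what the previous post used.

```cpp
#include <cmath>
#include <cstdint>

struct Float3 { float x, y, z; };
struct Int3 { int x, y, z; };

// World-space box enclosing the voxelized scene (hypothetical parameters).
struct GridBounds {
    Float3 min;
    Float3 max;
};

// Map one axis: 0 at the min face, n at the max face (max face excluded).
inline bool axisToVoxel(float p, float lo, float hi, int n, int& out) {
    const int v = int(std::floor((p - lo) / (hi - lo) * n));
    if (v < 0 || v >= n) return false;
    out = v;
    return true;
}

// World position -> voxel coordinates in an n x n x n grid spanning b.
inline bool worldToVoxel(const GridBounds& b, int n, Float3 p, Int3& out) {
    return axisToVoxel(p.x, b.min.x, b.max.x, n, out.x) &&
           axisToVoxel(p.y, b.min.y, b.max.y, n, out.y) &&
           axisToVoxel(p.z, b.min.z, b.max.z, n, out.z);
}

// Bytes needed for a binary occupancy grid at one bit per voxel.
inline std::uint64_t occupancyBytes(std::uint64_t n) {
    return (n * n * n + 7) / 8;
}
```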
All the following screenshots and timings are from a GTX 780 at a resolution of 1280×720. There are 32 point lights in the scene.
First of all, here is what the voxelized scene looks like with a 256x256x256 grid: