Tag Archives: HLSL

Shadows using a voxel grid

Dynamic shadow casting point lights for tiled deferred rendering

A while ago, I started to experiment working with voxels. More precisely, my idea was to test what could be possible if we had our scene fully voxelized. Dynamic shadows is one of those tests.

For my tests I implemented a tiled deferred rendering engine, and one of the difficulties with tiled deferred is shadows. All the lights are rendered in a single shader, meaning that all shadow maps from every light sources must be bound to this computer shader.

The last years have seen a lot of techniques increasing the number of simultaneous dynamic light sources (deferred, clustered, tiled deferred, forward+), but always ignoring shadows. Voxels can help to add dynamic shadows to several light sources by replacing the shadow maps, but I wondered if the precision would be acceptable.


I described in a previous blog post the technique I used to dynamically voxelize a scene. I think there might be some ways to optimize this process, but that will be for an other blog post !

All the following screenshots and timmings are from a GTX 780, and the resolution is 1280×720. There is 32 point lights in the scene.

First of all, here what the voxelized scene looks like with a 256x256x256 grid:

Voxelized scene

And the scene without shadows:


The main idea is really simple. In the tiled deferred shader when computing the light, with the voxel structuring I am able to check if the current pixel is hidden from the light by something.

My main interest in a first time is to see how it could look, so I started with a straightforward raymarching, starting from the current pixel position to the point light. At each step I transform the world position into voxel grid position and check if the voxel is full or empty.

Here are the first result:

Shadow with voxels first result

The result depends on the number of steps, and of course the more steps you use, the slower it will be. In this screenshot there is 25 steps per raymarching.

Some lights are leaking, and the shadows are very harsh, the soft attenuation is removed because of the voxel size.

Let’s try with more steps, 75 :


No more light leaking, but it’s more costly, and the shadows are still very sharp, etiher present or not, it would be great to have some much subtle values.

During the last GDC, Michal Valient from Guerrilla Games gave a talk (the slides are available here: http://www.guerrilla-games.com/publications.html ) where he showed how to use a dither offset followed by a Gaussian blur to improve raymarching visual quality and reduce the number of steps needed. This is detailed in the chapter “Volumetric Light Effects in Killzone:Shadow Fall” in GPU Pro 5.

So here is the result when I added the dithered offset, with 25 steps:


With less steps, the result are far better. With the pseudo random offset the raymarching captures more details. But without applying any blur the noise induced by the dither pattern is very noticeable.

This was a little bit more complicated. It would be too expensive to raymarch the shadows for every samples needed for the blur, so this value must be stored somehow.

Few weeks ago I implemented SSAO with temporal sampling, as described by Bart Wronski here, soI tough I could do something in the same way, use the results of the previous frame to improve the current one.

The previous frame  informations must be stored in order to be able to know for a given pixel which point light is occluded or not, in order to apply the proper lighting. The results of the raymarching being either 0 or 1 ( the light is occluded or not) it takes only one bit to store the shadow information of a single light. A uint32 texture can store shadow informations for 32 lights per pixels. there could be more thant 32 lights, using an other data structure, but it’s a good start !

Once the raymarching is done, the result is stored in the texture like that:


uint currentLightFlag = 1 << CurrentLightIndex;

if (result)

g_PointLightShadows[CurrentCoord] |= currentLightFlag;



And to get the datas from a given light:



int previousShadow = (g_PreviousPointLightShadows[sampleCoords + motionOffset] & currentLightFlag) == currentLightFlag;


And the result:

Shadows using a voxel grid

I think it’s far better, the noise in the raymarching help to capture more details and smooth some of the voxels imprecisions, while the blur gives a nice soft shadow look.


I didn’t talk much about performances, because in this first test I was mainly interested by the image quality, and the implementation is really straightforward. For example the current blur implementation is very unoptimal, I sample directly in the “point light shadow map” for each lights. I could pre load the needed values only once per tile, store them in shared memory, for a quicker access.

Still, the timmings are not that bad. You can see that by looking at the “lighting” section, this is where the lighting and shadow casting happens. The shadows plus offset and blur add 4.7 ms. As it’s done in the tiled lighting pass, performances are linked to the number of lights seen on screen, and to the number of lights in a single tile.

That’s not that bad considering it’s 32 dynamic shadow casting point lights but of course it needs to have a voxel structure.


Here is an other comparison from a different point of view.

WithoutShadow2 WithShadow2

In this example the artifact due to the coarse voxel structure are noticeable.

I was able to reduce them using an offset at the begining of the raymarching, to remove some self shadowing pixels (I just noticed that it’s not exactly the same point of view) :



It’s a little better, but still something I’ll need to investigate.

I also tried to change the number of steps for the raymarching, to see which values would be the best compromise between performance and quality.

5 steps:



10 steps:10steps

15 steps:15steps

50 steps:50steps


5 steps are not enough, the shadows are missing where none of the steps hit a voxel. This is because with a grid size of 256x256x256 the voxels are quite small, and only the surface of the mesh is voxelized, and not the inside of the mesh.

With 10 steps there are still some issues. I think that  maybe a better blur could hide those imperfections, while smoothing the squared edges..

When there is too much steps, there is no holes in the shadows, but it give a very sharp result, making the voxels more noticeables. This values needs to be tweaked according to the scene, the grid size and the lights radius.

Here is a last test, using smaller grid sizes:



10 steps raymarching:128VoxelsGrid-10Steps


5 steps: 64VoxelGrid-5steps 

The smaller the grid is, the less steps it needs to have a correct result, because of the bigger voxels. But obviously the shadows are far less precise.

As I said this is a quick test, and there is room for improvments but I find the results quite encouraging. It can’t replace a shadow map for important lights or complicated shadows, but for it’s good enough for secondary lights.

Now the next step is to try other techniques and compare the results. With the proper mipmap, the voxel structure can be used as an octree, and I should be able to cast ray efficiently. It may be quicker and more precise thant raymarching. I also want to try shadows with voxel cone tracing.

I would also like to try to do the raymarching/raycansting/voxel cone tracing later in the frame. In the current implementation the number of raymarching for a pixel depend on the number of lights hitting that pixel. Some pixels will need 5 raycasts while others won’t need a single one. It would be better if instead of doing the raymarching it creates some sort of “GPU raymarching job”, and those jobs would be done later, equally distributed within the threads.

So there will be more blog post on this subject ! I will also try to do a video, because it looks better in movement.

GPU Particles

English version is coming soon !

Une première vidéo pour montrer et expliquer le fonctionnement de base de mon moteur de particules.

Tous les calculs de mise à jour, physique et collisions s’exécutent sur le GPU, ce qui permet d’avoir de bonnes performances pour un grand nombre de particules (ici 1 000 000 de particules, locké à 30 fps pour les besoin de l’enregistrement).

Toutes les informations dynamiques des particules (position X et Y dans les canaux RG et velocité X et Y dans les canaux BA) sont stockées dans une texture (ici 1024×1024) Chaque particule est identifié par un ensemble de trois vertices. A la place de leur position est stocké une coordonnée de texture, qui permet de retrouvé les informations dans la texture contenant les données.

La mise à jour se déroule en deux temps. Tout d’abord il y a une phase de mise à jour de la physique. En dessinant un quad fullscreen, pour chaque pixel de la texture de données on extrait les informations de la frame précédente afin d’en déduire celles de la frame courante, en fonction de la gravité, des collisions, des forces externes, etc. Ensuite vient la phase d’affichage. On envoie à la carte les vertices représentant chaque particule, et dans le vertex shader, grâce au Vertex Texture Fetching et aux UVs, on retrouve la position réelle ce qui permet d’afficher un triangle au bon endroit.

On peut voir dans la vidéo l’influence d’une force d’attraction contrôlée par la souris et celle de la gravité. Il n’y a de collisions qu’avec le bord de l’écran. La couleur des particules peut être soit fixe, soit influencée par leur vélocité. On voit aussi un post process qui dessine une couleur en fonction de la densité des particules, donnant un aspect “fluide”.

Dans la prochaine vidéo je montrerai les collisions avec des objets dynamiques, ainsi que l’utilisation de flowmaps pour influencer le mouvement de toutes les particules.

Le code source est disponible sur github.

Le setup du projet est téléchargeable ici.