Monthly Archives: August 2014


Volumetric lights

While waiting for a new computer that will make my experiments with voxels more comfortable (even a 64x64x64 grid is slow on my laptop) I decided to try some less expensive effects, starting with the volumetric lights as described in GPU Pro 5 by Nathan Vos from Guerilla Games.

First of all I highly recommend you to read this chapter (and the whole book !), there is a lot of useful tricks, and I only scratch the surface. There is also the excellent GDC talk “Taking Killzone Shadow Fall Image Quality into the Next Generation” by Michal Valient. You can download the slides on the Guerrilla Games publication page. It’s not just about volumetric lights, but the whole talk is really interesting (especially the part on the nice trick to achieve 60fps for multiplayer using temporal reprojection, I found it particularly awesome. But be careful, nowadays you can be sued for being awesome …).

A similar approch has been use in the upcoming game “Lords of the Fallen” and is described by Benjamin Glatzel for the conference Digital Dragon. You can see the slides here and the presentation here, it’s a really good source of informations.

I started with a directional light, the sun light in my case, that will affect the whole scene.
First of all I needed some kind of shadow map. I could have created one by ray marching in my voxel grid as I’ve done with point lights. As there is only one light it’s easy to store the distance from the occluder. But again my laptop GPU is really slow and it would have been annoying, so I started with a really basic shadow map implementation.

The idea is to “ray march” from the world position of the current pixel to the camera position, and for each step check if the current position can be seen by the light or not, using the informations from the shadow map. For each step the light scattered in the camera direction is accumulated, using the Henyey-Greenstein phase function.

It’s really simple, but as often with raymarching it can quickly become expensive as it require a certain amount of rays to capture details, and that’s why the ray marching is done in a downscaled texture.

The pseudo code for the raymarching looks like this:

// Mie scaterring approximated with Henyey-Greenstein phase function.
float ComputeScattering(float lightDotView)
float result = 1.0f - G_SCATTERING * G_SCATTERING;
result /= (4.0f * PI * pow(1.0f + G_SCATTERING * G_SCATTERING - (2.0f * G_SCATTERING) *      lightDotView, 1.5f));
return result;

float3 worldPos = getWorldPosition(input.TexCoord);
float3 startPosition = g_CameraPosition;

float3 rayVector = startPosition;

float rayLength = length(rayVector);
float3 rayDirection = rayVector / rayLength;

float stepLength = rayLength / NB_STEPS;

float3 step = rayDirection * stepLength;

float3 currentPosition = startPosition;

float3 accumFog =;

for (int i = 0; i < NB_STEPS; i++)
float4 worldInShadowCameraSpace = mul(float4(currentPosition, 1.0f), g_ShadowViewProjectionMatrix);
worldInShadowCameraSpace /= worldInShadowCameraSpace.w;

float shadowMapValue = shadowMap.Load(uint3(shadowmapTexCoord, 0)).r;

if (shadowMapValue > worldByShadowCamera.z)
accumFog += ComputeScattering(dot(rayDirection, sunDirection)).xxx * g_SunColor;

currentPosition += step;
accumFog /= NB_STEPS;

Here is the result of the raw ray marching with 100 steps:

Volumetric lights100 steps

It looks pretty good, but even with an halved resolution it’s still 7.6 ms on my Geforce 630m.

Let’s see what it looks like with a lot less, 10,  ray marching steps :

Volumetric lights 10 steps

Yeah, it’s pretty ugly.

But as with the ray marched shadows from my previous post, a bayer matrix will add some noise, and help to capture more details with less samples.

ditherPattern[4][4] = {{ 0.0f, 0.5f, 0.125f, 0.625f},
{ 0.75f, 0.22f, 0.875f, 0.375f},
{ 0.1875f, 0.6875f, 0.0625f, 0.5625},
{ 0.9375f, 0.4375f, 0.8125f, 0.3125}};

// Offset the start position.
startPosition += step * ditherValue;

Volumetric lights dither pattern

And the next step is a bilateral blur in order to have a smooth result.

Volumetric light Blur

I always find the effect of the noise + blur amazing for ray marching !

The main issue with downsampling is that when you apply the result to the full resolution scene all the edges are blurry and/or pixelated, as you can see in the screenshot. This is fixed in the last step, a bilateral upsampling.
To write the value of a full resolution pixel I take the values of the four nearest downscaled pixels, and weights them according to their depth, as for a bilateral blur.
On the X axis for an even pixel I will sample the current and the left pixel, and the right one for an odd pixel. The same apply for the Y axis.


float4 main(const PS_INPUT input) : SV_TARGET

float upSampledDepth = depth.Load(int3(screenCoordinates, 0)).x;

float3 color =;
float totalWeight = 0.0f;

// Select the closest downscaled pixels.

int xOffset = screenCoordinates.x % 2 == 0 ? -1 : 1;
int yOffset = screenCoordinates.y % 2 == 0 ? -1 : 1;

int2 offsets[] = {int2(0, 0),
int2(0, yOffset),
int2(xOffset, 0),
int2(xOffset, yOffset)};

for (int i = 0; i < 4; i ++)

float3 downscaledColor = volumetricLightTexture.Load(int3(downscaledCoordinates + offsets[i], 0));

float downscaledDepth = depth.Load(int3(downscaledCoordinates, + offsets[i] 1));

float currentWeight = 1.0f;
currentWeight *= max(0.0f, 1.0f - (0.05f) * abs(downscaledDepth - upSampledDepth));

color += downscaledColor * currentWeight;
totalWeight += currentWeight;


float3 volumetricLight;
const float epsilon = 0.0001f; = color/(totalWeight + epsilon);

return float4(, 1.0f);



And here is the final result :


Volumetric light final result


Some screenshots from other points of view, with 15 steps:

VolumetricLights3 VolumetricLights2 VolumetricLights VolumetricLights4


I’m pretty happy with the results so far. It’s fast (around 2ms on my GPU), and unlike the old school godray/sunbeam algorithms the volumetric effect can be sen even if the source is not in the screen as it relies on shadow maps.

For now, in my implementation the fog density is uniform. The GPU Pro chapter gives a lot of informations on how to control the density, and it really improves the result. My idea is to use the empty voxels of my grid to store density values. Then I should be able to update those values, with things like fog emitters, wind, collisions, etc. I tried with a 64x64x64 grid but the precision is not good enough, so I’ll try with a larger grid size.

For an alternative technique you can have a look at “Volumetric Fog: Unified compute shader based solution to atmospheric scattering”, used in Assassin’s Creed 4 and described by Bart Wronski at SIGGRAPH (slides are available here ). It’s more complicated, but it may be faster (especially for multiple lights) and using an intermediate 3D texture is really interesting as it allow to do more advanced calculations.

As I said at the beginning it’s just a start, and yet I find it already improves the sensation of depth of the scene. It reinforces the impact of lights as they no longer only impact the geometry. And with a low number of steps the cost it’s fast enough, so it’s really something I want to deepen !

Shadows using a voxel grid

Dynamic shadow casting point lights for tiled deferred rendering

A while ago, I started to experiment working with voxels. More precisely, my idea was to test what could be possible if we had our scene fully voxelized. Dynamic shadows is one of those tests.

For my tests I implemented a tiled deferred rendering engine, and one of the difficulties with tiled deferred is shadows. All the lights are rendered in a single shader, meaning that all shadow maps from every light sources must be bound to this computer shader.

The last years have seen a lot of techniques increasing the number of simultaneous dynamic light sources (deferred, clustered, tiled deferred, forward+), but always ignoring shadows. Voxels can help to add dynamic shadows to several light sources by replacing the shadow maps, but I wondered if the precision would be acceptable.


I described in a previous blog post the technique I used to dynamically voxelize a scene. I think there might be some ways to optimize this process, but that will be for an other blog post !

All the following screenshots and timmings are from a GTX 780, and the resolution is 1280×720. There is 32 point lights in the scene.

First of all, here what the voxelized scene looks like with a 256x256x256 grid:

Voxelized scene

And the scene without shadows:


The main idea is really simple. In the tiled deferred shader when computing the light, with the voxel structuring I am able to check if the current pixel is hidden from the light by something.

My main interest in a first time is to see how it could look, so I started with a straightforward raymarching, starting from the current pixel position to the point light. At each step I transform the world position into voxel grid position and check if the voxel is full or empty.

Here are the first result:

Shadow with voxels first result

The result depends on the number of steps, and of course the more steps you use, the slower it will be. In this screenshot there is 25 steps per raymarching.

Some lights are leaking, and the shadows are very harsh, the soft attenuation is removed because of the voxel size.

Let’s try with more steps, 75 :


No more light leaking, but it’s more costly, and the shadows are still very sharp, etiher present or not, it would be great to have some much subtle values.

During the last GDC, Michal Valient from Guerrilla Games gave a talk (the slides are available here: ) where he showed how to use a dither offset followed by a Gaussian blur to improve raymarching visual quality and reduce the number of steps needed. This is detailed in the chapter “Volumetric Light Effects in Killzone:Shadow Fall” in GPU Pro 5.

So here is the result when I added the dithered offset, with 25 steps:


With less steps, the result are far better. With the pseudo random offset the raymarching captures more details. But without applying any blur the noise induced by the dither pattern is very noticeable.

This was a little bit more complicated. It would be too expensive to raymarch the shadows for every samples needed for the blur, so this value must be stored somehow.

Few weeks ago I implemented SSAO with temporal sampling, as described by Bart Wronski here, soI tough I could do something in the same way, use the results of the previous frame to improve the current one.

The previous frame  informations must be stored in order to be able to know for a given pixel which point light is occluded or not, in order to apply the proper lighting. The results of the raymarching being either 0 or 1 ( the light is occluded or not) it takes only one bit to store the shadow information of a single light. A uint32 texture can store shadow informations for 32 lights per pixels. there could be more thant 32 lights, using an other data structure, but it’s a good start !

Once the raymarching is done, the result is stored in the texture like that:


uint currentLightFlag = 1 << CurrentLightIndex;

if (result)

g_PointLightShadows[CurrentCoord] |= currentLightFlag;



And to get the datas from a given light:



int previousShadow = (g_PreviousPointLightShadows[sampleCoords + motionOffset] & currentLightFlag) == currentLightFlag;


And the result:

Shadows using a voxel grid

I think it’s far better, the noise in the raymarching help to capture more details and smooth some of the voxels imprecisions, while the blur gives a nice soft shadow look.


I didn’t talk much about performances, because in this first test I was mainly interested by the image quality, and the implementation is really straightforward. For example the current blur implementation is very unoptimal, I sample directly in the “point light shadow map” for each lights. I could pre load the needed values only once per tile, store them in shared memory, for a quicker access.

Still, the timmings are not that bad. You can see that by looking at the “lighting” section, this is where the lighting and shadow casting happens. The shadows plus offset and blur add 4.7 ms. As it’s done in the tiled lighting pass, performances are linked to the number of lights seen on screen, and to the number of lights in a single tile.

That’s not that bad considering it’s 32 dynamic shadow casting point lights but of course it needs to have a voxel structure.


Here is an other comparison from a different point of view.

WithoutShadow2 WithShadow2

In this example the artifact due to the coarse voxel structure are noticeable.

I was able to reduce them using an offset at the begining of the raymarching, to remove some self shadowing pixels (I just noticed that it’s not exactly the same point of view) :



It’s a little better, but still something I’ll need to investigate.

I also tried to change the number of steps for the raymarching, to see which values would be the best compromise between performance and quality.

5 steps:



10 steps:10steps

15 steps:15steps

50 steps:50steps


5 steps are not enough, the shadows are missing where none of the steps hit a voxel. This is because with a grid size of 256x256x256 the voxels are quite small, and only the surface of the mesh is voxelized, and not the inside of the mesh.

With 10 steps there are still some issues. I think that  maybe a better blur could hide those imperfections, while smoothing the squared edges..

When there is too much steps, there is no holes in the shadows, but it give a very sharp result, making the voxels more noticeables. This values needs to be tweaked according to the scene, the grid size and the lights radius.

Here is a last test, using smaller grid sizes:



10 steps raymarching:128VoxelsGrid-10Steps


5 steps: 64VoxelGrid-5steps 

The smaller the grid is, the less steps it needs to have a correct result, because of the bigger voxels. But obviously the shadows are far less precise.

As I said this is a quick test, and there is room for improvments but I find the results quite encouraging. It can’t replace a shadow map for important lights or complicated shadows, but for it’s good enough for secondary lights.

Now the next step is to try other techniques and compare the results. With the proper mipmap, the voxel structure can be used as an octree, and I should be able to cast ray efficiently. It may be quicker and more precise thant raymarching. I also want to try shadows with voxel cone tracing.

I would also like to try to do the raymarching/raycansting/voxel cone tracing later in the frame. In the current implementation the number of raymarching for a pixel depend on the number of lights hitting that pixel. Some pixels will need 5 raycasts while others won’t need a single one. It would be better if instead of doing the raymarching it creates some sort of “GPU raymarching job”, and those jobs would be done later, equally distributed within the threads.

So there will be more blog post on this subject ! I will also try to do a video, because it looks better in movement.