Advanced Real-Time Per-Pixel Lighting in OpenGL
using nVidia Register Combiners
by Ronald Frazier (ldkronos@mediaone.net)


In my last paper "Real-Time Per-Pixel Point Lights and Spot Lights in OpenGL using nVidia Register Combiners", I discussed a technique for rendering realistic per-pixel point lights and spot lights. At the end of the paper, I discussed some techniques that could be used to improve upon the lights developed there. In this paper I will demonstrate how to implement all of these techniques to create even more realistic lighting. In particular, I will take the spot light developed there and expand it to incorporate diffuse bump mapping, specular bump mapping, and gloss mapping. Then I will demonstrate two techniques for adding realistic shadowing. The end result will be an extremely realistic lighting method. As the techniques developed here are derived from those developed in the previous paper, I recommend reading that paper if you have not already done so. Additionally, the mathematics behind these advanced lighting techniques is considerably more complex, so many of the formulas will only be explained briefly; it will be helpful to understand the underlying theory before reading on. If you are not already familiar with the mathematics of lighting, an excellent paper to read is "A Practical and Robust Bump-mapping Technique for Today’s GPUs" by Mark Kilgard. This paper is available in the developers section of the nVidia web site (http://www.nvidia.com/developer). Finally, since the full code for this lighting is quite large, only short illustrative sketches are presented inline. However, as before, you can download a zip file containing all source code, executables, graphics, and documentation. The link to download this file is at the bottom of this paper.

Using the Alpha Buffer
Because many of the advanced techniques discussed here are quite complex, they often require more textures and more general combiners than can be implemented in a single pass on current hardware. Therefore, we need a way to spread the lighting calculation across multiple passes without affecting the ability to render multiple overlapping lights. To do so, we will make use of the alpha buffer, which the GeForce series graphics cards provide when the display is set to 32-bit color mode. Since the alpha buffer is otherwise unused, and its contents in no way affect how the user sees the rendered geometry, we can safely use the alpha buffer however we see fit.

To demonstrate how the alpha buffer can be used to our advantage, let's look at how we can render a complex equation in multiple passes. For demonstration, assume we want to display the results of the equation A*B*C*D*color1 + A*B*E*F*color2, where A, B, C, D, E, and F are luminance values stored in separate textures, and color1 and color2 are RGB values stored in 2 additional textures. We could render this equation on current GeForce hardware using the following steps:

Step 1: Clear the color buffer to black.
Step 2: Disable color buffer writes, calculate A*B, and render it to the alpha buffer using the GL_ONE : GL_ZERO blending function.
Step 3: Disable color buffer writes, calculate C*D and render it to the alpha buffer using the GL_DST_ALPHA : GL_ZERO blending function.
Step 4: Enable color buffer writes, and render color1 to the color buffer using the GL_DST_ALPHA : GL_ONE blending function.
Step 5: Disable color buffer writes, calculate A*B, and render it to the alpha buffer using the GL_ONE : GL_ZERO blending function.
Step 6: Disable color buffer writes, calculate E*F and render it to the alpha buffer using the GL_DST_ALPHA : GL_ZERO blending function.
Step 7: Enable color buffer writes, and render color2 to the color buffer using the GL_DST_ALPHA : GL_ONE blending function.

The end result is that we have rendered the complex equation A*B*C*D*color1 + A*B*E*F*color2 to the color buffer. This technique can be extended to any degree to calculate nearly any equation, provided that each additive term is composed of no more than 2 RGB texture values multiplied together.
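To make this concrete, below is a minimal sketch of these seven steps in C. The bindTextures() and drawGeometry() helpers and the texture handles are hypothetical placeholders for your own texture setup and geometry submission, and the sketch assumes the texture environment (or register combiners) is configured to multiply the two bound textures in each pass.

#include <GL/gl.h>

/* Hypothetical helpers: bind up to two texture units and draw the
   geometry with the combiners set to multiply the bound textures. */
extern void bindTextures(GLuint tex0, GLuint tex1);
extern void drawGeometry(void);
extern GLuint texA, texB, texC, texD, texE, texF, texColor1, texColor2;

void renderTwoTermEquation(void)
{
    glEnable(GL_BLEND);

    /* Step 1: clear the color buffer (including alpha) to black */
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT);

    /* Step 2: alpha = A*B (RGB writes off, alpha writes on) */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);
    glBlendFunc(GL_ONE, GL_ZERO);
    bindTextures(texA, texB);
    drawGeometry();

    /* Step 3: alpha = alpha * C*D */
    glBlendFunc(GL_DST_ALPHA, GL_ZERO);
    bindTextures(texC, texD);
    drawGeometry();

    /* Step 4: color = color + alpha * color1 */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE);
    glBlendFunc(GL_DST_ALPHA, GL_ONE);
    bindTextures(texColor1, 0);
    drawGeometry();

    /* Steps 5-7: the same pattern for the second term */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);
    glBlendFunc(GL_ONE, GL_ZERO);
    bindTextures(texA, texB);
    drawGeometry();

    glBlendFunc(GL_DST_ALPHA, GL_ZERO);
    bindTextures(texE, texF);
    drawGeometry();

    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE);
    glBlendFunc(GL_DST_ALPHA, GL_ONE);
    bindTextures(texColor2, 0);
    drawGeometry();
}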

Calculating Distance Attenuation
One of the features discussed previously was how to incorporate distance attenuation into the lighting equation. In developing the point lights, we used a simplistic approach that used 2 textures to calculate distance. When we implemented the spot light, we found a need to simplify this equation down to a single texture unit. The way we achieved this was to "rotate" the polygon to be parallel with the Z=0 plane (essentially, we moved the polygon into light space). While this did simplify the equation, it also required us to either perform a per-vertex transformation on the CPU or reset the GPU texture matrix for each polygon we rendered. Neither the point light solution nor the spot light solution was entirely desirable.

Another solution is to move the light into the tangent space of the polygon, and use the tangent space light vector as both the z-distance value and as the s and t coordinates for the 2D radial attenuation map. This actually turns out to be rather convenient because we will need the tangent space light vector for our diffuse and specular bump mapping anyway, so it's almost like getting it for free.
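Per vertex, moving the light into tangent space is just three dot products against the vertex's tangent space basis. Here is a small sketch, assuming you already have the tangent (T), binormal (B), and normal (N) vectors for each vertex; vec3 and dot3 are simple helpers defined here, and scaling by the light radius (so the attenuation maps cover the 0 to 1 range) is left to the caller.

typedef struct { float x, y, z; } vec3;

static float dot3(vec3 a, vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Object-space vector from the vertex to the light, expressed in
   the vertex's tangent space basis (T, B, N). */
vec3 tangentSpaceLightVector(vec3 lightPos, vec3 vertexPos,
                             vec3 T, vec3 B, vec3 N)
{
    vec3 d = { lightPos.x - vertexPos.x,
               lightPos.y - vertexPos.y,
               lightPos.z - vertexPos.z };
    vec3 L;
    L.x = dot3(d, T);   /* s coordinate for the 2D radial map */
    L.y = dot3(d, B);   /* t coordinate for the 2D radial map */
    L.z = dot3(d, N);   /* z distance for the attenuation */
    return L;
}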

While we are calculating attenuation, another factor we should add to the calculation is a self-shadowing term. Previously, the point lights and spot lights illuminated a surface without regard to whether the light was in front of or behind the plane of the polygon, so it was up to us to explicitly cull out any polygons that faced away from the light. However, by multiplying the attenuated light value by 8*(N dotProduct L), where N is the surface normal of the polygon and L is the tangent space light vector, and then clamping this value to the 0 to 1 range, we set the light brightness to zero whenever the light lies behind the polygon. Additionally, note that when the light is in front of the polygon at a distance between 0 and 0.125, we get a linear scaling of the light brightness. This helps to reduce pixel popping when the light gets extremely close to the polygon and we are performing bump mapping.
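Since the surface normal in tangent space is (0, 0, 1), the dot product N dotProduct L reduces to the z component of the tangent space light vector. A sketch of the per-vertex term, assuming the vector is expressed in light-radius units as for the attenuation lookup:

/* Self-shadowing term: clamp(8 * (N dot L), 0, 1). In tangent space
   N = (0, 0, 1), so N dot L is just the light vector's z component.
   The result reaches 1 at a distance of 0.125 in front of the plane
   and is 0 anywhere behind it. */
float selfShadowTerm(float lightVecZ)
{
    float s = 8.0f * lightVecZ;
    if (s < 0.0f) s = 0.0f;
    if (s > 1.0f) s = 1.0f;
    return s;
}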

Diffuse Lighting
The lighting methods discussed previously illuminated surfaces uniformly (with the exception of the distance attenuation). This is perfectly acceptable for illuminating perfectly flat surfaces. However, many surfaces in real life have a bumpy texture. With a textured surface, the parts of the surface that face the light should be illuminated more than those facing away from the light. In order to add this effect, the first thing we will do is to add diffuse bump mapping.

Diffuse Bump-Mapped Lighting
With diffuse bump mapping, we need to scale the light contribution based on the angle between the perturbed surface normal and the tangent space light vector. The perturbed surface normal is a per-pixel tangent space normal vector that is stored in the bump map (also called the normal map). The light vector is the normalized tangent space vector from the surface to the light. Since the light vector varies per pixel, and because it varies based on the position of the light, it would not be reasonable to try to represent the light vector in a texture. Instead we will specify the light vector on a per-vertex basis and let the GPU interpolate it across the surface. The only problem with this is that as the light gets closer to the polygon surface, the interpolated light vector becomes more and more unnormalized (it is shortened). The result is that, as the light approaches the surface, the surface would actually be less illuminated than when the light is further away. To solve this problem, we will use a normalization cube map to generate this vector. A normalization cube map is designed so that, given a texture coordinate representing a 3D vector, the output is always the normalized vector.
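For reference, here is a sketch of how such a cube map face can be built, following the approach in Kilgard's bump mapping paper. The face indexing and enum names assume the ARB_texture_cube_map extension (declared in glext.h); the caller supplies a size*size*3 byte buffer.

#include <math.h>
#include <GL/gl.h>

/* Direction vector that points at texel (x, y) of cube face 'face'
   (0..5 = +X, -X, +Y, -Y, +Z, -Z), normalized to unit length. */
static void getCubeVector(int face, int size, int x, int y, float v[3])
{
    float s = (((float)x + 0.5f) / (float)size) * 2.0f - 1.0f;
    float t = (((float)y + 0.5f) / (float)size) * 2.0f - 1.0f;
    float len;

    switch (face) {
    case 0: v[0] =  1.0f; v[1] = -t;    v[2] = -s;    break; /* +X */
    case 1: v[0] = -1.0f; v[1] = -t;    v[2] =  s;    break; /* -X */
    case 2: v[0] =  s;    v[1] =  1.0f; v[2] =  t;    break; /* +Y */
    case 3: v[0] =  s;    v[1] = -1.0f; v[2] = -t;    break; /* -Y */
    case 4: v[0] =  s;    v[1] = -t;    v[2] =  1.0f; break; /* +Z */
    case 5: v[0] = -s;    v[1] = -t;    v[2] = -1.0f; break; /* -Z */
    }
    len = (float)sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    v[0] /= len; v[1] /= len; v[2] /= len;
}

/* Fill one face of the normalization cube map: each texel's RGB
   encodes the normalized direction vector pointing at that texel,
   packed from [-1, 1] into [0, 255]. */
void buildNormalizationCubeMapFace(int face, int size, unsigned char *pixels)
{
    int x, y;
    for (y = 0; y < size; y++) {
        for (x = 0; x < size; x++) {
            float v[3];
            unsigned char *p = pixels + (y * size + x) * 3;
            getCubeVector(face, size, x, y, v);
            p[0] = (unsigned char)(128.0f + 127.0f * v[0]);
            p[1] = (unsigned char)(128.0f + 127.0f * v[1]);
            p[2] = (unsigned char)(128.0f + 127.0f * v[2]);
        }
    }
    glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB + face, 0, GL_RGB8,
                 size, size, 0, GL_RGB, GL_UNSIGNED_BYTE, pixels);
}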

Now that we have the perturbed tangent space surface normal (N') in one texture, and can retrieve the tangent space light vector (L) from a second texture, we can calculate the bump mapped diffuse lighting term as:

                 N' dotProduct L

We can then multiply the per-pixel light value by this bump mapped diffuse term to generate the bump mapped per-pixel lighting value.

Combined Diffuse Lighting Equation
To create a total diffuse lighting value, we need to combine everything into the following equation:

                 Diffuse = A * Sself * (N' dotProduct L) * C * F * D

where:

                 A = Distance attenuation
                 Sself = Self shadowing term
                 N' = Perturbed surface normal
                 L = Tangent space light vector
                 C = Light color
                 F = Color filter cubemap
                 D = Diffuse material color

You should recognize all of the above terms except for F (the color filter cubemap). This is just the cube map that we used to filter a point light into a spot light. Also note that all of the above values come from textures except for Sself (which is provided in the per-vertex primary color) and C (which is a per-light constant color). To apply this formula, we can use the multi-step alpha buffer technique described above. Before beginning, we need to ensure that the color buffer is filled with some starting value (black, the ambient lighted scene, or even the scene already lighted by one or more lights) and that the depth buffer is filled with the depth values of the closest polygon for each pixel. We need to fill the depth buffer to ensure that we only draw the closest polygon for each pixel; otherwise we could end up incorrectly scaling the alpha values when we draw occluded geometry. Then, using the GL_EQUAL depth test function:

Pass 1: Disable color buffer writes, calculate A * Sself and render it to the alpha buffer using the GL_ONE : GL_ZERO blending function.
Pass 2: Disable color buffer writes, calculate N' dotProduct L and render it to the alpha buffer using the GL_DST_ALPHA : GL_ZERO blending function.
Pass 3: Enable color buffer writes, calculate C * F * D and render it to the color buffer using the GL_DST_ALPHA : GL_ONE blending function.
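Below is a skeleton of these three passes in C. The setup*() helpers are hypothetical placeholders for binding the right textures and register combiner state for each pass, and the depth buffer is assumed to already hold the scene's depths.

extern void setupAttenuationPass(void);
extern void setupDiffuseDotProductPass(void);
extern void setupDiffuseColorPass(void);
extern void drawLitGeometry(void);

void renderDiffuseLight(void)
{
    glDepthFunc(GL_EQUAL);   /* draw only the closest polygons */
    glEnable(GL_BLEND);

    /* Pass 1: alpha = A * Sself (Sself arrives in primary color) */
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_TRUE);
    glBlendFunc(GL_ONE, GL_ZERO);
    setupAttenuationPass();
    drawLitGeometry();

    /* Pass 2: alpha = alpha * (N' dot L) */
    glBlendFunc(GL_DST_ALPHA, GL_ZERO);
    setupDiffuseDotProductPass();
    drawLitGeometry();

    /* Pass 3: color = color + alpha * C * F * D */
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_FALSE);
    glBlendFunc(GL_DST_ALPHA, GL_ONE);
    setupDiffuseColorPass();
    drawLitGeometry();
}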

The below images show the effects of a point light with and without diffuse bump mapping. Notice how the addition of the bump mapping makes the entire environment more realistic. The steel floor has a roughness, the walls are smooth but have a raised outer edge, and the wood ceiling has a textured grain. Finally, notice that the light becomes dimmer as the angle between the light vector and the surface normal increases.

diffuse_nobump.jpg (95929 bytes)
scene without diffuse bump mapping

diffuse_bump.jpg (135737 bytes)
scene with diffuse bump mapping

Specular Lighting
The next feature to add is specular lighting. In real life, objects don't reflect light uniformly at every angle. Most surfaces, when viewed from the correct angle, have a shiny appearance. This shiny appearance is the result of specular highlights. Adding specular lighting is actually quite simple once diffuse lighting is done.

Specular Bump-Mapped Lighting
Rendering specular lighting can be done as an additional additive rendering pass. Essentially we re-render the light almost exactly as before. However, this time, in our second pass, instead of calculating the diffuse lighting term (N' dotProduct L), we calculate the specular lighting term (N' dotProduct H)^p, where p is the specular exponent. As before, N' is the perturbed surface normal taken from the normal map. H is the half angle vector, which is the normalized sum of the light vector L and the view vector V (the view vector is the vector from the surface to the eye position). As we did before with the light vector L, we will specify H on a per-vertex basis and use a normalization cube map to normalize the interpolated H vector. In addition, we also have to raise the resulting dot product to the power p. The specular exponent p controls the shininess of the surface and the size of the specular highlight: the larger we make p, the smaller and sharper the specular highlight will appear.
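Per vertex, computing H amounts to normalizing L and V, adding them, and letting the normalization cube map renormalize the interpolated result per pixel. A small sketch, reusing the vec3 type from the earlier sketch:

#include <math.h>

static vec3 normalize3(vec3 v)
{
    float len = (float)sqrt(v.x*v.x + v.y*v.y + v.z*v.z);
    vec3 r = { v.x / len, v.y / len, v.z / len };
    return r;
}

/* Per-vertex half angle vector; both inputs must be in the same
   (tangent) space. The sum is left unnormalized because the
   normalization cube map normalizes the interpolated result. */
vec3 halfAngleVector(vec3 toLight, vec3 toEye)
{
    vec3 L = normalize3(toLight);
    vec3 V = normalize3(toEye);
    vec3 H = { L.x + V.x, L.y + V.y, L.z + V.z };
    return H;
}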

Unfortunately, because the GeForce only has 2 general combiners, and because we want to output the resulting specular value to the alpha buffer, we are limited to a maximum specular exponent of 2. This isn't very shiny at all. In fact, the specular highlight would be so large and gradual that you almost wouldn't even notice it; it would just look like the surface was illuminated by a brighter light. So, in order to create a more noticeable specular highlight, we are going to have to sacrifice some accuracy in the lighting model and settle for a close approximation of a higher exponent. If we use the formula 4*((N' dotProduct H)^2 - 0.75), we come up with a reasonably close approximation of a specular exponent of 16. The graph below shows the resulting specular values over the (N' dotProduct H) range of 0 to 1 for p=2 and p=16, along with our approximation. As you can see, the approximation is more than reasonable and is much better than the maximum exponent of p=2 that we could otherwise calculate and output to the alpha channel.

exponent.gif (2105 bytes)

The other change that needs to be made to the diffuse lighting equation is to replace the material diffuse color texture with the material specular color texture. We use separate diffuse and specular material color textures because a material often has different diffuse and specular colors. For instance, a polished wood floor has a brown diffuse color but would often have white specular highlights.

The following images once again show diffuse bump mapping, but the second image also includes specular bump mapped lighting added to the scene. Notice the shininess of the floor at certain angles. Also notice that on the shiny walls the tiles start to saturate to white within the specular highlights.

no_specular.jpg (75613 bytes)
scene with only diffuse lighting

specular.jpg (88661 bytes)
scene with diffuse and specular lighting

no_gloss.jpg (77009 bytes)
shiny cube

Gloss Mapping
Notice in the picture of the cube above that the specular highlight covers the entire surface. However, not all materials are entirely shiny. Some materials are shiny on some parts and dull on others. The above cube has a diffuse material texture that gives the appearance of partially rusted steel. It would be nice if we could modify the specular highlights so that they appear only on the unrusted parts of the surface.

To accomplish this, we will use gloss mapping, which lets us multiply the per-pixel specular value by a per-pixel gloss value. It doesn't matter at what step we apply the gloss map, but notice that our first pass (distance attenuation and self-shadowing) only uses 1 texture so far. This makes it the perfect place to add the gloss map texture, as it will not require any additional passes.

The following image shows the cube again, but this time using a gloss map. Notice how the rusty part of the surface never appears shiny.

gloss.jpg (73187 bytes)
cube with gloss map

Combined Specular Lighting Equation
To create a total specular lighting value, we need to combine everything into the following equation:

                 Specular = A * Sself * G * 4 * ((N' dotProduct H)^2 - 0.75) * C * F * S

where:

                 A = Distance attenuation
                 Sself = Self shadowing term
                 G = Gloss map
                 N' = Perturbed surface normal
                 H = Tangent space half angle vector
                 C = Light color
                 F = Color filter cubemap
                 S = Specular material color

To apply this formula, we can use the multi-step alpha buffer technique described above. Before beginning, we once again need to ensure that the color buffer is filled with some starting value (probably the ambient and diffuse lighting values) and that the depth buffer is filled with the depth values of the closest polygon for each pixel. Then, using the GL_EQUAL depth test function:

Pass 1: Disable color buffer writes, calculate A * Sself * G, and render it to the alpha buffer using the GL_ONE : GL_ZERO blending function.
Pass 2: Disable color buffer writes, calculate 4 * ((N' dotProduct H)^2 - 0.75) and render it to the alpha buffer using the GL_DST_ALPHA : GL_ZERO blending function (a register combiner sketch for this pass follows the list).
Pass 3: Enable color buffer writes, calculate C * F * S and render it to the color buffer using the GL_DST_ALPHA : GL_ONE blending function.
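For pass 2, the exponent approximation maps nicely onto the two general combiners: stage 0 computes N' dotProduct H, and stage 1 computes 4 * (x^2 - 0.75) in one step using the scale-by-four output mapping. Here is a sketch of that combiner setup, assuming texture unit 0 holds the normal map and unit 1 the normalization cube map fetching H; this illustrates the idea and is not the demo's exact code.

void setupSpecularApproximationPass(void)
{
    const GLfloat threeQuarters[4] = { 0.75f, 0.75f, 0.75f, 0.75f };

    glEnable(GL_REGISTER_COMBINERS_NV);
    glCombinerParameteriNV(GL_NUM_GENERAL_COMBINERS_NV, 2);
    glCombinerParameterfvNV(GL_CONSTANT_COLOR0_NV, threeQuarters);

    /* Stage 0: spare0 = expand(tex0) dot expand(tex1) = N' dot H */
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_A_NV,
                      GL_TEXTURE0_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER0_NV, GL_RGB, GL_VARIABLE_B_NV,
                      GL_TEXTURE1_ARB, GL_EXPAND_NORMAL_NV, GL_RGB);
    glCombinerOutputNV(GL_COMBINER0_NV, GL_RGB,
                       GL_SPARE0_NV, GL_DISCARD_NV, GL_DISCARD_NV,
                       GL_NONE, GL_NONE, GL_TRUE, GL_FALSE, GL_FALSE);

    /* Stage 1: spare0 = 4 * (spare0^2 - 0.75). The unsigned-identity
       input mapping clamps negative dot products to zero first, and
       A*B + C*D = x*x + (-0.75)*1 is scaled by four on output. */
    glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_A_NV,
                      GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_B_NV,
                      GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_C_NV,
                      GL_CONSTANT_COLOR0_NV, GL_SIGNED_NEGATE_NV, GL_RGB);
    glCombinerInputNV(GL_COMBINER1_NV, GL_RGB, GL_VARIABLE_D_NV,
                      GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
    glCombinerOutputNV(GL_COMBINER1_NV, GL_RGB,
                       GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV,
                       GL_SCALE_BY_FOUR_NV, GL_NONE,
                       GL_FALSE, GL_FALSE, GL_FALSE);

    /* Final combiner: route the (clamped) result out through alpha */
    glFinalCombinerInputNV(GL_VARIABLE_G_NV, GL_SPARE0_NV,
                           GL_UNSIGNED_IDENTITY_NV, GL_BLUE);
}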

Additional Lighting Notes
There are several important notes that I should bring up with regard to this lighting technique. The first is that both the diffuse and specular lighting terms are purely additive with regard to how they are applied to the color buffer. This means you can safely apply multiple overlapping lights without worrying about errors; the lights will simply add together (saturating to white if bright enough). This also means that you can apply diffuse and specular lighting independently of each other, in either order, or even one without the other. If you are rendering 10 lights, you can even apply the diffuse components of all 10 lights first, then go back and apply the specular terms of all 10 lights (this would most likely be inefficient for several reasons, but it is possible if you find some need for it).

Another important note is that in the included demo application, all texturing is done without any mip-mapping. Without mip-mapping you may notice "sparkles" as surfaces move off into the distance. Be aware, however, that standard mip-map generation techniques will cause problems when applied to normal maps. In "A Practical and Robust Bump-mapping Technique for Today’s GPUs", Mark Kilgard discusses a technique for properly mip-mapping normal maps; if you would like to mip-map your normal maps, you should definitely read this paper.

The final thing you should note concerns performance on current hardware. This is a generic lighting technique that covers a lot of functionality, but in doing so we have also added a lot of passes: the light now requires 6 passes for a specular, diffuse, bump mapped, gloss mapped, distance attenuated, color filtered light. As a result, the frame rate can drop tremendously when all features are applied, especially with multiple lights. However, by removing the features that aren't needed, you may be able to collapse this into fewer passes. You should also remember that future chips will be faster and will likely have more texture units, and that one of these future chips will power the upcoming Microsoft X-box. While information on the X-box is very limited right now, it will most likely support volumetric textures and might support 3 or 4 texture units. If so, depending on how the register combiners are set up, it may be possible to compress all 6 passes into only 2 or 3. The point is that, like all other high-end features, performance will become more reasonable as newer processors become available.

Shadowing
Having such detailed and realistic lighting, it would be nice if we could also incorporate realistic shadowing. There are multiple ways to incorporate shadows into a scene; two of the most common techniques are shadow volumes and depth maps. While both shadowing techniques are completely separate from the lighting developed here, in that they require no knowledge of or special integration with it, I thought I would present them together in the interest of providing a highly complete, realistic lighting demonstration.

The one thing that both of these techniques have in common is that they both rely upon the stencil buffer to reject pixels that are shadowed as we render scene geometry. This works well because the lighting technique developed here makes no modifications to the stencil buffer or stencil test settings. Therefore we can just set up the stencil buffer to reject shadowed pixels, then render the lights as normal with no consideration that we are simultaneously adding shadows.

Shadow Volumes
One of the most popular techniques for adding real time shadows is the use of shadow volumes. With shadow volumes, you determine every polygon in the scene that can cast a shadow and project each such polygon away from the light over an infinite distance; anything that lies within the resulting shadow volume of a polygon is shadowed by that polygon. By rendering these shadow volumes to the stencil buffer, you can configure the stencil buffer to reject pixels that lie in the shadow volume, so these pixels don't get lit. For a more complete description of how to use shadow volumes, see "Improving Shadows and Reflections via the Stencil Buffer" by Mark Kilgard, which is also available on the nVidia developers web site.
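As a point of reference, here is a minimal sketch of the classic depth-pass stencil technique from Kilgard's paper: front faces of the volume increment the stencil, back faces decrement it, and the lighting passes then draw only where the stencil is zero. drawShadowVolumes() is a hypothetical placeholder for submitting the volume geometry, and the scene's depth buffer is assumed to already be filled.

extern void drawShadowVolumes(void);

void markShadowedPixels(void)
{
    glEnable(GL_STENCIL_TEST);
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    glDepthMask(GL_FALSE);            /* depth buffer stays intact */
    glEnable(GL_CULL_FACE);

    /* Front faces increment where the volume passes the depth test */
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);
    glCullFace(GL_BACK);
    drawShadowVolumes();

    /* Back faces decrement */
    glStencilOp(GL_KEEP, GL_KEEP, GL_DECR);
    glCullFace(GL_FRONT);
    drawShadowVolumes();

    /* Subsequent lighting passes touch only unshadowed pixels */
    glStencilFunc(GL_EQUAL, 0, ~0u);
    glStencilOp(GL_KEEP, GL_KEEP, GL_KEEP);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
}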

The main advantage of using shadow volumes is that they have pixel resolution. No matter how big a shadow needs to be or how far away a light source is, the shadow volume always maintains pixel level accuracy. There are several downsides to dealing with shadow volumes. The first is the difficulty of determining whether or not the camera lies within a shadow volume; special steps need to be taken for each shadow volume where this is true. Another problem is that when shadow volumes lie, either entirely or partially, between the eye and the near clipping plane of the view frustum, you can get strange artifacts that are very noticeable. In the following images, notice how the lighting is incorrect when the camera gets very close to the shadow volume. In some areas the shadowed and unshadowed areas appear reversed.

good_shadowvolumes.jpg (139142 bytes)
correct shadow

bad_shadowvolumes.jpg (110597 bytes)
incorrect shadows when the shadow volume intersects the frustum's near clipping plane

Depth Mapped Shadows
Another technique that is gradually becoming more popular is the use of shadow depth maps. With depth mapped shadows, we create a depth map of the scene from the light's point of view. Then, as we render the scene from the camera's point of view, we determine each pixel's depth from the light and compare this value to the corresponding value in the depth map. If the depth map value is closer to the light than the pixel we are rendering, the pixel is shadowed; otherwise it is lit. For a more complete discussion of depth mapping, read the paper "Shadow Mapping with Today’s OpenGL Hardware" by Mark Kilgard.
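Conceptually, the per-pixel test is a single comparison. This plain C fragment only illustrates the logic the hardware performs, not a GPU code path; the small bias term guards against self-shadowing caused by precision error.

/* A pixel is lit only if nothing in the depth map is closer to the
   light along the same direction. */
int pixelIsLit(float depthFromLight, float depthMapValue, float bias)
{
    return depthFromLight <= depthMapValue + bias;
}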

Applying Depth Mapped Shadows to Point Lights
While the normal depth mapped shadow technique is quite useful, it has one inherent limitation when applied to point lights and other 360° lights. The problem is that depth mapping is designed to work within the confines of a frustum. In order to shadow a point light you need to combine six 90° frustums and project 6 depth maps out over the scene. Examining this in detail, we can see we need to render 6 textures to build the depth maps, and then we need to project all 6 of these depth maps out over the scene. This means we would need 12 rendering passes per point light. Now, by careful scene management, we may be able to determine that if no objects move within one or more of the frustums, and the light itself does not move either, then those frustums can be preserved from frame to frame. In the ideal case (where nothing in the scene moves) we can eliminate all 6 of the passes required to build the depth map. Additionally, if we accept some inaccuracy, we can decide to only rebuild the cube map every other frame (or even less often). However, projecting the depth map still requires 6 passes per frame per light. In addition, these 6 passes only give 8 bit precision; rendering the depth map at 16 bit precision requires 3 rendering passes per frustum, for a total of 18 passes per frame per light. This is definitely not desirable.

Since we have already determined that we can reduce the number of passes required to build the depth map through careful scene management or by accepting some inaccuracy, our main focus for improving performance should be reducing the number of passes needed to project the depth map. The first intuition might be to take the 6 depth maps, throw them into a cube map, and then use a single rendering pass with the cube map to create the shadows. However, the problem with this is that the depth map is generated from the depth buffer, and unfortunately the depth buffer uses a camera space z distance for the depth value rather than the Cartesian (Euclidean) distance. Therefore, each of the 6 sides of the cube effectively uses a different formula for calculating depth from the (x,y,z) position. What we really need is to generate the 6 depth maps using the Cartesian distance from the light. Then we would be able to throw the 6 depth maps into a cube map and render the shadows in a single pass.

So how do we calculate Cartesian distance from the light? Recall that in determining distance attenuation for our light source, we had to calculate the distance from the light and subtract that from one. We can use the same technique to generate the 6 depth map cube faces, and then project the depth cube map out over the entire scene in a single pass. By using the tangent space vectors to calculate distance in tangent space (as discussed earlier in the section Calculating Distance Attenuation), we can build the depth map using only 1 texture (the 2D radial map) and we can render the shadows using only 2 textures (the 2D radial map and the depth cube map). Assuming we can adequately reduce the number of depth cube map faces that need to be regenerated each frame, we can reduce this to a reasonable number of passes.

Limitations of Depth Cube Mapped Shadows
As with everything else, there are always tradeoffs in selecting one technique over another, and the same is true with depth cube maps. The first problem, which is also a problem with regular depth mapped shadows, is the limited resolution of the depth map. While shadow volumes always generate pixel precise shadows, depth mapped shadows are only as precise as the resolution at which the depth map is created. Additionally, even with extremely large depth maps, we can still get very noticeable precision problems as the light source gets farther away. The images below show a lower precision depth cube map up close, and a higher precision depth cube map both up close and farther away. You can see that the low resolution is a problem, and that increasing the resolution solves the problem, but increasing the distance of the light source makes the problem appear again.

depth_64_close.jpg (165866 bytes)
close up 64x64 depth cube map - notice shadow errors

depth_256_close.jpg (164594 bytes)
close up 256x256 depth cube map - shadow errors are mostly gone

depth_256_far.jpg (121543 bytes)
far away 256x256 depth cube map - shadow errors reappear

Another problem to note is that this technique is currently limited to 8 bits of precision in the depth map. There are several reasons for this which are a bit difficult to explain. One of them has to do with the fact that we require a radial map to generate the Cartesian distance. A radial map does not lend itself to implicit repeating (such as using the GL_REPEAT wrap mode) the way a 1D attenuation map does; to achieve this, we would need to explicitly encode the repetition into the texture map. However, this is not reasonable because if we use GL_LINEAR sampling we will get shadowing errors near where the lower 8 bits roll over, and if we use GL_NEAREST sampling, we could only encode 10 bits of precision even in a 2048x2048 radial map. I suspect that given 4 texture units (or likely even as few as 3) it would be possible to generate 16 bits of precision in the depth map. However, since no current hardware supports 4 textures, and I don't have access to an ATI Radeon (which supports 3 textures), I cannot confirm that this will work in an acceptable manner. It is a possibility to keep in mind for the future, though, as some of the upcoming GPUs will very likely support 3 or 4 texture units. When such hardware is available to me, I will be sure to test this out.

Additional Depth Cube Mapped Shadowing Notes
One thing to note about building the cube mapped shadows is that the textures are rendered to the color buffer and then copied into the texture surface. Because of this, all necessary depth cube maps must be built before any rendering that the user will see in the final scene is performed.
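A sketch of that copy for one face, assuming the face was just rendered into the (back) color buffer at the origin; the enum names come from the ARB_texture_cube_map extension and 'face' runs from 0 to 5.

void copyDepthCubeFace(GLuint cubeTex, int face, int size)
{
    glBindTexture(GL_TEXTURE_CUBE_MAP_ARB, cubeTex);
    glCopyTexSubImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X_ARB + face,
                        0,         /* mip level */
                        0, 0,      /* destination offset in the face */
                        0, 0,      /* lower-left corner of the buffer */
                        size, size);
}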

Also, as stated above, there is an 8 bit precision limitation to the depth cube mapped shadows in the demo program, so you will notice that as the light radius increases, so do the shadowing errors. Also, in the demo, none of the depth cube map building optimizations discussed above are implemented, so all 6 faces of the depth cube map are regenerated each frame for each active light. Because of this, you will likely notice a very large performance hit when enabling depth mapped shadows, especially for more than one light. The important thing to remember is that the included demo keeps optimizations to a minimum to keep the code clean, so it is up to you to implement many of the necessary optimizations.

Complete Diffuse, Specular, and Shadowed Rendering
Now that we have covered all of the necessary topics, I present a few final images showing some complexly lighted and shadowed scenes. These images incorporate all of the things discussed so far, and include 2 and 3 active lights. Notice how the same object can cast multiple shadows in different directions, and how realistic the shadows appear where they overlap. Also notice how things like specular highlights from multiple light sources combine together.

final1.jpg (110688 bytes)
A white point light and a white spot light

final2.jpg (65328 bytes)
A red point light, green spot light, and a disco light

Conclusion
Again, we see that per pixel lighting has much to offer in the area of realistic 3D rendering. Combined with other complementary techniques, it is the path to stunningly realistic graphics. Finally, as I said last time, I would once again like to throw out a big thumbs up to nVidia for being the company that continually pushes the envelope when it comes to incorporating the most advanced technology and features into consumer graphics cards.

References

  1. A Practical and Robust Bump-mapping Technique for Today’s GPUs, Mark Kilgard, 2000
  2. Improving Shadows and Reflections via the Stencil Buffer, Mark Kilgard, 1999
  3. Shadow Mapping with Today’s OpenGL Hardware, Mark Kilgard, 2000
  4. Per-Pixel Lighting, Sim Dietrich, 2000
  5. Texture Compositing With Register Combiners, John Spitzer, 2000
  6. Cube Maps, Sim Dietrich, 2000
  7. Computations for Hardware Lighting and Shading, Mark J. Kilgard, 2000

This document, along with any updates, can be found on the web at http://people.mw.mediaone.net/ldkronos/research/advanced_per_pixel_lighting.html