How I built the diffuse component lookup table for Kajiya-Kay model (Updated!)

The environment lighting was approximated by a series of SRBF lights. I built the diffuse component lookup table, and then rendered the model shaded by one SRBF light through texture fetching in the shader. And compared with the result of direct computation using Kajiya-kay model to verify the lookup table.

Following the description in the paper Interactive Hair Rendering Under Environment Lighting, the bidirectional scattering function for Kajiya-Kay model is

S(ωi, ωo)=Kd + Kscosp(θi+θo)/cosθi

The diffuse component of the convolution integral

is independent of ωo and the azimuth angle φ of the SRBF center ξ, and thus can be precomputed as a 2D table Id(cosθξ, 1/λ).

Because this integral integrates over the unit sphere, in order to compute it, I first generated a lot of sample locations on the unit sphere. But before I had done that, I redefined the coordinate system for the coordinate system used in the original paper is a little misleading.

The coordinate system used in the original paper is

Personally, it is better to assume that the u axis corresponds to the z axis in the regular spherical coordinate system we all familiar with, and w corresponds to y, v corresponds to x.

Here is the regular spherical coordinate system. φ ranges from 0 to 2π. θ ranges from -π/2 to π/2 and is positive in the upper hemisphere (θ is the angle between OP and its projection onto the X-Y plane).

The following code were used to generate sample locations over the unit sphere

const int montSampleSqrtNum = 50;//the total number of samples is 50*50

class MontSample{


float x, y, z;

float theta, phi;


MontSample montSamples[montSampleSqrtNum * montSampleSqrtNum];

//local coordinate samples

void initMontSamples(void){


for(int i=0; i<montSampleSqrtNum; ++i)

for(int j=0; j<montSampleSqrtNum; ++j)


float x = (i+rnd(1.0))/montSampleSqrtNum;

float y = (j+rnd(1.0))/montSampleSqrtNum;

float theta = 2.0 * acos(sqrt(1.0 – x)) – PI/2;//acos(sqrt(1.0-x)) returned a value in the region [0, π/2], after multiplying by 2.0 and then subtracting π/2, the result lay in [-π/2, π/2]

float phi = 2.0 * PI * y;

montSamples[j+i*montSampleSqrtNum].theta = theta;

montSamples[j+i*montSampleSqrtNum].phi = phi;

montSamples[j+i*montSampleSqrtNum].x = cos(theta) * cos(phi);

montSamples[j+i*montSampleSqrtNum].y = cos(theta) * sin(phi);

montSamples[j+i*montSampleSqrtNum].z = sin(theta);



Here comes the code to build the lookup table.

void i_dBuild(void){

float phi = 0;//the diffuse component is independent of the azimuth angle φ of the SRBF center, so any value can be used here.

for(int h=0; h<lambdaRes; ++h)//resolution for the bandwidth λ is 64

for(int w=0; w<cosRes; ++w)//resolution for cosθξ  is 32.


float integral = 0;

float cosTheXi = -1.0f + (w*2.0f)/cosRes;// w÷cosRes only results in [0, 1]. Here it has been adjusted to fit into [-1, 1], the range for cosine.

float theXi = acos(cosTheXi) – PI/2;//acos() returns [0, π], subtract π/2 to fit into [-π/2, π/2].

float lambda = 64.0 * lambdaRes / (3.0*h + lambdaRes);

for(int i=0; i<montSampleSqrtNum*montSampleSqrtNum; ++i)


float xi[3] = {cos(theXi)*cos(phi), cos(theXi)*sin(phi), sin(theXi)};

integral += cos(montSamples[i].theta) * exp(-lambda)*exp(lambda*(xi[0]*montSamples[i].x + xi[1]*montSamples[i].y +



i_d[w+h*cosRes] = integral;



The original paper used 1/λ  as a lookup argument. Because λ lies in [16, 64], so 1/λ  lies in [1/64, 1/16], thus the following equation can map 1/λ  to [0, 1], which is the required range for a lookup argument:

(1/λ -1/64)*64/3

Therefore, in the above code, λ can be solved for 64.0 * lambdaRes / (3.0*h + lambdaRes);

In the fragment shader, the code used to fetch the texture value is

vec2 uvId;

uvId.y = (1.0/lambda – 1/64.0)*64.0/3.0;

if(dot(l, t)>0)//t is the normalized tangent, l is the light direction.
    uvId.x = sin(acos(dot(l, t)))/2.0 + 0.5;
    uvId.x = -sin(acos(dot(l, t)))/2.0 + 0.5;

float diffuse = texture(i_dTex, uvId).r;

Run the program with only the diffuse component activated, I got the following result

The reference result I got from directly computing using Kajiya-Kay model

The first image shows false location of the low intensity area.

And the code for the diffuse component of Kajiya-Kay model:

diffuse = diffCoeff * sqrt(1 – dot(t, l) * dot(t, l));

BTW: I did not see the gain of performance by using precomputed tables in this case.


The original paper defines θ to be in the interval [-π/2, π/2], which is monotonic for sine but not cosine, therefore, if use cosθ to index into the lookup table as well as generate the lookup table ahead, the information of whether the angle is positive or negative will be lost.

So simply replace with the following two lines in the corresponding place in i_dBuild()

float sinTheXi = -1.0f + (w*2.0f)/cosRes;
float theXi = asin(sinTheXi);

and replace with

uvId.x = dot(l, t)/2.0 + 0.5;

in kajiyaShader.fp.

The rendering result will be correct this time.

Implementation of the Dual Scattering Model (a mistake here)

Start from this one, I will add a series of posts regarding the dual scattering model along with my implementation progress.

The first issue, how to compute Equation 6.

Because \bar{a_f} is an integral, so it would be better to compute it in the application and then send it to the shader. At this moment, all I can think out is to declare it as an array with each entry holding the value for a certain inclination \theta_d, and then later in the fragment shader, use the inclination of direct illumination at that fragment to index into the array.

But how to compute the value at each entry?

Notice that the integrand contains f_s, which is the single scattering component and are fed with three parameters, \theta_d, \Phi and \omega. Recall my own implementation of Marschner’s model, the intensity are decided by \Phi, \theta_h and \theta_d, no \omega there. But judge from the author’s description of this equation, \omega should be the outgoing direction, that is, the eye direction in Marschner’s model.

In addition, since it is a double integral, two steps of numerical integration are needed to compute it. First, generate random distribution for \omega, which spreads on the front hemisphere. Then for each \omega, compute the inner integral.

A big notice here:
\theta_d in Equation 6 is not the difference angle, but the longitudinal inclination of the incoming light, which is denoted as \theta_i in the original Marschner’s paper. This has been clearly pointed out in the paper Efficient Implementation of the Dual Scattering Model in RenderMan, but not mentioned in Zinke et al’s dual scattering paper!

Emulate Hardware Shadow Mapping (2)

Continue with my previous post. The key to generate correct result is to set the texture parameter GL_TEXTURE_MAG_FILTER to GL_NEAREST when trying to emulate hardware PCF by one’s own. The code used in my shader is showing here:

vec4 newTexPos = texPos / texPos.w;

vec2 texmapscale = vec2(1/512.0, 1/512.0);

vec2 shadowMapCoord = vec2(512, 512) * newTexPos.xy;
vec2 lerps = fract(shadowMapCoord);

float result[4];

float bias = -0.0;

result[0] = texture(depthTex, newTexPos.xy).r + bias > newTexPos.z + bias? 1.0 : 0.0;
result[1] = texture(depthTex, newTexPos.xy + vec2(1.0/512, 0)).r + bias > newTexPos.z ? 1.0 : 0.0;
result[2] = texture(depthTex, newTexPos.xy + vec2(0, 1.0/512)).r + bias > newTexPos.z ? 1.0 : 0.0;
result[3] = texture(depthTex, newTexPos.xy + vec2(1.0/512, 1.0/512)).r + bias > newTexPos.z ? 1.0 : 0.0;

float shadowCoeffs = mix(mix(result[0], result[1], lerps.x),
mix(result[2], result[3], lerps.x),

The above code does a bi-linear interpolation between the four samples, therefore, there is no need to turn on GL_LINEAR for GL_TEXTURE_MAG_FILTER. Also, the result here:

Figure 7

You may refer to the two threads, thread on OpenGL forum and thread on MSDN forum for further details.

Emulate Hardware Shadow Mapping

I started with the fundamental shadow mapping technique. The shadow mapping effect depends on how you defines the texture object in the application and how you carry out shadow comparison in the fragment shader.

The easiest shadow mapping technique requires to define the texture object with the following code:

glGenTextures(1, &depthTexture);
glBindTexture(GL_TEXTURE_2D, depthTexture);



Pay special attention to the last two texture parameters, GL_TEXTURE_COMPARE_MODE and GL_TEXTURE_COMPARE_FUNC, which only work for depth texture.

In the fragment shader, the texture uniform is defined as

uniform sampler2DShadow depthTex;

then use the following function to directly return a value that represents how much the current fragment is in shadow:

float shadeFactor = textureProj(depthTex, texPos);

Refer to Figure 1 for the result.

Figure 1

However, notice that the value of the third parameter for the texture object, GL_TEXTURE_MAG_FILTER, has been assigned to GL_LINEAR, so the hardware automatically did a 4-sample percentage-closer filtering(PCF) for us, which smoothed out the transition from unshadowed area to shadowed area. And Figure 2 shows the result after I changed the value for the parameter to GL_NEAREST.

Figure 2

Now set back GL_TEXTURE_MAG_FILTER to GL_LINEAR before go on to try more samples.

The change from last program only lies in the fragment shader. The following function can be used to do a projected texture map read with an offset given in texel units. The variable texmapscale is a vec2 containing 1/width and 1/height of the shadow map.

float offset_lookup(sampler2DShadow map, vec4 loc, vec2 offset)
vec2 texmapscale = vec2(1/512.0, 1/512.0);
return textureProj(map, vec4(loc.xy + offset * texmapscale * loc.w, loc.z, loc.w));

We can implement the 16-sample version in a fragment program as follows:

float sum = 0;
float x, y;

for (y = -1.5; y <= 1.5; y += 1.0)
for (x = -1.5; x <= 1.5; x += 1.0)
sum += offset_lookup(depthTex, texPos, vec2(x, y));

float shadeFactor = sum / 16.0;

The result is shown in Figure 3. For detail explanation of the code above, refer to

Figure 3

Because I want to apply PCF to the opacity maps that cannot be treated as depth textures, thus I have to emulate the hardware PCF.

The first thing I need to do, is to change the texture object definition. Change the value for the parameter GL_TEXTURE_COMPARE_MODE to GL_NONE, so the textureProj function will not do a comparison, and the expected return value from textureProj should be the depth value in the sampled texture. In addition, I manually did a comparison between the current depth value and the depth value in the texture as follows:

if(texCoord.z < depth)
return 1.0;
return 0;

But this time, the result was not as expected. The area within the view of the light was all black, as if all the comparisons returned 0 (Figure 4).

Figure 4

Therefore I tried another way. Changed the definition of the texture uniform to

uniform sampler2D depthTex;

thus the function textureProj no longer takes in a sampler2DShadow, and returns a vec4, which should be the depth value in the texture. I took the value from the R channel and compared with the depth of the fragment, but the result was also out of expected (Figure 5).

Figure 5

I will stop here until I find better solutions. I also found some strange phenomenon that might be helpful in solving my current problem.

Back to the 16-sample PCF, keep everything unchanged except the GL_TEXTURE_MAG_FILTER parameter, here I assign GL_NEAREST, and the result becomes a little like Figure 5 (Figure 6).

Figure 6

Analysis of Marschner’s model

Here is my understand regarding the azimuthal component of Marschner’s model. All the analysis are carried on a cross section of the hair.

I posted two figures for later reference. Figure 1 shows the scattering geometry, and Figure 2 the cross section.

Figure 1: Scattering geometry

Figure 2: Geometry for scattering from a circular cross section

The goal of Marschner’s model is to solve for the outgoing intensity. Please refer to Figure 2 first, the incoming ray has been scattered into three ways, some part of the energy has been reflected directly; some enters the hair, and leaves at the other side of the surface; some enters the hair, reflected at the inner surface and final refracted out. Likewise, when given the outgoing angle phi(the relative azimuth phi_r – phi_i), we can reverse this process to find out the directions of light that contribute to the outgoing intensity. For instance, in Figure 3, the intensity of the outgoing direction (in blue) is the contribution from three (not necessary to be exactly three)incoming beams (in  red).

Figure 3: Light from three paths that contribute to one outgoing direction

Back to the second figure, setting aside attenuation for the moment, power from a small interval dh in the incident beam is scattered into an angular interval d/phi in the exitant intensity distribution


To complete this equation, we have to take into account attenuation caused by absorption and reflection by introduce an attenuation factor A(p, h) in front of the intensity contributed by a path.

where p = 0 stands for surface reflection R, 1 for refractive transmission TT, and 2 for internal reflection TRT. Refer to Figure 3 for a visualization of the three paths.

Consider Path 1 in Figure 3, the light reflects directly, so the Fresnel factor can be applied to account for the reflected energy, thus A(0, h) = F(eita, gamma_i);

For Path 2, the light first undergoes refraction, thus the energy enters the hair should be (1-F(eita, gamma_i)) multiple the original energy. Then it travels a distance of 2*cos(gamma_t) (with the radius of the hair being unit length), assume sigma_a to be the volumn absorption per unit length, the absorption factor T should be exp(…), finally, it refracts out of hair, so (1-F(eita, gamma_t)) is used to account for the refracted energy.

Path 3 is similar to Path 2, except that the light reflected once at the inner surface of hair, and travels two times of the distance as it in Path 2. Personally, I would write A(2, h) as

rather than

given by the paper, where p = 2.

One more thing to mention. Before we carry on analysis, we have to project the scattering geometry from 3D to 2D, as well as various parameters, including the index of fraction, and each internal segment should be lengthened by a factor of 1/cos(theta_t). 

 The End

Car racing demo using Irrlicht

This game is developed for use as a demo in my presentation on Irrlicht engine at the Technique Introduction Seminar within our laboratory.

Have a look at first.

The city model used in this demo comes from the website

in the format of 3DS, and the two car models were borrowed from our lab, loaded as Obj file in the demo. Actually, I prefer Obj files than 3DS as I can use a text editor to read the content of it.

I use Irrlicht as the 3D engine. Why chose it? It’s simple to install and easy to use. I also studied Ogre3D three weeks in total, and found it easy to take use of the ready-made functions, but extremely difficult when come to a need to extend the basics.

Help information that should be included in this demo:

Press W/S for forward/backward, A/D for strafing left/right, Control for speed up.

The game contains three scenes, the countdown scene from three to one, then shows go, the main racing process and the finishing scene where the camera is placed near the finishing line and points to the car that arrives earlier.

The development process of the racing scene:

Step 1. Place the model

The loaded models did not show at the proper places, so I had to manually change the positions of them. I placed the two cars side by side on a bridge in the city model. Here I want to talk about how I achieved that. I changed the initial position of one car model little by little until it got to the middle of the road. Every time after I changed the position, I had to compile the program to see where it was until it looked fine to me. I do not know any better ways to do this. All I can think of is to display the position of the camera so by moving the camera to a suitable position for a car, I can read off the coordinates in the world space, and then use this data to indicate the position of the car. If you have better ideas to handle this, please email me.

Step 2. Define the movement of the car.

I followed the convention of most game controls with W forward, S backward, A strafe left, R strafe right. At first, the four kinds of movement are defined in the world space, but soon I found a bug. If the car was placed at the XZ-plane and initially heads to the +x direction and turns an angle, say -90 degrees, it would head to –z direction. Since the movement defined for the key W is relative to the world space, pressing W only moves the car toward +x direction, but when the car faces –z direction, the expected forward movement should toward negative z, that is not what the W key does. In order to solve this, I introduced a direction vector for the car, every time the car turns, the direction vector turns as well, and the W key is defined to move the car along the direction vector, and the S key opposite. In this way, I managed to move the car along the direction it faces with W and S keys.

Step3. Define the movement of the competitor car.

This is the part that I am not satisfied with. Since the track contains a curve that connects two orthogonal straight roads. What I thought is the car should turn a total degree of 90 between it enters the curve and leaves the curve. The curve has actually been divided into four straight segments. I confirmed the coordinates at each connecting point using the same way as in Step 1 and assigned a 90/4 degrees-turn during each segment. It needs tremendous work to adjust the movement of the car to make it looks natural because the difference in the length of each segment requires assigning different degrees of direction change. For simplicity, the 90/4 degrees was assigned to each segment.

I included the binary file and the source file, only one cpp, through the link. You will find it really difficult to read the source file since I am too lazy to structure it.