SHPRTPixel Sample
Why is this sample interesting? Precomputed radiance transfer (PRT) using low-order spherical harmonic (SH) basis functions has a number of advantages over typical diffuse (N dot L) lighting. Area light sources and global effects such as interreflections, soft shadows, self-shadowing, and subsurface scattering can be rendered in real time after a precomputed light transport simulation. Principal component analysis (PCA) allows the results of the per-texel simulation to be compressed so the shader does not need as many constants or as much per-texel data.
Overview of PRT
How does the sample work?
How does this sample differ from the SHPRTVertex sample?
Step 1: Offline
Fortunately, most of the simulator input parameters do not affect how the results are used. One parameter that does is the "order of SH approximation" parameter, which controls what order of SH basis functions is used to approximate transferred radiance. The math behind spherical harmonics is rather involved, but there are a number of useful resources available on the Internet. For example, "Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments" by Peter-Pike Sloan, Jan Kautz, and John Snyder, SIGGRAPH 2002, is a good explanation of PRT, and for a more graphics-developer-friendly introduction to spherical harmonics, see "Spherical Harmonic Lighting: The Gritty Details" by Robin Green, GDC 2003.
In addition to the order parameter, the spectral parameter also affects the results. If spectral is on, there will be three color channels: red, green, and blue. However, sometimes it's useful to work with just one channel (shadows, for example). Note that with non-spectral you simply use the red channel when calling the D3DX SH functions, as the other channels are optional.
The simulator runs for some period of time (minutes) depending on the complexity of the meshes, the number of rays, and other settings. The output is an ID3DXBuffer which contains an internal header and an array of floats for each texel of the mesh. The floats for each texel are called radiance transfer vectors. These transfer vectors can be used by a shader to transform source radiance into exit radiance. However, since there are order^2 transfer coefficients per channel, with spectral and order 6 there would be 3*36, or 108, scalars per texel. Fortunately, you can compress this using an algorithm called PCA. The number of coefficients per texel is reduced to the number of PCA vectors, and this number does not need to be large for good results. For example, 4 or 8 PCA vectors usually yield good results, so with 8 PCA vectors and order 6, instead of 108 coefficients per texel we need only 8. The number of PCA vectors must be less than order^2. For more detail about the math behind PCA, see "Clustered Principal Components for Precomputed Radiance Transfer" by Peter-Pike Sloan, Jesse Hall, John Hart, and John Snyder, SIGGRAPH 2003. Note that with per-texel PRT, clustering does not apply, so the number of clusters is simply 1.
Step 2: Real time
For per-texel PRT with CPCA, the exit radiance at texel p is:

exit radiance = (M dot L') + sum over j = 1..NumPCAVectors of wpj * (Bj dot L')

Note: to see how this equation is derived from a generic rendering equation, see the DirectX documentation. where:
- L' is the source radiance (the lighting environment) projected into the SH basis
- M is the cluster mean transfer vector (per-texel PRT uses a single cluster)
- Bj is the j-th PCA basis vector
- wpj is the j-th PCA weight for texel p
For the real-time step, the sample simply collects all of the data needed for this equation and passes it to a pixel shader that implements it.
How to implement the equation
First the sample reads the simulator's SH PRT results from a file and puts this data back into an ID3DXBuffer. Then, in CMyD3DApplication::CompressData(), it calls D3DXSHPRTCompress() to apply CPCA using some number of PCA vectors and some number of clusters. The output is an ID3DXBuffer (called pPCABuffer) that contains the data needed for the CPCA formula above. Note that this sample loads and saves an uncompressed SH PRT buffer; however, it would be more efficient to compress and then save the textures so that this does not have to be done upon init. The sample doesn't do this since it allows the developer to select the number of PCA vectors and the texture format without running the simulator again.
Next the sample calls D3DXSHPRTCompNormalizeData() on the compressed buffer. This changes the PCA weights so that they are always between 0 and 1, which is important for non-floating-point texture formats. It also modifies the PCA mean and basis vectors to compensate. The application requires no changes; this just guarantees maximum precision when using signed normalized texture formats.
Then the sample creates (m_dwNumPCAVectors/4) textures and places 4 PCA weights in each texture by calling D3DXSHPRTCompExtractTexture(). It also sets up the texture sampler to use trilinear filtering. Note that filtering the PCA weights works well and is even mathematically correct to do.
As the equation shows, to calculate the exit radiance in the shader you'll need not only per-texel compressed transfer vectors but also your lighting environment (also called source radiance) approximated using SH basis functions. D3DX provides a number of functions to help make this step easy:
The last piece of data the sample needs from pPCABuffer is the cluster mean (M) and the PCA basis vectors (B). The sample stores this data in a large array of floats so that when the lights change it can re-evaluate the lights and perform the M dot L' and B dot L' calculations. To do this it simply calls D3DXSHPRTCompExtractBasis(), which extracts the basis one cluster at a time. Each cluster's basis consists of a mean and PCA basis vectors, so the size of the array, m_aClusterBases, needed to store all of the cluster bases is:
NumClusters * (NumPCAVectors+1) * (order^2 * NumChannels)
Note that the "+1" is to store the cluster mean. Also note that since both (Mi dot L') and (Bkj dot L') are constant, the sample calculates these values on the CPU and passes them as constants into the pixel shader, and since wpj changes for each texel, the sample stores this per-texel data in textures. Finally, CMyD3DApplication::CompressData() calls another helper function, CMyD3DApplication::EvalLightsAndSetConstants(), which evaluates the lights as described above using D3DXSHEvalDirectionalLight() and D3DXSHAdd() and calls CMyD3DApplication::SetShaderConstants(). This function uses the m_aClusterBases array and the source radiance to perform the M dot L' and B dot L' calculations as described in the above equation and stores the result in another, smaller float array, m_fClusteredPCA, of size:
NumClusters * (4 + MaxNumChannels * NumPCAVectors)
This array is passed directly to the pixel shader with ID3DXEffect::SetFloatArray(). Note that the pixel shader uses float4 since each register can hold 4 floats, so on the pixel shader side the array is of size:
NumClusters * (1 + MaxNumChannels * (NumPCAVectors / 4) )
Since we restrict the number of PCA vectors to a multiple of 4, this results in an integer. Also note that evaluating the lights and then calculating and setting the constant table is fast enough to be done per frame, but as an optimization the sample only does this when the lights are moved. Now that the sample has extracted all the data it needs, it can render the scene using SH PRT with PCA. The render loop uses the SHPRTPixel.fx technique called "PrecomputedSHLighting" to render the scene. This technique uses a pixel shader called "SHPRTDiffusePS" which implements the exit radiance formula above.
What are the limitations of per-texel PRT?
There are limitations of PRT since the transfer vectors are precomputed: the relative spatial relationships of the precomputed scene cannot change. In other words, a mesh can be rotated, translated, or scaled, since those rigid operations do not change the transfer vectors, but if the mesh is deformed or skinned then the results will be inaccurate. The same logic applies to scenes. For example, if you pass a scene of 3 meshes to the simulator, the real-time engine could rotate, translate, or scale them all as one, but could not rotate a single mesh independently of the others without getting inaccurate results. Since this sample operates on a texel level it is not dependent on the mesh being tessellated; however, it does require a unique texture parameterization, which means that every point on the surface must map to a unique point on the texture. In other words, there can be no overlap in texture space. As a side note, since this technique uses low-order spherical harmonics, the lighting environment is assumed to be low frequency.
Also note that if you mix meshes that have subsurface scattering with ones that do not, you will likely need to scale the transfer coefficients for the subsurface-scattered mesh, since they are around 3x darker. With a single mesh you can simply scale the projected light coefficients. You can scale the transfer coefficients by using D3DXSHPRTGetRawDataPointer() and scaling the data before compressing it.