This project is part of HWRTL, a single-file hardware ray tracing library covering GI baking, HLOD texture generation, and DXR PVS generation.
First, we pack the lightmaps of all meshes into a number of atlases and search for the lightmap layout with the smallest total size; the packing itself uses the third-party library stb_rect_pack. Then we generate the lightmap GBuffer on the GPU and start lightmap path tracing. For each texel in the lightmap GBuffer, we trace 512 rays from the world position recorded in the GBuffer, and each ray bounces up to 32 times. At each bounce we perform both light sampling and material sampling and combine them with multiple importance sampling (MIS) to reduce variance.
Furthermore, we have implemented part of the ray guiding algorithm. First, we split the lightmap into clusters, and all texels in a cluster share the same luminance distribution, which amplifies the number of learning rays and reduces variance in the learned ray distribution. Then we map the first-bounce ray sample direction into 2D, store the luminance at the corresponding position in the cluster, and build the luminance distribution. After that, we build the luminance CDF in a compute shader using wave intrinsics. During the subsequent path tracing passes, we adjust the sample direction and PDF based on the ray guiding CDF.
We perform a simple dilate pass in order to solve the sampling bleed problem caused by bilinear sampling.
Unreal GPULM uses OIDN (Open Image Denoise) to denoise the traced lightmap. We have implemented a custom denoiser (non-local means filtering) to replace OIDN, since HWRTL's design goal is a single-file hardware ray tracing library and we don't want to pull in that third-party dependency.
Packing
The diffuse lightmap UV, usually stored in a second UV channel (UV2), differs from the other texture UV coordinates: multiple meshes may share one lightmap atlas, which means we have to pack their GI baking results together.
In HWRTL, the size of each mesh's lightmap is specified by the user for simplicity. In practice, it is better to derive the lightmap size from the triangle areas in world space and UV space.
We use an exhaustive search to find the layout. First, find the maximum lightmap size among the scene meshes and use it as the initial search size.
std::vector<Vec2i> giMeshLightMapSize;
for (uint32_t index = 0; index < pGiBaker->m_giMeshes.size(); index++)
{
    SGIMesh& giMeshDesc = pGiBaker->m_giMeshes[index];
    giMeshLightMapSize.push_back(giMeshDesc.m_nLightMapSize);
    nAtlasSize.x = std::max(nAtlasSize.x, giMeshDesc.m_nLightMapSize.x + 2);
    nAtlasSize.y = std::max(nAtlasSize.y, giMeshDesc.m_nLightMapSize.y + 2);
}

int nextPow2 = NextPow2(nAtlasSize.x);
nextPow2 = std::max(nextPow2, NextPow2(nAtlasSize.y));
nAtlasSize = Vec2i(nextPow2, nextPow2);
Then, iterate the atlas packing size from this maximum lightmap size up to the user-specified maximum atlas size.
For each iteration, try to pack the lightmaps into a set of atlases using the open-source library stb_rect_pack. If the total lightmap area is smaller than the minimum area found in previous iterations, keep the current iteration's layout as the preferred one.
Finally, record each mesh's lightmap offset and scale within its atlas texture.
Next, generate the lightmap GBuffer for each lightmap atlas, which is a GPU-friendly input for hardware ray tracing. The pixel output destination is the texel position computed from the lightmap UV, scale, and offset, i.e. a position in atlas space rather than world space.
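Conceptually, the vertex shader rasterizes the mesh into atlas space instead of projecting it to the screen, so each covered pixel lands on its lightmap texel. A rough HLSL sketch (the names are illustrative, not HWRTL's actual shader):

// Sketch only: rasterize into lightmap-atlas space so each pixel maps to one texel.
struct SGBufferVSOutput
{
    float4 position      : SV_POSITION; // texel position in the atlas, as clip-space coordinates
    float3 worldPosition : TEXCOORD0;   // written to the GBuffer and used later for ray generation
    float3 worldNormal   : TEXCOORD1;
};

SGBufferVSOutput LightMapGBufferVS(float3 worldPosition, float3 worldNormal, float2 lightMapUV,
                                   float4 lightMapScaleAndBias /* per-mesh scale/offset in the atlas */)
{
    SGBufferVSOutput output;
    float2 atlasUV = lightMapUV * lightMapScaleAndBias.xy + lightMapScaleAndBias.zw;
    // map [0,1] atlas UV to [-1,1] clip space (Y flipped for D3D conventions)
    output.position = float4(atlasUV.x * 2.0 - 1.0, 1.0 - atlasUV.y * 2.0, 0.0, 1.0);
    output.worldPosition = worldPosition;
    output.worldNormal = worldNormal;
    return output;
}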
Consider the problem of evaluating direct lighting integrals of the form
$$L_o(p,\omega_o) = \int_{S^2} f(p,\omega_o,\omega_i)\, L_d(p,\omega_i)\, |\cos\theta_i|\, d\omega_i$$
If we importance sample this integral with a distribution based on either $L_d$ or $f_r$ alone, one of the two will often perform poorly [PBRT-V4].
To solve this problem, GPULM uses two sampling distributions to estimate the value of $\int f(x)g(x)\,dx$. The Monte Carlo estimator given by MIS is:
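$$F = \frac{1}{n_f}\sum_{i=1}^{n_f} \frac{f(X_i)\,g(X_i)\,w_f(X_i)}{p_f(X_i)} \;+\; \frac{1}{n_g}\sum_{j=1}^{n_g} \frac{f(Y_j)\,g(Y_j)\,w_g(Y_j)}{p_g(Y_j)}$$

where $n_f$ samples $X_i$ are drawn from $p_f$, $n_g$ samples $Y_j$ are drawn from $p_g$, and $w_f$, $w_g$ are the MIS weighting functions [PBRT-V4].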
Consider a sample $x$ at a point where $p_f(x)$ is relatively low. Assuming $p_f(x)$ is a good match for the shape of $f(x)$, the value of $f(x)$ will also be relatively low. But suppose that $g(x)$ has a relatively high value there.
The standard importance sampling estimate $\frac{f(x)\,g(x)}{p_f(x)}$ will then be very large because $p_f(x)$ is small, and we get high variance.
GPULM uses the power heuristic with an exponent of two to compute the MIS weights and reduce variance:
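$$w_s(x) = \frac{\left(n_s\, p_s(x)\right)^2}{\sum_i \left(n_i\, p_i(x)\right)^2}$$

where $n_s$ is the number of samples taken from distribution $p_s$ [PBRT-V4].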
// step 2: sample the material and generate a sample direction from the BRDF
if (debugSample != 2)
{
    SMaterialSample materialSample = SampleMaterial(rtRaylod, randomSample);
}
Light Importance Sampling
First, generate the light-picking cumulative distribution function (CDF) for the inversion sampling method [PBRT-V4].
We can draw a sample $X_i$ from a PDF $p(x)$ with the following steps:
1. Integrate the PDF to find the CDF $P(x)=\int_0^x p(x')\,dx'$. In the discrete case this becomes $P(\text{light}_n)=\sum_{i=0}^{n} p(\text{light}_i)$:
for (uint index = 0; index < m_nRtSceneLightCount; index++)
{
    vLightPickingCdfPreSum += EstimateLight(index, worldPosition, worldNormal);
    aLightPickingCdf[index] = vLightPickingCdfPreSum;
}
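Step 2 of inversion sampling is to invert the CDF with a uniform random number. This step is not shown in the snippet above; a minimal HLSL sketch, assuming the variables from the loop above and a uniform random value randomValue in [0, 1), could look like:

// Sketch only: the CDF above is unnormalized, so scale the random number by its total.
float searchValue = randomValue * vLightPickingCdfPreSum;
uint selectedLightIndex = m_nRtSceneLightCount - 1;
for (uint lightIndex = 0; lightIndex < m_nRtSceneLightCount; lightIndex++)
{
    if (searchValue < aLightPickingCdf[lightIndex])
    {
        selectedLightIndex = lightIndex;
        break;
    }
}
// probability of picking this light = its CDF segment divided by the total sum
float previousCdf = (selectedLightIndex > 0) ? aLightPickingCdf[selectedLightIndex - 1] : 0.0;
float lightPickPdf = (aLightPickingCdf[selectedLightIndex] - previousCdf) / vLightPickingCdfPreSum;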
Estimate the light PDF for each light source in the scene:
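For illustration, a plausible per-light estimate might look like the sketch below. This is not HWRTL's actual EstimateLight: a point light is assumed, visibility is ignored, and SLight / m_rtSceneLights are placeholder names.

// Illustrative only: weight each light by an estimate of its unshadowed contribution.
float EstimateLightContribution(uint lightIndex, float3 worldPosition, float3 worldNormal)
{
    SLight light = m_rtSceneLights[lightIndex];
    float3 toLight = light.m_worldPosition - worldPosition;
    float distanceSquared = max(dot(toLight, toLight), 1e-4);
    float cosTerm = saturate(dot(worldNormal, normalize(toLight)));
    float luminance = dot(light.m_color * light.m_intensity, float3(0.299, 0.587, 0.114));
    return luminance * cosTerm / distanceSquared;
}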
Store the direct lighting result (bounce == 0) and the indirect lighting result together, and accumulate the valid sample count. In addition, project the lighting luminance onto the directional SH basis. We detail this storage format in the later "LightMap Encoding" part.
Note that ray guiding is disabled by default in our sample scene, since it still has some bugs.
Unreal GPULM has implemented a new algorithm called “first bounce ray guiding”, which is similar to [Ray Guiding For COD].
Path Guiding
A brief introduction to the related algorithm (path guiding), quoted from Ray Guiding For COD:
To improve the convergence of MC estimators, there has been a recent surge of work related to path guiding. The key idea is to collect samples of light transport and learn a probability distribution for importance sampling. Unlike analytical importance sampling methods, the guiding process is usually designed in a way that it can take into account the light modulated visibility term, which is often impractical to tackle analytically in a closed form.
Ray Guiding Introduction
As mentioned above, the plain BRDF importance sampling estimate $\frac{f(x)\,g(x)}{p_f(x)}$ becomes very large when $p_f(x)$ is small, leading to high variance. So we add light importance sampling and combine the two strategies with MIS.
However, estimating the lighting distribution is difficult. It is hard to account for the visibility term unless you trace a visibility ray and intersect it with the scene, which is expensive for a GI baker. Furthermore, secondary (indirect) light sources are difficult to handle.
Ray guiding in COD and Unreal GPULM runs within a fixed memory footprint, independent of scene complexity. In addition, it combines information from all the learning samples, leading to a more efficient estimator with reduced variance.
Implementation
We have implemented part of the ray guiding algorithm. It contains four parts: clustering texels, building the guiding distribution, filtering the learned distribution and adjusting the sample PDF and direction.
We accumulate radiance over all directions during the first 128 lightmap path tracing samples. Then we filter the accumulated radiance and build the ray guiding distribution per cluster. After that, we adjust the sampling PDF and direction in the subsequent lightmap path tracing passes.
Clustering Texels
Concentric Mapping
At first, we generate the sample direction using concentric mapping. Since concentric mapping is an equal-area mapping, the PDF $p(x,y)$ is constant, which lets us sample uniformly with respect to area. Furthermore, we can apply the inverse concentric mapping to map a sample direction back to the 2D position where its radiance is stored, which is later used to construct the ray guiding distribution.
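As an illustration, here is a minimal HLSL sketch of the concentric (Shirley-Chiu) mapping; the function name and the PI constant are ours, not HWRTL's exact code.

// Sketch: map a uniform 2D sample in [0,1)^2 to the unit disk while preserving area.
static const float PI = 3.14159265f;

float2 ConcentricMapping(float2 u)
{
    float2 offset = 2.0 * u - 1.0;             // remap [0,1)^2 to [-1,1)^2
    if (offset.x == 0.0 && offset.y == 0.0)
    {
        return float2(0.0, 0.0);
    }

    float r, theta;
    if (abs(offset.x) > abs(offset.y))
    {
        r = offset.x;
        theta = (PI / 4.0) * (offset.y / offset.x);
    }
    else
    {
        r = offset.y;
        theta = (PI / 2.0) - (PI / 4.0) * (offset.x / offset.y);
    }
    return r * float2(cos(theta), sin(theta));
}

// The disk sample d can then be lifted to a cosine-weighted hemisphere direction:
// float3 dir = float3(d.x, d.y, sqrt(max(0.0, 1.0 - dot(d, d))));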
To reduce variance in the learned ray distribution, we amplify the number of learning rays used to construct the PDF by sharing the PDF between clusters of spatially and geometrically coherent texels. —— ray guiding for cod
In our example, every 8x8 texels of the lightmap GBuffer share one cluster. The cluster's directional resolution is 16x16, which corresponds to 256 sampling directions. The position at which a sample is stored within the cluster tile is its 2D random sample position before concentric mapping.
int minRenderPassIndex = 0;
int maxRenderPassIndex = RAY_GUDING_MAX_SAMPLES;
We store the maximum luminance for each direction using InterlockedMax. For positive floats we can reinterpret the bits as a uint and use the atomic max directly, since the ordering is preserved.
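A minimal sketch of that write, assuming the rayGuidingLuminanceBuildSrc resource (seen in the filtering code below) is a uint texture, and with clusterOrigin, samplePos2D, and luminance as placeholder names:

// For non-negative IEEE-754 floats the bit pattern is monotonically increasing when
// reinterpreted as a uint, so a uint atomic max behaves like a float max.
uint2 storePos = clusterOrigin + samplePos2D;          // position inside the cluster tile
uint encodedLuminance = asuint(max(luminance, 0.0));   // luminance >= 0, so the cast is order-preserving
InterlockedMax(rayGuidingLuminanceBuildSrc[storePos], encodedLuminance);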
(Figure: the learned radiance distribution.)
Building & Filtering the Guiding Distribution
Filtering
Even though we share all the texel samples within each cluster to learn the PDF, there is a risk of overfitting to the samples that can lead to slower convergence if the learned PDF does not reflect the population distribution well enough. To mitigate this effect, we apply a relaxation step by running a hierarchical smoothing filter over the quadtree representation of the guide distribution —— ray guiding for cod
Filter the radiance results around the current pixel:
float value = 0;
int filterKernelSize = 1;
for (int dx = -filterKernelSize; dx <= filterKernelSize; dx++)
{
    for (int dy = -filterKernelSize; dy <= filterKernelSize; dy++)
    {
        int2 final_pos = clamp(grp_idx.xy * DIRECTIONAL_BINS_ONE_DIM + thread_idx.xy + int2(dx, dy), pixel_min_pos, pixel_max_pos);
        value += asfloat(rayGuidingLuminanceBuildSrc[final_pos]);
    }
}
Building Ray Guiding CDF
Build the CDF for each cluster. Compute the prefix sum per row using the WavePrefixSum intrinsic, then obtain the row sum using WaveReadLaneAt. Note that the compute group row size MUST be aligned to the wave (warp) size. Finally, normalize the prefix sums by the row sum.
The result computed above is the ray guiding CDF in the X dimension. Next, we compute the CDF in the Y dimension: during the X pass we store each row's sum in groupshared memory, then use WavePrefixSum and WaveReadLaneAt again to build the CDF over the row sums.
if (thread_idx.y == 0)
{
    if (thread_idx.x < DIRECTIONAL_BINS_ONE_DIM)
    {
        float value = RowSum[thread_idx.x];
        float prefixSum = WavePrefixSum(value) + value;
        float sum = WaveReadLaneAt(prefixSum, DIRECTIONAL_BINS_ONE_DIM - 1);
        // ... (normalize the prefix sums by the total and store the CDF in the Y dimension)
    }
}
Because of bilinear interpolation and typically non-conservative rasterization, sampling may blend texels with the background lightmap color. Furthermore, we may sample texels that don't belong to the current mesh's lightmap, since multiple meshes may share one lightmap atlas.
In our example, we mitigate this by expanding each lightmap with a few padding border texels and then dilating the lightmap path tracing result. There is still a lot of work to be done on this simple implementation.
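For illustration, a dilation step could look roughly like the HLSL sketch below; lightMap and pixelPos are placeholder names, and the alpha channel is assumed to mark valid texels. This is not HWRTL's exact pass.

// Sketch only: invalid texels copy the first valid neighbour so bilinear sampling
// no longer bleeds the background color.
float4 DilateTexel(int2 pixelPos)
{
    float4 result = lightMap[pixelPos];
    if (result.a > 0.0)
    {
        return result; // already a valid texel
    }
    for (int dy = -1; dy <= 1; dy++)
    {
        for (int dx = -1; dx <= 1; dx++)
        {
            float4 neighbour = lightMap[pixelPos + int2(dx, dy)];
            if (neighbour.a > 0.0)
            {
                return neighbour;
            }
        }
    }
    return result; // no valid neighbour found
}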
Denoise
Unreal GPULM uses OIDN (Open Image Denoise) to denoise the traced lightmap. We have implemented a custom denoising algorithm to replace it, since HWRTL's design goal is a single-file hardware ray tracing library and we don't want to pull in a third-party dependency.
A bilateral filter is a non-linear, edge-preserving, and noise-reducing smoothing filter for images. It replaces the intensity of each pixel with a weighted average of intensity values from nearby pixels. This weight can be based on a Gaussian distribution. Crucially, the weights depend not only on Euclidean distance of pixels, but also on the radiometric differences (e.g., range differences, such as color intensity, depth distance, etc.). This preserves sharp edges ——From Wikipedia
A key issue with the bilateral filter, however, is that the range-distance estimation can be sensitive to noise.
Non-Local Means
The non-local means (NL-Means) filter is a generalization of the bilateral filter in which the range distance is computed using small patches $P(x)$ around the pixels (NFOR):
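A common form of the NL-means weight (the exact normalization differs between implementations; see NFOR for details) is:

$$w_{nlm}(p,q) = \exp\left(-\frac{\|P_p - P_q\|^2}{k^2}\right)$$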
where k is a user parameter controlling the strength of the filter. The squared patch-based range distance is defined as:
$$\|P_p - P_q\|^2 = \max\left(0,\ \frac{1}{|P|}\sum_{n\in P_0} d(p+n,\, q+n)\right)$$
where $P_p$ is a square patch of size $|P|$ (usually 7x7 pixels) centered on $p$, and $P_0$ enumerates the offsets to the pixels within a patch.
Joint Filtering
If additional per-pixel information is available, the quality of bilateral or NL-Means filtering can be further improved by factoring that information into the weight, the so-called joint filtering (NFOR):
$$w_{jnlm}(p,q) = w_{nlm}(p,q)\, w_{aux}(p,q)$$
where $w_{aux}(p,q)$ is a weight derived from the additional information. Given a vector of $k$ auxiliary feature buffers, $f=(f_1,\ldots,f_k)$, we can compute the weight as:
$$w_{aux}(p,q) = \prod_{i=1}^{k} \exp\left(-d_{f,i}(p,q)\right)$$
where:
$$d_{f,i}(p,q) = \frac{\|f_{i,p} - f_{i,q}\|^2}{2\sigma_{i,p}^2}$$
and $\sigma_i^2$ is the bandwidth of feature $i$. Joint filtering is typical for denoising MC renderings, where feature buffers (e.g. normal, albedo, depth) can be obtained inexpensively as a byproduct of rendering the image.
Implementation
For each pixel $p$, we search a 20x20 window (HALF_SEARCH_WINDOW * 2) around it and compute the squared patch-based range distance between the patch $P_p$ centered on $p$ and the patch $P_q$ centered on each candidate pixel $q$ in the window.
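A minimal HLSL sketch of this weighting loop follows; HALF_PATCH_SIZE, GetRadiance, and filterStrengthK are illustrative names rather than HWRTL's actual code, and the joint feature weights described above are omitted for brevity.

// Sketch only: NL-means filtering of a single lightmap texel.
float3 NlMeansFilterPixel(int2 p)
{
    float3 filtered = float3(0.0, 0.0, 0.0);
    float weightSum = 0.0;

    for (int sy = -HALF_SEARCH_WINDOW; sy <= HALF_SEARCH_WINDOW; sy++)
    {
        for (int sx = -HALF_SEARCH_WINDOW; sx <= HALF_SEARCH_WINDOW; sx++)
        {
            int2 q = p + int2(sx, sy);

            // squared patch-based range distance between the patches P_p and P_q
            float patchDistance = 0.0;
            for (int py = -HALF_PATCH_SIZE; py <= HALF_PATCH_SIZE; py++)
            {
                for (int px = -HALF_PATCH_SIZE; px <= HALF_PATCH_SIZE; px++)
                {
                    float3 diff = GetRadiance(p + int2(px, py)) - GetRadiance(q + int2(px, py));
                    patchDistance += dot(diff, diff);
                }
            }
            int patchTexelCount = (2 * HALF_PATCH_SIZE + 1) * (2 * HALF_PATCH_SIZE + 1);
            patchDistance = max(0.0, patchDistance / patchTexelCount);

            // NL-means weight; joint filtering would multiply in the auxiliary feature weights here
            float weight = exp(-patchDistance / (filterStrengthK * filterStrengthK));
            filtered += weight * GetRadiance(q);
            weightSum += weight;
        }
    }
    return filtered / max(weightSum, 1e-6);
}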
LightMap Encoding
1. Store the square root of the irradiance, which gives more precision in dark regions.
2. Store the integer and fractional parts of the log-encoded radiance separately.
3. Store the SH directionality in a separate lightmap. We store directionality separately, instead of multiplying it into the irradiance, because many screen pixels may share one texel; keeping it separate lets us adjust the lighting by the per-pixel world normal, which improves visual quality.
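A purely illustrative sketch of such an encoding follows; the exact constants and channel packing used by Unreal GPULM / HWRTL differ.

// Sketch only: encode one lightmap texel along the lines described above.
void EncodeLightMapTexel(float3 irradiance, out float4 lightMap0, out float logLumInteger)
{
    // 1. sqrt gives low-bit-depth textures more precision in dark regions
    float3 sqrtIrradiance = sqrt(max(irradiance, 0.0));

    // 2. split the log luminance into integer and fractional parts so each part keeps
    //    full precision when stored in an 8-bit channel
    float luminance = dot(irradiance, float3(0.299, 0.587, 0.114));
    float logLum = log2(max(luminance, 1e-6));
    logLumInteger = floor(logLum);
    float logLumFraction = logLum - logLumInteger;

    lightMap0 = float4(sqrtIrradiance, logLumFraction);
    // 3. the SH directionality would be written to a separate lightmap texture
}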