- Introduction and Objective
- Environment introduction
- Rendering scene
- Physically Based Rendering
- Image Based Lighting
- Deferred Shading
- Omni shadow map subdivision for many lights
- Result comparison and analysis
Introduction and Objective
This report introduces my implementation of multiple features related to deferred shading. As stated in the proposal, I compare forward shading and deferred shading from two perspectives: image quality and FPS (frames per second). The results are shown at the end of the report.
Processor: Intel(R) Core(TM) i7-9700K CPU @ 3.60GHz
Installed RAM: 16.0 GB
Display adapters: NVIDIA GeForce RTX 2080
Teapot (18906 vertices) × 8
Container (480 vertices) × 1
Skybox (72 vertices) × 1
Environment capture * 3
All textures are 2K.
1234567890: switch between modes (1: forward shading, 2: deferred shading, 3: albedo, 4: metallic, 5: roughness, 6: normal, 7: texture AO, 8: depth, 9: blurry SSAO)
-/+: Adjust exposure in forward/deferred mode.
C/V: Adjust ambient light intensity in forward/deferred mode.
G: Spawn a point light in front of the camera.
LMB: Rotate horizontally (drag left/right), zoom in/out (drag up/down).
RMB: Rotate horizontally and vertically.
WASD: Move relative to the camera.
IJKL: Move all point lights together in world space.
Space/Left Ctrl: Ascend/descend all point lights in world space.
Right Ctrl: Toggle the omni shadow map display.
Physically Based Rendering
My implementation of PBR uses the metallic-roughness workflow. The following image shows my material Lua file.
By default, it contains 5 textures: albedo (24-bit), metallic (8-bit), roughness (8-bit), normal (24-bit), and AO (8-bit). The remaining variables help adjust the intensities of some properties.
Image based Lighting
Image based lighting is used to simulate global illumination and achieve surface reflection. I use an octree structure to accelerate the capture-probe collision queries. Before rendering the first frame, each probe in the scene captures 2 cube-maps (an irradiance map and a prefiltered map), which store the scene information within the probe's capture range.
Prefiltered map: Left(LOD0 – 256 * 256), Right(LOD1 – 128 * 128)
The roughness of the surface determines which LOD of the prefiltered map is sampled.
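As a minimal sketch of this LOD selection, assuming the linear roughness-to-mip mapping common in split-sum IBL (the report does not state the exact mapping or mip count):

```cpp
#include <cmath>

// Map surface roughness in [0, 1] to a prefiltered-map LOD, assuming
// the linear mapping commonly used with the split-sum approximation.
// maxLod is the index of the coarsest mip (e.g. 4 if LOD0 is 256x256
// and the chain goes down to 16x16) -- the exact count is an assumption.
float roughnessToLod(float roughness, float maxLod) {
    return roughness * maxLod;
}
```

A mirror-like surface (roughness 0) thus samples the sharp 256 × 256 LOD0, while a fully rough surface samples the coarsest mip.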
For each frame, the importance of every probe is calculated from its radius and its distance to the camera. The top 4 probes send their cube-map pairs to the PBR shader, which interpolates them according to their importance, so that the camera sees different reflections in different areas of the scene.
In this scene, three probes are placed. There is no popping, but the blending does not look very realistic; adding more probes may help.
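The per-frame selection above could be sketched on the CPU as follows; the importance formula (radius divided by camera distance) and all names here are assumptions, since the report only lists the inputs:

```cpp
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct Probe {
    float x, y, z;   // capture position
    float radius;    // capture range
};

// Hypothetical importance: larger probes and probes closer to the
// camera matter more. The exact formula is an assumption.
float importance(const Probe& p, float camX, float camY, float camZ) {
    float dx = p.x - camX, dy = p.y - camY, dz = p.z - camZ;
    float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    return p.radius / (dist + 1e-4f);
}

// Pick the top 4 probes and return normalized blend weights that the
// PBR shader would use to interpolate their cube-map pairs.
std::vector<std::pair<int, float>> selectProbes(
        const std::vector<Probe>& probes,
        float camX, float camY, float camZ) {
    std::vector<std::pair<int, float>> ranked;
    for (int i = 0; i < (int)probes.size(); ++i)
        ranked.push_back({i, importance(probes[i], camX, camY, camZ)});
    std::sort(ranked.begin(), ranked.end(),
              [](auto& a, auto& b) { return a.second > b.second; });
    if (ranked.size() > 4) ranked.resize(4);
    float total = 0.0f;
    for (auto& r : ranked) total += r.second;
    for (auto& r : ranked) r.second /= total;  // weights sum to 1
    return ranked;
}
```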
My G-Buffer format is as follows.
| Render target | Contents | Format |
|---|---|---|
| RT0 | Albedo + Metallic | RGBA8 |
| RT1 | Normal + Roughness | RGBA16 |
| RT2 | IOR + AO | RGBA16 |
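Since RT0 packs albedo and metallic into an RGBA8 target, each value is quantized to 8 bits. A hypothetical pack/unpack pair illustrating the precision this implies (the names are illustrative, not from my code):

```cpp
#include <cmath>
#include <cstdint>

// Pack a [0, 1] float into one 8-bit channel and back, as the RGBA8
// RT0 target (albedo.rgb + metallic in alpha) implies. Round-trip
// error is at most about 1/510 per channel.
uint8_t toU8(float v)     { return (uint8_t)std::round(v * 255.0f); }
float   fromU8(uint8_t v) { return v / 255.0f; }

struct RT0 { uint8_t r, g, b, a; };

RT0 packAlbedoMetallic(float r, float g, float b, float metallic) {
    return {toU8(r), toU8(g), toU8(b), toU8(metallic)};
}
```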
Left side: deferred shading with SSAO; right side: forward shading without SSAO. They look almost identical.
To achieve post-processing effects, I render the whole pipeline into a framebuffer storing high-dynamic-range color (16 bits per channel). I then render that color to a full-screen quad and apply post-processing such as exposure adjustment and SSAO.
High exposure and low exposure.
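A sketch of how the exposure adjustment might be applied when drawing the HDR buffer to the quad; the exponential tone-map operator here is an assumption, as the report only says exposure is adjustable:

```cpp
#include <cmath>

// Exposure tone mapping applied per channel when resolving the HDR
// framebuffer (16-bit float color) to the screen quad. The
// exponential operator is one common choice, not necessarily the one
// used in my shader.
float toneMap(float hdr, float exposure) {
    return 1.0f - std::exp(-hdr * exposure);  // maps [0, inf) -> [0, 1)
}
```

Raising the exposure brightens the whole image while keeping highlights below 1, which is what the high/low exposure screenshots above show.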
Omni shadow map subdivision for many lights
Since at most 32 textures can be bound in a single shader, one shadow-map texture per light cannot support many lights. Subdividing shadow maps on the CPU and remapping them in the shader solves this problem; I believe this technique is called shadow map allocation.
In this scene, there are 80 point lights. Each of them casts a shadow, but its shadow-map resolution changes according to its attenuation distance and its distance to the camera.
Instead of allocating 80 shadow maps, I allocate only 5 and subdivide each into 16 sub-areas (two 1024 × 1024, six 512 × 512, and eight 256 × 256). Less important lights use smaller areas of a shadow map. The redness of each circle indicates the importance of its point light.
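Note that the listed sub-areas tile a 2048 × 2048 face exactly (2·1024² + 6·512² + 8·256² = 2048²). A hypothetical layout and the UV remap it implies; the placement here is illustrative, only the sub-area sizes come from the text above:

```cpp
struct Tile { int x, y, size; };  // sub-area inside one 2048x2048 face

// One possible placement of the 16 sub-areas: two 1024^2 tiles fill
// the left half, six 512^2 tiles fill the top of the right half, and
// eight 256^2 tiles fill the bottom strip. rank = light importance
// order (0 = most important, largest tile).
Tile tileForRank(int rank) {
    if (rank < 2)                      // two 1024^2 tiles, left half
        return {0, 1024 * rank, 1024};
    if (rank < 8) {                    // six 512^2 tiles, right half top
        int i = rank - 2;
        return {1024 + 512 * (i % 2), 512 * (i / 2), 512};
    }
    int i = rank - 8;                  // eight 256^2 tiles, bottom strip
    return {1024 + 256 * (i % 4), 1536 + 256 * (i / 4), 256};
}

// Shader-side remap (GLSL in practice): a light's local shadow UV in
// [0, 1] is scaled and offset into its tile of the atlas.
void remapUV(const Tile& t, float u, float v, float& au, float& av) {
    au = (t.x + u * t.size) / 2048.0f;
    av = (t.y + v * t.size) / 2048.0f;
}
```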
The left image displays the 5 omni shadow maps for all 80 lights (16 sub-areas per face for one complete shadow map). The right image is an enlarged version of the first shadow map.
Rendering the entire scene 80 times per frame could be the bottleneck of this algorithm.
To optimize the rendering speed, I added an AABB to each mesh.
On the CPU, for each light I check whether the point light's sphere intersects the mesh's AABB. Only meshes that pass this intersection test are rendered in that point light's shadow-map pass.
This way, unless every light illuminates every mesh in the scene, performance improves considerably whenever some lights are far away from complex meshes.
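The culling test described above is the standard squared-distance sphere-vs-AABB check; a minimal sketch:

```cpp
#include <algorithm>

struct AABB { float min[3], max[3]; };

// Standard squared-distance test: clamp the sphere center to the box
// and compare the distance to that closest point against the radius.
// A mesh is rendered into a light's shadow pass only when its AABB
// intersects the light's influence sphere (its attenuation range).
bool sphereIntersectsAABB(const float c[3], float radius, const AABB& box) {
    float d2 = 0.0f;
    for (int i = 0; i < 3; ++i) {
        float p = std::max(box.min[i], std::min(c[i], box.max[i]));
        float d = c[i] - p;
        d2 += d * d;
    }
    return d2 <= radius * radius;
}
```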
Result comparison and analysis
The following images show the comparison reference scenes: 80 point lights and 8 teapots.
From the left to the right:
- No shading (no point light shades any surface).
- Half of the point lights shade all 8 teapots, and some point lights shade some of the teapots.
- Almost all point lights shade all 8 teapots.
| Average FPS | Forward Shading | Deferred Shading with SSAO |
|---|---|---|
| 80 lights without AABB, no shading | 100 | 103 |
| 80 lights without AABB, shading half of the teapots | 65 | 77 |
| 80 lights without AABB, shading most of the teapots | 47 | 59 |
| 80 lights with AABB, no shading | 144 | 144 |
| 80 lights with AABB, shading half of the teapots | 80 | 96 |
| 80 lights with AABB, shading most of the teapots | 54 | 70 |
So deferred shading improves performance by around 23% when more lights are actively shading, while the two pipelines produce almost identical rendering quality.
Adding bounding boxes to decide whether a mesh needs to be rendered into a shadow map improves performance by around 20%, with the largest gains when fewer lights are casting shadows.
References
- Deferred shading theory and implementation:
- Deferred shading theory (This is a good one, but I cannot open it now)
- Deferred shading usage:
- Deferred lighting theory:
- Dissecting a frame in GTA-V with deferred shading algorithm:
- Real Shading in Unreal Engine 4:
- Shadow map allocation