For this project, I am building my own Entity-Component-System (ECS) architecture C++ engine. In its current state, it is a small but powerful rendering engine featuring the following:
- Entity-Component-System (ECS) architecture
- Deferred Shading
- Point Light Volumes
- Dynamic directional light shadow map
- Custom dynamic memory allocator/manager (no ‘new’) to optimize cache performance
- Transparency (Sorted back-to-front on CPU)
- Multi-pass rendering
- Post-processing step (currently just gamma correction)
- Material and texturing support
Demo
2. Rendering Passes
Relevant code: RenderFunction
The passes my engine now supports are as follows. I will go into each more deeply below:
- Culling pass
- Shadows pass
- Deferred-to-texture pass
- Deferred lighting pass
- Forward pass
- Post-processing pass
2.1 Culling
Relevant code: Function, Structs
In my culling pass, I go through all meshes and light volumes and perform frustum culling on them. Meshes and light volumes that are not culled are chunked into smaller structs of data that relate more directly to how they will be used. Transparent and opaque meshes are partitioned into two separate lists so that I can treat them differently in later rendering passes.
The reason I chunk meshes and point lights into structs is to improve cache performance and to produce a compact data structure that I can send up to the GPU. A cache miss, and thus a read from main memory, can cost in the neighborhood of 200 cycles, while something like a dreaded sqrt is in the neighborhood of 35. So if I can make my data more compact and direct, accessing it should provide better cache performance.
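To make this concrete, here is a minimal sketch of what the chunking might look like. The struct layouts, field names, and the sphereInFrustum helper are illustrative assumptions, not the engine's actual types:

#include <cstdint>
#include <vector>
#include <glm/glm.hpp>

// Illustrative scene-side record; the engine's real component layout may differ.
struct SceneMesh {
    glm::mat4 model;
    glm::vec3 boundsCenter;    // world-space bounding sphere used for frustum culling
    float     boundsRadius;
    uint32_t  meshHandle;      // indices into shared mesh/material storage
    uint32_t  materialHandle;
    bool      transparent;
};

// Compact chunk produced by the culling pass: only what later passes actually touch.
struct MeshChunk {
    glm::mat4 model;
    uint32_t  meshHandle;
    uint32_t  materialHandle;
};

// Tightly packed so the whole array can be uploaded to the GPU directly.
struct PointLightChunk {
    glm::vec3 position;
    float     radius;
    glm::vec3 color;
    float     pad;             // keeps 16-byte alignment; lights are chunked the same way as meshes
};

// Hypothetical frustum-vs-bounding-sphere test.
bool sphereInFrustum(const glm::mat4& viewProj, const glm::vec3& center, float radius);

void cullAndChunk(const glm::mat4& viewProj, const std::vector<SceneMesh>& scene,
                  std::vector<MeshChunk>& opaqueOut, std::vector<MeshChunk>& transparentOut)
{
    for (const SceneMesh& m : scene) {
        if (!sphereInFrustum(viewProj, m.boundsCenter, m.boundsRadius))
            continue;                                  // culled: never touched again this frame
        MeshChunk chunk{ m.model, m.meshHandle, m.materialHandle };
        (m.transparent ? transparentOut : opaqueOut).push_back(chunk);
    }
}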
2.2 Shadows
Relevant code: Function, VertShader, FragShader
In my shadow pass, I populate a 4096×4096 texture with the depth values of my scene as seen from the sun (a single directional light). I create an orthographic projection matrix, which ideally would be calculated to fit the scene perfectly (mine is currently hard-coded), and a view matrix positioned some distance back along the sun's negative direction. I then re-draw every mesh that hasn't been culled into a framebuffer that has only a depth texture attachment, capturing just the depth component.
While it may seem like a lot of extra drawing, the shadow map shader is incredibly simple and thus doesn’t take up much GPU time at all:
#vert
in vec3 inPos;
uniform mat4 lightProjView;
uniform mat4 model;
void main() {
    // Project the vertex into light space; the depth buffer is written automatically
    gl_Position = lightProjView * model * vec4(inPos, 1.0);
}
#frag
void main() {
    // No color output needed: the framebuffer has only a depth attachment
}
As you can see, all I do is project each vertex into light space, and OpenGL automatically writes the resulting depth to the depth texture attachment. This depth map is used later for comparison: in subsequent passes I can project a fragment into light space and check its depth against the value stored in the texture. If the fragment's depth is greater than what is stored in the texture, the fragment is occluded and should be shadowed.
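For concreteness, here is a rough sketch of the CPU-side setup this pass implies: a depth-only framebuffer plus the light's matrices. The resolution, texture format, distances, and the hard-coded orthographic extents below are illustrative placeholders, not the engine's exact values:

#include <glad/glad.h>      // or whichever GL loader the engine uses
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

GLuint shadowFBO = 0, shadowTex = 0;
const int kShadowRes = 4096;

// One-time creation of a 4096x4096 framebuffer that has only a depth attachment.
void createShadowTarget() {
    glGenTextures(1, &shadowTex);
    glBindTexture(GL_TEXTURE_2D, shadowTex);
    glTexImage2D(GL_TEXTURE_2D, 0, GL_DEPTH_COMPONENT24, kShadowRes, kShadowRes,
                 0, GL_DEPTH_COMPONENT, GL_FLOAT, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);

    glGenFramebuffers(1, &shadowFBO);
    glBindFramebuffer(GL_FRAMEBUFFER, shadowFBO);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, shadowTex, 0);
    glDrawBuffer(GL_NONE);   // depth only: nothing is written to a color buffer
    glReadBuffer(GL_NONE);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}

// Orthographic projection (hard-coded extents) and a view matrix placed some
// distance back along the sun's negative direction, looking at the scene.
glm::mat4 buildLightProjView(const glm::vec3& sunDir) {
    glm::mat4 proj = glm::ortho(-50.0f, 50.0f, -50.0f, 50.0f, 1.0f, 200.0f);
    glm::mat4 view = glm::lookAt(-sunDir * 100.0f,                // eye along -sunDir
                                 glm::vec3(0.0f),                 // look toward the scene origin
                                 glm::vec3(0.0f, 1.0f, 0.0f));
    return proj * view;
}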
2.3 Deferred-to-Texture
Relevant code: Function, VertShader, FragShader
This pass is the stereotypical deferred rendering pass. I have a collection of textures/attachments in a framebuffer that will be used to represent different aspects of my scene, namely:
- Positions
- Normals
- Diffuse color
- Specular color and exponent
- Depth
I run through all of the opaque geometry and store the related values in these textures. Each texture is the size of the window I am drawing to.
The main reason I do this is to save GPU computation in the next pass, the lighting pass. By drawing to textures, I can make use of the depth attachment to automatically discard fragments that will never be visible. The result is a set of textures containing only the fragments that actually make it to the screen. The benefit deferred rendering gives is that instead of the lighting cost being
O(numFragments * numLights),
it is instead
O(numFragments + meaningfulFragments * numLights),
so lighting is only performed on fragments that actually matter. Lighting is often one of the biggest bottlenecks, so this is a neat way to get past it.
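As a rough sketch, the G-buffer setup might look something like this. The attachment formats here are common defaults used purely for illustration; section 3 describes the tighter packing I actually use:

#include <glad/glad.h>

GLuint gBuffer = 0, gPosition = 0, gNormal = 0, gAlbedoSpec = 0, gDepth = 0;

// Helper: allocate a window-sized texture with the given format.
GLuint makeAttachment(GLenum internalFormat, GLenum format, GLenum type, int w, int h) {
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexImage2D(GL_TEXTURE_2D, 0, internalFormat, w, h, 0, format, type, nullptr);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
    return tex;
}

void createGBuffer(int w, int h) {
    glGenFramebuffers(1, &gBuffer);
    glBindFramebuffer(GL_FRAMEBUFFER, gBuffer);

    gPosition   = makeAttachment(GL_RGBA16F, GL_RGBA, GL_FLOAT, w, h);          // positions
    gNormal     = makeAttachment(GL_RGBA16F, GL_RGBA, GL_FLOAT, w, h);          // normals
    gAlbedoSpec = makeAttachment(GL_RGBA8, GL_RGBA, GL_UNSIGNED_BYTE, w, h);    // diffuse + specular
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, gPosition, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT1, GL_TEXTURE_2D, gNormal, 0);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT2, GL_TEXTURE_2D, gAlbedoSpec, 0);

    // Depth attachment: this is what lets the hardware reject hidden fragments for free.
    gDepth = makeAttachment(GL_DEPTH_COMPONENT24, GL_DEPTH_COMPONENT, GL_FLOAT, w, h);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT, GL_TEXTURE_2D, gDepth, 0);

    // The fragment shader writes to all three color attachments in one pass.
    const GLenum drawBuffers[3] = { GL_COLOR_ATTACHMENT0, GL_COLOR_ATTACHMENT1, GL_COLOR_ATTACHMENT2 };
    glDrawBuffers(3, drawBuffers);
    glBindFramebuffer(GL_FRAMEBUFFER, 0);
}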
2.4 Deferred Lighting
Relevant code: Function, VertPointLightShader, FragPointLightShader, VertDirectionalLightShader, FragDirectionalLightShader
In this pass, I take all the textures that I previously wrote to in the deferred-to-texture pass and perform lighting calculations.
In my engine, I perform lighting using point light volumes and a directional light. These are actually done in two separate sub-passes, so I will go through each separately.
2.4.1 Point Light Volumes:
Doing lighting with point light volumes means that rather than just sending a position/color/radius to the GPU for each point light, I instead draw a sphere at each point light's position. The reasoning for this is as follows: because I draw a sphere mesh, no fragments outside the sphere's radius are touched by the light volume. For the cost of drawing a rudimentary sphere, I effectively get a bounding volume that selects only the fragments that care about the light.
So, for each light volume, I perform Blinn-Phong lighting calculations on all relevant fragments using values from the previous textures (position, normal, etc.). Then each light volume result is additively blended with all other light volume renders, giving us a fully lit scene in the end.
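The additive accumulation itself is just blend state. A minimal sketch, assuming a standard OpenGL setup (not necessarily the engine's exact flags):

#include <glad/glad.h>

void setAdditiveLightBlendState() {
    glEnable(GL_BLEND);
    glBlendEquation(GL_FUNC_ADD);
    glBlendFunc(GL_ONE, GL_ONE);   // new result = what is already there + this light's contribution
    glDepthMask(GL_FALSE);         // test against the G-buffer depth, but never overwrite it
}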
The only really interesting thing here is that, because I draw an actual mesh for each point light, I have to switch face culling back and forth depending on whether I am inside of the volume or not. Which faces I cull, together with the camera's position, determines whether I actually see the light from the light volume. Here is the gist of it:
if (glm::length(cameraPosition - pointLightPosition) < pointLightRadius)
    glCullFace(GL_FRONT);   // camera is inside the volume: only its back faces are visible
else
    glCullFace(GL_BACK);    // camera is outside: cull back faces as usual
2.4.2 Directional Light:
For this sub-pass, I am lighting everything with a directional light and also calculating shadows for all of the opaque geometry I have already drawn.
Again, I use the previous deferred textures. However, this time, instead of a light volume, I send up a lightspace matrix, the direction of the light, and the color. Since the directional light could impact every pixel, rather than drawing geometry, I just draw a full-screen quad. The quad is the same size as our deferred textures, so texture lookup is very simple.
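Here is a rough sketch of the CPU side of this sub-pass. The uniform names, texture-unit assignments, and parameters are illustrative assumptions (and they presume the shader's sampler uniforms were bound to units 0–3 elsewhere):

#include <glad/glad.h>
#include <glm/glm.hpp>
#include <glm/gtc/type_ptr.hpp>

void directionalLightSubPass(GLuint shader, GLuint gPosition, GLuint gNormal, GLuint gAlbedoSpec,
                             GLuint shadowMap, const glm::mat4& lightSpaceMatrix,
                             const glm::vec3& lightDir, const glm::vec3& lightColor,
                             GLuint fullscreenQuadVAO)
{
    glUseProgram(shader);

    // Bind the deferred textures plus the shadow map for the lighting shader to sample.
    const GLuint textures[4] = { gPosition, gNormal, gAlbedoSpec, shadowMap };
    for (int i = 0; i < 4; ++i) {
        glActiveTexture(GL_TEXTURE0 + i);
        glBindTexture(GL_TEXTURE_2D, textures[i]);
    }

    // The directional light needs no volume, just its matrix, direction, and color.
    glUniformMatrix4fv(glGetUniformLocation(shader, "lightSpaceMatrix"), 1, GL_FALSE,
                       glm::value_ptr(lightSpaceMatrix));
    glUniform3fv(glGetUniformLocation(shader, "lightDir"), 1, glm::value_ptr(lightDir));
    glUniform3fv(glGetUniformLocation(shader, "lightColor"), 1, glm::value_ptr(lightColor));

    // One full-screen quad (two triangles) covers every pixel of the deferred textures.
    glBindVertexArray(fullscreenQuadVAO);
    glDrawArrays(GL_TRIANGLES, 0, 6);
}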
The only lighting calculation difference here is that I now calculate shadows. I didn't do so for the point lights, as there can be thousands of them and they don't all need shadows, so I decided that only the directional light would cast shadows.
I already went over the general idea of the shadow comparison in 2.2 Shadows, but here is a more specific sketch of the fragment-shader logic:
// Fetch the fragment's world position from our position texture
vec3 fragPos = texture(gPosition, uv).rgb;
// Transform it into light space and derive projective coordinates and z-depth
vec4 lightSpacePos = lightSpaceMatrix * vec4(fragPos, 1.0);
vec3 projCoords = (lightSpacePos.xyz / lightSpacePos.w) * 0.5 + 0.5;
float lightspaceDepth = projCoords.z;
// Compare against the depth stored in the shadow texture
float textureDepth = texture(shadowMap, projCoords.xy).r;
float shadow = (textureDepth < lightspaceDepth) ? 1.0 : 0.0;   // 1.0 means in shadow
In my shader, I actually do this calculation for the lightspace fragment and its neighbors to try and produce a softer shadow.
2.5 Forward
Relevant code: Function, VertShader, FragShader
This is a stereotypical forward rendering pass.
The reason I have to do this is actually a trade-off of deferred rendering. Writing only the visible fragments to textures saves a lot of lighting time, but it completely breaks semi-transparent objects. The color of a transparent object is based on what is behind/in front of it; the colors have to be blended. Deferred rendering only saves one value per pixel, so to get that blending, we have to handle the transparent objects in a separate pass.
In this pass, I do the following (a CPU-side sketch follows the list):
- Copy over the depth buffer from deferred rendering
  - This depth-culls transparent objects that are behind opaque ones
- Sort transparent objects back to front
- Send up the point lights in an SSBO
  - Allows a dynamic number of lights, which works well with light volumes and culling
- Do Blinn-Phong lighting
  - Point lights are handled the same as in the deferred pass, except now every pixel is considered, not just those inside a point light's mesh
  - The directional light differs based on the calculated shadow
  - The result is the sum of the two lighting contributions
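Here is a rough CPU-side sketch of that setup. The MeshChunk/PointLightChunk types from the culling sketch above, and the framebuffer/SSBO handles, are illustrative stand-ins for the engine's own objects:

#include <algorithm>
#include <vector>
#include <glad/glad.h>
#include <glm/glm.hpp>

void beginForwardPass(GLuint gBuffer, GLuint forwardFBO, int w, int h,
                      std::vector<MeshChunk>& transparent,
                      const std::vector<PointLightChunk>& lights,
                      GLuint lightSSBO, const glm::vec3& cameraPos)
{
    // 1. Reuse the opaque depth so hidden transparent fragments are rejected by the depth test.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, gBuffer);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, forwardFBO);
    glBlitFramebuffer(0, 0, w, h, 0, 0, w, h, GL_DEPTH_BUFFER_BIT, GL_NEAREST);
    glBindFramebuffer(GL_FRAMEBUFFER, forwardFBO);

    // 2. Sort transparent meshes back to front so blending composites correctly.
    std::sort(transparent.begin(), transparent.end(),
              [&](const MeshChunk& a, const MeshChunk& b) {
                  float da = glm::length(glm::vec3(a.model[3]) - cameraPos);
                  float db = glm::length(glm::vec3(b.model[3]) - cameraPos);
                  return da > db;            // farthest drawn first
              });

    // 3. Upload the surviving point lights to an SSBO so the shader can loop over any number of them.
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, lightSSBO);
    glBufferData(GL_SHADER_STORAGE_BUFFER, lights.size() * sizeof(PointLightChunk),
                 lights.data(), GL_DYNAMIC_DRAW);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 0, lightSSBO);

    // 4. Standard alpha blending for the transparent draws that follow.
    glEnable(GL_BLEND);
    glBlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA);
    glDepthMask(GL_FALSE);                   // test against opaque depth, but don't write it
}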
2.6 Post Processing
Relevant code: Function, VertShader, FragShader
Finally, my last pass. This is also the least complete one. A lot of things could be done here:
- Blur
- Glow
- Anti-aliasing
- Bloom
The only one that I am currently doing is gamma correction. However, I am excited to try others.
Creating this pass is actually a bit of work. You can't do post-processing effects on the screen, only on a texture. So, in the previous passes, you have to render to a final texture rather than to the screen. This isn't all that difficult, but it touches practically all of the previous shaders, and data has to be copied over from pass to pass in places. But the power you gain to work with your render before it is output is awesome.
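As a minimal sketch of the CPU side of this final pass, assuming the earlier passes rendered into an offscreen color texture (all names here are illustrative):

#include <glad/glad.h>

// 'sceneColorTex' is the texture every earlier pass ultimately rendered into, and
// 'postShader' is a program whose fragment shader applies pow(color, vec3(1.0/2.2)).
void postProcessPass(GLuint postShader, GLuint sceneColorTex, GLuint fullscreenQuadVAO,
                     int width, int height)
{
    glBindFramebuffer(GL_FRAMEBUFFER, 0);          // finally draw to the actual screen
    glViewport(0, 0, width, height);
    glDisable(GL_DEPTH_TEST);

    glUseProgram(postShader);
    glActiveTexture(GL_TEXTURE0);
    glBindTexture(GL_TEXTURE_2D, sceneColorTex);   // the finished render from the previous passes

    glBindVertexArray(fullscreenQuadVAO);
    glDrawArrays(GL_TRIANGLES, 0, 6);              // full-screen quad with the gamma-correction shader
}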
3. Performance
I have made some attempts to optimize pieces of my engine; these can be seen below:
- Each fragment uses only 128 bits in my deferred rendering (assuming that OpenGL turns all RGB formats into RGBA)
  - 32 bits (RGB8) for octahedron-encoded normals
  - 64 bits (RGBA16UI) for diffuse, specular, and specular exponent
  - 32 bits (GL_DEPTH_COMPONENT24) for depth
  - Relevant Files: Performance.txt
- Manual dynamic memory allocation
  - One (or more) buffers that all dynamic memory is placed into, so as to optimize cache performance
  - Avoids memory scattered across random locations, as happens with the built-in ‘new’ functionality
  - This feature is still under construction (a rough sketch of the idea follows after this list)
  - Relevant Files: MemoryManager.h MemoryManager.cpp
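To illustrate the idea (this is not the actual contents of MemoryManager.h/.cpp, just a minimal linear-arena sketch): one big buffer is allocated up front, and individual allocations are carved out of it sequentially, so related objects end up next to each other in memory.

#include <cstddef>
#include <new>

class ArenaAllocator {
public:
    explicit ArenaAllocator(std::size_t capacity)                // one upfront allocation for the backing buffer
        : buffer(new std::byte[capacity]), size(capacity), offset(0) {}
    ~ArenaAllocator() { delete[] buffer; }

    // 'alignment' must be a power of two.
    void* allocate(std::size_t bytes, std::size_t alignment = alignof(std::max_align_t)) {
        std::size_t aligned = (offset + alignment - 1) & ~(alignment - 1);
        if (aligned + bytes > size) return nullptr;              // arena exhausted
        offset = aligned + bytes;
        return buffer + aligned;
    }

    void reset() { offset = 0; }                                 // release everything at once

private:
    std::byte*  buffer;
    std::size_t size;
    std::size_t offset;
};

// Usage: placement-new objects into the arena instead of calling the global 'new':
// ArenaAllocator arena(64 * 1024 * 1024);
// auto* chunk = new (arena.allocate(sizeof(MeshChunk), alignof(MeshChunk))) MeshChunk{};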
4. Design Choices and Next Steps
4.1 Things I’ve done that I like:
- Work on optimizing bit usage in my deferred textures
  - Huge boost in performance
- Multiple render passes
  - I feel like I got the big ones, and the pipeline between them exists
- AssetManager
  - One place for the vast majority of shared resources. Limits copies. Saves memory.
- MemoryManager
  - Limits memory fragmentation and randomness. Boosts cache performance.
- Shadow maps
  - The idea of having a baked shadow is awesome. Also, the realization that not all lights need shadows, just the meaningful ones, was helpful.
4.2 Things I’ve done that I’ll have to change:
- All transparent objects are treated the same
  - No check for cut-outs vs semi-transparent objects
- Variables specific to the rendering system are not a part of the asset manager
  - Lights, textures, shaders
- Only support one directional light
- Only support one camera
4.3 Things I hope to do:
- Tiled Deferred Rendering
- Handle std::vectors in my manual dynamic memory allocation
- Forward+ handling of transparent objects
- Physics
- Collisions
- Animation with FBX support
- Allow baking process for shadows/lights