Optimizations

In the last post I wrote about how rendering with my BSP algorithm was slower than using brute force. The algorithm has since been optimized in two ways:

1. Smarter BSP Tree Traversal

There are now additional cases in which branches are not traversed. For instance, if an intersection is found in the branch that is closer to the camera, the other branch is not traversed. It is indeed a very essential strategy in BSP tree traversal, and implementing it made rendering about twice as fast.

2. Shuffled Triangle List

This has to do with the compilation of the BSP tree, rather than the traversal. Previously, when creating a node in the tree, the node’s splitting plane was based on the first triangle in the given triangle list. This meant that if the input data was somewhat spatially ordered (which I believe it is, when it is exported from Blender), the tree would become very unbalanced.

Shuffling the triangle list makes the tree “randomly balanced”, which in most cases is better than “consistently unbalanced”. A side effect of this is that the amount of work required for rendering is nondeterministic for a given input.

The bar for improving the compiler further is relatively low; it just needs to be better than guessing. If I can figure out a simple way to do it, I will.

Performance Impact

Based on very limited testing, it seems that these two changes have made the BSP render times to be pretty much on par with brute force, on average. Previously, rendering with BSP could take more than three times longer.