As you read in the previous blog post, Java and .Net code can be as fast as (or even faster than) C++ compiled code. This was a reason for me to look into the performance discrepancy and to implement the three "remaining" optimizations that were responsible for the really big performance gap:
- escape analysis: objects that do not escape are allocated on the stack
- common subexpression elimination
- loop invariant code motion (LICM)
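To make the first item concrete, here is a minimal hand-written sketch of the kind of change escape analysis allows. It assumes the reference counting behaves like std::shared_ptr, which is only an illustration and not necessarily what the generated code uses; Vec2 and the four function names are hypothetical.

```cpp
#include <cmath>
#include <memory>

// Hypothetical value type, used only for this illustration.
struct Vec2 {
    double x;
    double y;
};

// Unoptimized shape: the argument is a reference-counted heap object.
// Copying the shared_ptr into the parameter increments the count on entry
// and decrements it again on exit.
double lengthCounted(std::shared_ptr<Vec2> v) {
    return std::sqrt(v->x * v->x + v->y * v->y);
}

// Shape after escape analysis: the callee never stores the object anywhere,
// so a plain reference is enough and no counting is needed.
double lengthPlain(const Vec2& v) {
    return std::sqrt(v.x * v.x + v.y * v.y);
}

double beforeEscapeAnalysis() {
    auto v = std::make_shared<Vec2>(Vec2{3.0, 4.0}); // heap allocation
    return lengthCounted(v);
}

double afterEscapeAnalysis() {
    Vec2 v{3.0, 4.0}; // the object never leaves this function: stack allocation
    return lengthPlain(v);
}
```

Both functions compute the same length; the second one avoids the heap allocation and the counter updates.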
So what does it mean in performance terms? I will not give exact numbers, but the best C++ time you will get is a bit less than 1300 ms on Linux, and around 1400 ms on Windows (with both VC++ and MinGW).
Why were these optimizations so important?
- escape analysis removes the need to increment/decrement reference counts when objects are passed as function parameters
- common subexpression elimination: when a lot of small items (in expressions) repeat, they are computed only once. This also works across function calls (if the functions are evaluated as pure). So if you have a rotation matrix and you compute against cosine(alpha) and sine(alpha), you don't have to cache the sine and cosine yourself; the compiler will do it for you automatically.
- LICM (Wikipedia article) works like common subexpression elimination, but for loops: if your code has expressions that do not change across the iterations of a loop, they are executed once before the loop starts, not at every iteration. This optimization also works with pure functions, so a call to a pure function with loop-invariant arguments is moved outside of the loop as well.
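The rotation example can be sketched by hand as follows. This is only an illustration of the effect of the two optimizations, not the code the compiler actually emits; Point, rotateNaive and rotateOptimized are hypothetical names.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical point type, used only for this illustration.
struct Point {
    double x;
    double y;
};

// As the programmer writes it: cos(alpha) and sin(alpha) each appear twice
// per iteration, and alpha never changes inside the loop.
void rotateNaive(std::vector<Point>& pts, double alpha) {
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const double x = pts[i].x;
        const double y = pts[i].y;
        pts[i].x = x * std::cos(alpha) - y * std::sin(alpha);
        pts[i].y = x * std::sin(alpha) + y * std::cos(alpha);
    }
}

// Roughly what remains after both optimizations:
// - common subexpression elimination keeps one cos and one sin call,
// - LICM moves those two pure calls in front of the loop.
void rotateOptimized(std::vector<Point>& pts, double alpha) {
    const double c = std::cos(alpha);
    const double s = std::sin(alpha);
    for (std::size_t i = 0; i < pts.size(); ++i) {
        const double x = pts[i].x;
        const double y = pts[i].y;
        pts[i].x = x * c - y * s;
        pts[i].y = x * s + y * c;
    }
}
```

For n points the first version makes 4n trigonometric calls and the second makes 2, which is why the transformation is only valid when the called functions are known to be pure.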
This also means that I will not work on optimizations for some time (unless there are bugs), but you may try to generate code, and the result "should scream".