Saturday, February 8, 2014

Target 0.0.3: Status 1

One, two, three, done!



The compilation step is now much faster. Optimizations can even run on multiple cores (by uncommenting some lines of code). For example, the OpenGL application whose screenshot I presented earlier used to take about 600-700 ms on an Intel i5-540M. CR now generates the code in around 170-220 ms (the first run is slower, since spinning disks are slow to access). By all measures, it is much faster.

Most optimization steps that CR performs work as follows: look for a code pattern that matches some property; if it matches, try to perform the optimization by changing the affected instructions; then notify the compiler that changes were made. CR then repeats all optimizations until none of them can be applied anymore. In a typical case, CR applies (using the default optimizations in the codebase) 35+ optimization steps for every function, every instruction, etc.
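The "repeat until nothing changes" loop described above can be sketched roughly as follows. This is a minimal Python illustration, not CR's actual code: the instruction strings, the two toy passes and the name `optimize_to_fixed_point` are all made up for the example.

```python
from typing import Callable, List

# Hypothetical model: a function body is just a list of instruction names.
Instructions = List[str]
# A pass mutates the list in place and returns True if it changed anything.
OptimizationPass = Callable[[Instructions], bool]


def remove_nops(instructions: Instructions) -> bool:
    """Toy pass: delete every 'nop' instruction."""
    before = len(instructions)
    instructions[:] = [ins for ins in instructions if ins != "nop"]
    return len(instructions) != before


def merge_double_negation(instructions: Instructions) -> bool:
    """Toy pass: drop the first adjacent pair of 'neg' instructions."""
    for i in range(len(instructions) - 1):
        if instructions[i] == "neg" and instructions[i + 1] == "neg":
            del instructions[i:i + 2]
            return True
    return False


def optimize_to_fixed_point(instructions: Instructions,
                            passes: List[OptimizationPass]) -> int:
    """Run all passes again and again until a full round changes nothing.

    Returns the number of rounds that were needed.
    """
    rounds = 0
    changed = True
    while changed:
        changed = False
        rounds += 1
        for run_pass in passes:
            # Each pass reports whether it modified the instructions.
            if run_pass(instructions):
                changed = True
    return rounds
```

With the input `["load", "neg", "nop", "neg", "store"]`, the double negation only becomes visible after the `nop` is removed, which is why a single round over the passes is not enough.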

In a typical case, let's say there is just one optimization that can be done, for example: a variable is never used, so it can be removed. Before noticing that the variable can be removed, the compiler tries other optimizations, each of them doing its own pattern matching. Finally, when the unused-variable optimization gets its turn, CR tracks all declared variables across every instruction and compares them with all used variables. Whatever remains declared but never used can be removed.
As one can notice, the step that makes this one optimization succeed relies on some common knowledge:
- variable usages per instruction
- the instruction kind: an optimization that removes a declared but never used variable has to take into account whether the instruction is a function call or a simple math operation
- jumps: some optimizations work only on a sequence of instructions with no branches or jumps, so jumps and labels are important delimiters for these sequences
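To make the unused-variable example concrete, here is a small Python sketch of the idea. It is only an illustration under assumptions: the `Instruction` class, its `kind`/`target`/`operands` fields and the set of side-effect-free kinds are invented here and do not mirror CR's intermediate representation.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Instruction:
    kind: str                        # hypothetical kinds: "assign", "math", "call", "return"
    target: Optional[str] = None     # variable written by this instruction, if any
    operands: Tuple[str, ...] = ()   # variables read by this instruction


# Assumption: only these kinds can be dropped safely; a call may have side effects.
SIDE_EFFECT_FREE = {"assign", "math"}


def declared_variables(instructions: List[Instruction]) -> Set[str]:
    """All variables that some instruction writes to."""
    return {ins.target for ins in instructions if ins.target is not None}


def used_variables(instructions: List[Instruction]) -> Set[str]:
    """All variables that some instruction reads."""
    return {name for ins in instructions for name in ins.operands}


def remove_unused_variables(instructions: List[Instruction]) -> bool:
    """Drop side-effect-free instructions whose result is never read.

    Returns True if at least one instruction was removed.
    """
    unused = declared_variables(instructions) - used_variables(instructions)
    kept = [
        ins for ins in instructions
        if not (ins.target in unused and ins.kind in SIDE_EFFECT_FREE)
    ]
    changed = len(kept) != len(instructions)
    instructions[:] = kept
    return changed
```

Note how the instruction kind matters: a `call` whose result is never read is kept, because removing it could change what the program does.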

Based on this, every time an optimization is performed, the optimization framework recalculates this information so it can be reused. So if the first optimization step needs to check variable usages and doesn't perform any optimization, the second optimization can use the same usage data, which is still correct because nothing has changed.
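A rough sketch of this reuse, again in Python and again with invented names (`FunctionBody`, `usage_counts`, `notify_changed`). One assumption to flag: the sketch recomputes lazily on the next request after a change, while the framework may well recompute eagerly.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical instruction shape: (variable written or None, variables read).
Instruction = Tuple[Optional[str], Tuple[str, ...]]


class FunctionBody:
    def __init__(self, instructions: List[Instruction]) -> None:
        self.instructions = list(instructions)
        self._usage_cache: Optional[Dict[str, int]] = None
        self.analysis_runs = 0  # counts how often the analysis really ran

    def usage_counts(self) -> Dict[str, int]:
        """Return how many times each variable is read, computing it only if needed."""
        if self._usage_cache is None:
            self.analysis_runs += 1
            counts: Dict[str, int] = {}
            for _target, operands in self.instructions:
                for name in operands:
                    counts[name] = counts.get(name, 0) + 1
            self._usage_cache = counts
        return self._usage_cache

    def notify_changed(self) -> None:
        """Called by a pass that modified the instructions; drops the stale data."""
        self._usage_cache = None
```

A pass that only inspects the code calls `usage_counts()` and gets the shared result for free; a pass that edits the instructions must call `notify_changed()` afterwards, otherwise later passes would work on stale data.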

Some Q&A:
- Even 200 ms (for a small application) is fast, and most of the time will still be spent inside the C++ compiler, so why bother? Because it is not always so clear-cut. Also, using a lower optimization level in GCC, or a fast(er) C++ compiler, makes CR's share of the time matter.
- 200 ms is still a lot (for a 40K application); can't it be reduced to 50 ms (CR doesn't seem to do that much)? In short, CR does more than optimizations: it tries to understand the code and map it to an intermediate representation, and after the optimizations are done (which is what this post is about), CR writes the code to disk. CR also has some inefficiencies in its design: for example, most CIL instructions are compared using string operations (for clarity), not instruction IDs. Comparing instructions by ID would save some (few) milliseconds.
- Compilation speed was never an issue for me, so why bother? CR is intended to be an optimizing compiler/VM. Although most people have fast computers, some do not. Many people develop on an Atom CPU. Also, fast compilation means that the 500 ms saved can be spent in the future on more optimizations.
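As a side note on the string vs. ID comparison mentioned in the second answer, the difference looks roughly like this. The sketch is in Python and only shows the shape of the two approaches, not their speed; the `OpKind` enum and its numeric values are invented for the example and are not the real CIL opcode numbers.

```python
from enum import IntEnum, auto
from typing import Iterable


class OpKind(IntEnum):
    """Hypothetical instruction IDs for a handful of CIL mnemonics."""
    LDLOC = auto()
    STLOC = auto()
    ADD = auto()
    CALL = auto()
    RET = auto()


# Built once, so the string lookup happens only when an instruction is read in.
_NAME_TO_KIND = {kind.name.lower(): kind for kind in OpKind}


def parse_op(name: str) -> OpKind:
    """Translate a mnemonic such as 'call' into its ID."""
    return _NAME_TO_KIND[name]


def count_calls_by_string(names: Iterable[str]) -> int:
    """String-based check: readable, but compares text on every instruction."""
    return sum(1 for name in names if name == "call")


def count_calls_by_id(kinds: Iterable[OpKind]) -> int:
    """ID-based check: the text was translated once, later checks compare IDs."""
    return sum(1 for kind in kinds if kind is OpKind.CALL)
```

Both functions give the same answer; the ID version just moves the string work to the point where the instruction is first read.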
