Friday, November 8, 2013

Why not NGen?

A legitimate question was raised, and I want to answer it at least as I see it:
"An honest question: what is the value of your tool over NGEN?"
First of all, as I wrote just one blog entry ago, I don't believe in "native vs. managed", because people mean different things by these words:
- if "managed" means "safe", then C++'s STL or Qt's Tulip containers and QObject form a "managed" subset that keeps most of what .Net offers (basically bounds checking, strict typing, generics, fairly consistent automatic freeing of memory, etc.);
- if "native" means "close to the metal", then .Net also compiles every method to native code as it is executed, so excluding the first seconds of startup, all .Net applications run as native code.

Given that, I still see a tool like CR as useful: it keeps the compilation profile of C# (and all the C# tooling that comes with it) while offering the execution profile of C++ with LTO (link-time optimization). As far as I can see CR advancing in the future, it will always support a subset of what .Net offers, so for people thinking of removing .Net (or Mono) at least as part of CR's toolchain, I cannot see that happening any time soon.

At the same time, even though I see CR as "never catching up" with .Net, it will still be a great tool, and for some use cases I can see it going beyond what .Net can offer.

Let's take a medium-sized program that I think will be a great case for CR (say, for a version like 0.5 or 1.0): a developer writes a C#/SDL/OpenGL game and wants to run it on a slower device (let's say a Raspberry Pi for now, but the exact device doesn't matter much). First he or she will improve the OpenGL calls, and second will try to improve the execution profile using Mono.

Using Mono, the first thing he or she will notice is that the application starts a bit slowly. Also, some math routines are suboptimal for Mono's JIT. There are two options: run Mono in AOT mode or use the LLVM-backed JIT. With the LLVM JIT the startup is even slower. With AOT mode performance drops considerably (as Mono's --aot mode reserves a CPU register for generating PIC code). In the end, he or she will also notice small hiccups in the game, because the SGen GC causes a frame to be skipped from time to time.

Using CR, things will be a bit different: there is nothing to set up other than the optimization level. Considering that a developer may be willing to wait even, say, half a minute for the final executable, he or she will pick the highest optimization level. This means many whole-program operations will happen: devirtualization, inlining, global dead-code removal, constant merging over the entire program, etc. The generated code is not PIC, so it can use all registers, and the optimizer can be as good as C++ compilers will have become by that time. Because the code uses reference counting, pauses are much smaller (no "freeze the world" needed), and there are optimizations (already in CR's codebase today) to mitigate the cost of updating reference counts (CR uses escape analysis).
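To make this concrete, here is a minimal sketch in plain C# (not CR-specific code; the class and names are made up for illustration) of the kind of code where such whole-program optimizations can pay off, assuming the compiler sees the entire program:

```csharp
using System;

class Vector3
{
    public double X, Y, Z;
    public Vector3(double x, double y, double z) { X = x; Y = y; Z = z; }

    // Virtual on purpose: with the whole program visible there is only one
    // implementation, so the call can be devirtualized and then inlined.
    public virtual double Dot(Vector3 other)
    {
        return X * other.X + Y * other.Y + Z * other.Z;
    }
}

static class Program
{
    static double LengthSquared(double x, double y, double z)
    {
        // 'v' never escapes this method (not stored, not returned), so an
        // escape-analysis pass can skip the reference-count updates for it.
        var v = new Vector3(x, y, z);
        return v.Dot(v);
    }

    static void Main()
    {
        Console.WriteLine(LengthSquared(1, 2, 3)); // prints 14
    }
}
```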

Some problems will still remain for the user: since CR uses reference counting, the developer has to look out for memory cycles. On the other hand, these cycles are easier to find, not because CR does anything special, but because today's C++ tools find memory leaks really easily (and CR names the generated functions after the original C# names). In fact it is easier to find a leak under ref-counting than a GC leak: build the generated C++ code in Visual Studio's Debug configuration, and at program exit all leaks are shown in the console.
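For illustration, here is a minimal sketch (plain C#, made-up names) of the kind of cycle to watch for: a tracing GC collects it, but plain reference counting cannot, because each count never drops to zero.

```csharp
using System;

class Parent
{
    public Child Child;
}

class Child
{
    public Parent Parent;
}

static class Program
{
    static void MakeCycle()
    {
        var p = new Parent();
        var c = new Child();
        p.Child = c;
        c.Parent = p;
        // Both locals go out of scope here, but each object still keeps the
        // other alive: under ref-counting this pair leaks and would show up
        // at program exit in the Debug build of the generated C++.
    }

    static void Main()
    {
        MakeCycle();
        Console.WriteLine("cycle created and abandoned");
    }
}
```

The usual fixes are to break the cycle by hand (set one of the fields to null before dropping the objects) or to hold one direction of the link through a weak reference.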

Lastly, CR can grow with whatever the developer community contributes, because CR is written in C#: it is easier to implement high-level optimizations there than to hack them into the Mono runtime (which is C) or into .Net (which is impossible, as the components are not open for modification). One optimization that can be done explicitly and requires much less coding work is around marshaling and P/Invokes (an area I would really love CR to improve). When you call a method in a DLL/.so, .Net (or Java for that matter) applies some extra "magic": some pointers are pinned, and some conversions between pointers occur. In contrast, it is possible for this marshaling (and there is certainly no need for pinning) to be removed altogether in some cases, for example if the developer knows the program uses OpenGL32.dll (or libGL.so) and tells the compiler, via a special flag, to link against it with -lOpenGL32. This is not a big win for some libraries, but it is big for others, because it removes the need for an indirect call.
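As an illustration, here is the standard .Net P/Invoke declaration for such an OpenGL call (the constant and the signature come from the OpenGL headers; the "link directly against the library" compiler flag discussed above is hypothetical):

```csharp
using System.Runtime.InteropServices;

static class GL
{
    public const uint GL_COLOR_BUFFER_BIT = 0x00004000; // value from the GL headers

    // glClear takes a plain bitmask, so there is nothing to marshal or pin;
    // a compiler told at build time to link against the GL library could emit
    // a direct call instead of going through the runtime's P/Invoke stub.
    [DllImport("opengl32.dll")] // "libGL.so.1" on Linux
    public static extern void glClear(uint mask);
}

// Usage (needs an active OpenGL context created via SDL or similar):
//     GL.glClear(GL.GL_COLOR_BUFFER_BIT);
```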

So in short, think of CR as a C#-only VM that takes its time compiling CIL. In the end it outputs optimized C++ which modern compilers can optimize further. It is easy to hack on (according to Ohloh it is only 17K lines for now, and it already supports more than 100 IL instructions, includes a lot of reflection code, more than 20 optimization passes, etc.).
