Roland, a great open-source developer, joined CR with his expertise, and I'm excited to see the new features that will appear in CR's future. He also exposed some bugs in the implementation, and I was happy to fix them!
Some features:
- a simple implementation of auto boxing/unboxing is done
- optimizations are defined by category (no command-line flag yet), but it is now possible to enable just "propagation" optimizations, just "constant folding" ones, and so on
- some old optimizations were not using use-def information, and now they do. In short: they are a bit faster than in the past.
Wednesday, June 18, 2014
Monday, June 2, 2014
Status: Updates from time to time
This last month I did not make any commits as I was busy, but I will try to give CR some time. What I will mostly address is maintenance of the codebase, as my (free) time is not something I can always guarantee.
So, in short what to expect:
- I want to clean up some constructs: I am mostly thinking of making labels use their hex notation. I will also check whether CR still works, because I built a bigger program and it looks like CR fails on it (so be aware of this :( )
- I want to make a separate (example) backend/frontend. The idea is this: if anyone has time, CR will most likely work with a rich C++ runtime (like Qt) to support some common classes. But that is hard without knowing what the frontend's and backend's jobs are, or how to tweak the compiler
- I want to integrate with a C# compiler (most likely gmcs/csc) in the future, so you can run something like:
[mono] crcs.exe <...>
and this command will run the compiler frontend.
- In the longer term I want to add some unit tests
Please add me on Google+; I will be glad to help and assist anyone who wants to look into CR but is blocked in understanding how to work with the project or the codebase.
Friday, May 9, 2014
Errors are mostly fixed
By mostly I mean that it is reasonably good to use; there may be a bug here and there, but the type closure bug is fixed. The documentation is also updated with the new architectural changes.
So, you may take the tip of the Git repo now and play with it. It is not fully stable, but it generates most of the code in the right place.
Tuesday, April 29, 2014
Small fixes - still hope for external support
In the last few days I had a little time to look over the code. In short, I want to explain the main change. In the past, the CR compiler worked with a singleton CrRuntimeLibrary, meaning there could not be two runtimes at the same time. Of course, this is the typical case, but in the (long-term) future it would be nice to support a minimal runtime for some methods (I am thinking mostly of OpenCL) that produces a separate output.
Also, the code is split this way for ideological reasons: it is easier to debug and maintain code without singletons. The bugs also showed some small limitations in how the code was handled in the past; now the runtime can resolve methods, giving a cleaner implementation.
There are bugs in the TypeClosure logic (maybe I will explain what that means when I have time ;) ), so don't take the Git top-of-tree code for now if you want working code!
Saturday, April 5, 2014
Development will stall for some time
I encourage you to take the CR sources, play with them, and try to advance them. Anyway, for the next few months I will work very little on improving CR: I have little time and I'm split between work, family, and other time-consuming matters.
What I've learned is, fairly deeply, how CIL instructions work, and that it is technically possible to move most instructions (if not all) to a C++ implementation, but there will always be bugs and workarounds, and they require time that very few people want to invest (none!?).
Based on this, I can say that I will be glad to support people writing tools or to help them fix bugs, but for now at least I will not work on improving CR. If you have some spare time, please try to add a feature or to fix something. Some parts are partially implemented and just need some "moving code around" to support more cases: delegates, generics, and virtual calls all have at least some code to start from. If they are fixed, a good chunk of .Net 2.0 code will basically work.
That's it for now. I really hope some people will try to clone the repository and improve the code; I will then be really glad to improve the needed parts. Otherwise, the incentive to make it work looks too small for now.
Wednesday, March 5, 2014
RyuJIT, Mono 3.2.7 and Java 9 (a possible future of stack virtual machines)
First of all, I want to note that a lot of development is happening in the area of managed runtimes, so I want to give my personal take on it. This has nothing to do with the CR project, except that working on CR in my free time gives me more insight into the internals of these runtimes.
So what is known as of today:
- Microsoft is working on a next-gen unified (no more 64-bit vs. 32-bit differences) .Net optimization framework that will ship in an unspecified future version of .Net. From what we know, it has a much better Regex engine (more than 10x faster than the previous one) and around 20% faster JIT startup times.
- Mono improves steadily (it finally has Loop Invariant Code Motion) and it will rely more on Mono/LLVM integration for mobile phones.
- Java has a modular future with Java 9 (very similar to the GAC in .Net, with possible stack values and "compact" layout).
So what is the best runtime to invest in today, knowing the next two years of proposed upgrades? As always, the answer is: it depends. But I can try to split them by usage:
- you want to implement a game: I would go with Mono. Mono has SIMD support, while .Net has no announced support for now; the workaround on the MS site is Mono. Even more, Mono supports LLVM as a backend, and it works ahead of time (at least for phone platforms). If you have a weird use case, you can patch Mono and the Mono runtime.
- you want to implement an application: if you want to use native standard controls, use Mono. Mono has native controls on all platforms. You can import and develop it (fast) with the Visual Studio tools and "migrate" it using, for example, XWT controls or Eto.Forms, so you don't redo the interface multiple times
- server side: if you will make all clients in Mono or .Net, I would pick .Net; at least you can then afford the price of Windows Server and/or the pricier Visual Studio editions. If you are budget-constrained, I would pick Java all the way; Mono is a horrible choice. The reason is not how fast Mono runs (Mono runs "just" two times slower), but that Java has a mature development platform for complex problems. Also, Mono tends to be a one-language (maybe two) platform, while Java is polyglot: Ruby (and Mirah), Clojure, Kotlin, and Scala are top-notch implementations with free IDEs. .Net is theoretically better here, but the language support in the Java world today is far more vibrant
- fast-response servers: today Java is the de facto big daddy of fast-response servers. Twitter moved from Ruby to Java. There are also more vendor-neutral resources that document most of Java's behaviors (DZone, InfoQ, Parleys, etc.).
- native application feeling (only for Windows): "native" is a word strangely defined by everyone, so I really dislike the term, but playing by its rules, if we define it as a Win32-looking application, the single solution is .Net with Windows Forms (or maybe Xwt). If we mean Windows with animations, then WPF is the only game in town. DevExpress controls really stand out, and Resharper (on Windows) can guarantee great code quality for your application too. Open-source solutions do not provide something that matches WPF for now, but who knows? Maybe a future designer for XWT will appear for something like this.
So given this preamble, what I would want to see, per platform:
I. Java
- make a low-overhead PInvoke-like solution. The current overhead makes mixed native/Java applications slow, so people tend to write them in Java only, or to skip Java altogether for a simple API. The pieces will likely be available around the Java 9 release, as it proposes stack-based types and a compact representation. Even more, packed objects will occupy less memory, addressing another common criticism of Java.
- make tiered compilation the default (even on 64-bit machines), and ship Jigsaw: Java has running profiles like Client (fast startup, weak optimizations) and Server (slow startup, strong optimizations). Tiered compilation starts programs in Client mode and later switches to Server mode for methods that deserve more optimization. Even better would be for the Server-compiled methods to be compiled on a separate thread. This can improve the responsiveness of applications.
II. .Net
In fact, it looks to me like .Net is a bit in "maintenance mode"; it looks like .Net wants to be the new VB: a convenience runtime. This is maybe because .Net conflicted somewhat with C++ development, and they want to have two development modes: the "native-fast" C++, and managed code with 80% of C++'s performance. RyuJIT seems to me an effort to simplify code, and this is great, don't get me wrong, but other features are missing and some users miss them: for example a performance mode (there are Server and Desktop modes, but the differences are not in the generated code, only in GC configuration and in which assemblies are exposed to the application). For me, the possibility of annotating functions to use SIMD, or at least support for Mono.SIMD, would be awesome. Finally, it would be nice to support a form of full ahead-of-time compilation: an executable that depends on .Net but does no JIT at all. This would allow some optimizations that could enable uses of .Net in the real world (mostly in games) where it is today mostly avoided.
III. Mono
For me, Mono as a runtime is the most advanced, even more advanced than Java, as it supports multiple compilation profiles, ahead-of-time compilation, LLVM, intrinsics, and many platforms (phone OSes, NaCl). Even if it sounds picky, for me Mono is missing user-friendliness (even if it is miles ahead of what it was, let's say, 5 years ago). MonoDevelop (Xamarin Studio) + NRefactory (the framework that is used to give semantic understanding of your code) is really good for the open-source world, yet it is still a bit ignored. Most people use Mono as a backup runtime, not as their primary platform to develop on. Code completion is still a bit buggy as of today.
As for the parts of Mono and .Net to improve, I can name the items I find quite important: escape analysis, and purity and read-only analysis. Some objects can be proven to never "bleed" their references. This sounds like a not-so-important optimization, but it can reduce the overhead on the GC, and that is no small deal: if, let's say, 10% of objects are allocated not on the heap but on the stack, that is a speedup with no runtime work to optimize for. Even more, there are guarantees of non-synchronization for these variables (which again is a great deal). I would also love to have an interpreter integrated with adaptive compilation, like Java does: simple methods that are not run often enough are not compiled at all. This can mean a lot more optimization (maybe compilation can happen on a separate thread) but also fast(er) startup time.
Finally (you can skip this if you are not interested in CodeRefractor per se), as much as I think that dynamic compilation is the best way for managed runtimes, I also think that full static compilation is great for some usages. When your application occupies, let's say, 10 MB of memory (I know that is very little against many GB of RAM today) and you want a fast start time because you want to do some processing, you will not be happy to see that your application has a GC memory area adding an extra 5-8 MB, and that the runtime itself occupies another 5 MB, so your application uses double the memory you expect. Using a solution like CodeRefractor you can reduce the memory requirements of your application (if it can compile your application), because no runtime runs alongside your application, and no GC either. We have been told for many years that memory doesn't matter, but if you want a faster application, you should care about the memory compactness of your data.
Wednesday, February 26, 2014
Technical: Which optimizations to expect from CodeRefractor?
This entry is not about a status update but about the state of CR today: which optimizations you should expect CR to do for you, and which it will not.
I will take the most important ones in terms of performance and usage, and I will also try to point out where the problems are.
GC and Escape Analysis
If you look at the generated code of CR, when you declare a reference it will typically use an std::shared_ptr (also known as a smart pointer, or reference-counted pointer). This form of freeing unreferenced variables is very common, and for very simple operations it works fine. But every time you assign a smart pointer, the CPU has to increment (and later decrement) the reference count. CR's escape analysis removes a lot of unnecessary increments/decrements, but it has some caveats: it works really well for local variables, but it doesn't do well across function boundaries, especially in the case of return values.
So how to make your reference-counted program to work fast?
First of all, remove unnecessary assignments of references.
Let's take this simple code:
class Point2D { ... public float X, Y; }
class Vector2D { ... public float X, Y; }
Vector2D ComputeVector(Point2D p1, Point2D p2)
{
    return new Vector2D { X = p1.X - p2.X, Y = p1.Y - p2.Y };
}
var v = ComputeVector(p1, p2);
The easiest fix for this code is simply to define Point2D and Vector2D as structs (value types). Another solution is to recycle the object, as in the following code:
void ComputeVector(Vector2D result, Point2D p1, Point2D p2)
{
    result.X = p1.X - p2.X;
    result.Y = p1.Y - p2.Y;
}
This is good because the usage code can be something like:
var result = new Vector2D();
ComputeVector(result, p1, p2);
Console.WriteLine(result.X);
Console.WriteLine(result.Y);
If this were your final code (or a version like it) and result were not used any further, the result variable would be declared on the stack, avoiding not only smart pointers but also the memory allocation overhead.
Constants and propagation of values
CR will extensively evaluate every constant and will simplify everything as much as possible (excluding, of course, the odd bug). This means that if you use constants in the program, they will be propagated all over the body of the function. So, try to parametrize your code using constants if possible. Also, if you use simple types (like int, float, etc.), I recommend creating as many intermediate variables as you want: CR will remove them aggressively, at no cost to the programmer. So, if you don't work with references to heap objects (reference objects), use as many variables as you like. The compiler will also propagate values as deep as possible into the code (there are still small caveats), so if you keep a value anywhere just for debug purposes, keep it; it will make your code readable, and that is more important. Constants will not only simplify formulas, they will also remove branches in if or switch statements (if they can be proven constant). This is good to know for another reason: if you have code like if(Debug) ... in your development, you can have the peace of mind that this if statement will not be checked anywhere in the whole code.
Purity and other analyses
Without turning this into a functional programming talk: the compiler checks for some function properties, the most important of which are Purity, ReadOnly, IsGetter, and IsSetter. IsGetter and IsSetter are important for inlining every time you use an auto-property.
A pure function is a function that takes just simple types as input (no strings for now, sorry), returns a simple value, and in between does not change anything global. A read-only function is a function that can depend on anything (not only simple types) and does not change anything global, but returns a simple type.
If you write code using pure functions everywhere, you will have the luxury that the compiler can do a lot of optimizations with them: when you call them with constants, it will compute the results at compile time.
So when you write Math.Cos(0), it will be replaced with 1.0, and even more, the compiler will merge many common expressions, even function calls.
Unused arguments (experimental)
When you write your code, you want to make it configurable; based on this, you may add many arguments, and some of them will not be used at all. The latest (git) code removes the unused arguments. Do you remember the part about using constants everywhere? If you call a function with a constant, and this is the single call site, the constant is propagated into the call, and afterwards the argument is removed. This is as efficient as the #if directive, but done by the compiler with no work on your side (excluding setting up the environment to enable or disable debug, or whatever flags you have in your program).
Conclusions
There are other optimizations, but many of them are low-level and not very practical to target. In short, here is a small list of things you should do to improve performance in CodeRefractor-generated code:
- when you need to change a variable and this variable is a result, pass it as an external parameter
- use constants everywhere for configuring your runtime.
- use local variables of simple types as often as you want.
- try to make functions that do not change too much state; keep them small and with one goal (like a computation). They will be optimized really well, and entire calls may be removed if you write small functions
- make branches configurable with simple-type parameters. The compiler will speed up the code in some cases if it can prove that your configuration variables are set to specific values.