Saturday, April 5, 2014

Development will stall for some time

I encourage you to take the CR sources, play with them and try to advance them. For the next months, though, I will work very little on improving CR: I have little time and I'm split between work, family and other time-consuming matters.

What I've learned is, fairly deeply, how CIL instructions work, and that it is technically possible to move most instructions (if not all) to a C++ implementation; but there will always be bugs and workarounds, and they require time which very few people want to invest (none!?).

Based on this, I can say that I will be glad to support people writing tools or to help them fix bugs, but for now at least I will not work on improving CR. When you have some time, please try to add a feature or fix something. Some parts are partially implemented and just need "moving code around" to support more cases: delegates, generics and virtual calls all have at least some code to start from. If they are fixed, a good chunk of .Net 2.0 code will basically work.

This is it for now. I really hope that some people will try to clone the repository and improve the code, in which case I will be really glad to improve the needed parts; otherwise, for now, the incentive to make it work looks too small.

Wednesday, March 5, 2014

RyuJIT, Mono 3.2.7 and Java 9 (a possible future of stack virtual machines)

First of all I want to note that a lot of development is happening in the area of managed runtimes, so I want to give my personal take on it. This has nothing to do with the CR project, except that working on CR in my free time gives me more insight into the internals of these runtimes.

So what is known as of today:
- Microsoft is working on a next-gen unified (no more 64/32 bit differences) .Net optimization framework that will ship in an unspecified future version of .Net. From what we know, it has a much better Regex engine (more than 10x faster than the previous one) and around 20% faster JIT startup times.
- Mono improves steadily (it finally has Loop Invariant Code Motion) and it will rely more on Mono/LLVM integration for mobile phones.
- Java has a modular future with Java 9 (very similar to the GAC in .Net, plus possible stack values and a "compact" layout).


So what is the best runtime to invest in today, knowing the next two years of proposed upgrades? As always the answer is: it depends. But I can try to split them by usage:
- want to implement a game: I would go with Mono. Mono has SIMD support, while .Net has no announced support for now; the workaround on the MS site is Mono. Even more, Mono supports LLVM as a backend, and it works ahead-of-time (at least for phone platforms). If you have a weird use-case, you can patch Mono and the Mono runtime.
- want to implement an application: if you want to use native standard controls, use Mono. Mono has native controls on all platforms. You can import and develop it (fast) with the Visual Studio tools and "migrate" it using, for example, XWT controls or Eto.Forms, in order not to redo the interface multiple times
- server side: if you will make all clients in Mono or .Net, I would pick .Net; at least then you can afford the price of Windows Server and/or the pricier Visual Studio editions. If you are budget constrained I would pick Java all the way; Mono is a horrible choice here. The reason is not how fast Mono runs (Mono runs "just" two times slower), but that Java has a mature development platform for complex problems. Also, Mono tends to be a one, maybe two, language paradigm, while Java is polyglot: Ruby (and Mirah), Clojure, Kotlin and Scala are top-notch implementations with free IDEs. Even if .Net is theoretically better, the language support in the Java world today is far more vibrant
- fast response servers: today Java is the de-facto big daddy for implementing fast response servers. Twitter moved from Ruby to Java. There are also more vendor-neutral resources that document most of Java's behaviors (DZone, InfoQ, Parleys, etc.).
- native application feeling (only for Windows): "native" is a word strangely defined by everyone, so I really dislike the term, but playing by its rules, if we define it as a Win32-looking application, the only solution is .Net with Windows Forms (or maybe Xwt). If we mean Windows with animations, then WPF is the only game in town. DevExpress controls really stand out, and Resharper (on Windows) can guarantee great code quality for your application too. OpenSource solutions do not provide something that matches WPF for now, but who knows? Maybe a future designer for XWT will appear for something like this.

So given this preamble, what I would want to see, per platform:
I. Java
- make a low-overhead PInvoke-like solution. The current overhead makes mixed native/Java applications work slowly, so people tend to write them in Java only, or to skip Java altogether in favor of a simple native API. The pieces will likely be available by the Java 9 release, as they propose stack-based types and a compact representation. Even more, the packed objects will occupy less memory, another common criticism of Java.
- make tiered compilation the default (even on 64-bit machines) and ship Jigsaw: Java has running profiles like Client (fast startup, weak optimizations) and Server (slow startup, strong optimizations). Tiered compilation starts programs in Client mode and later switches hot methods to Server mode, where they get more optimization. Even better would be for the Server-mode compilation to run on a separate thread. This can improve the responsiveness of applications
II. .Net
In fact, it looks to me that .Net is a bit in "maintenance mode"; it looks like .Net wants to be the new VB: a convenience runtime. This is maybe because .Net conflicted somewhat with C++ development, and they want to keep the two development modes: the "native-fast" C++ one and the managed one with 80% of C++'s performance. RyuJit seems to me an effort to simplify the code, and this is great, don't get me wrong, but other features are missing and some users miss them, for example a performance mode (there are Server and Desktop modes, but the differences are not in the generated code, only in the GC configuration and in which assemblies are exposed to the application). As for me, the possibility of annotating functions to use SIMD, or at least support for Mono.SIMD, would be awesome. Finally, it would be nice to support a form of full ahead-of-time compilation: I mean producing an executable that depends on .Net but doesn't do any JIT. This could allow some optimizations and would make .Net viable in some real-world usages (mostly games) where it is mostly avoided today.
III. Mono
For me, Mono as a runtime is the most advanced, even more advanced than Java, as it supports multiple compilation profiles, ahead-of-time compilation, LLVM, intrinsics and many platforms (phone OSes, NaCl). Even if it sounds picky, for me Mono is missing user-friendliness (even if it is miles ahead of what it was, let's say, 5 years ago). MonoDevelop (Xamarin Studio) + NRefactory (the framework that gives semantic understanding of your code) is really good for the opensource world, yet it is still a bit ignored. Most people use Mono as a backup runtime, not as their primary platform to develop on. Code completion is still a bit buggy as of today.

As for the Mono and .Net parts to improve, I can name the items I find quite important: escape analysis, and purity and read-only analysis. Some objects can be proven to never "bleed" their references. This sounds like a not-so-important optimization, but it can reduce the overhead on the GC, and this is no small deal: if, let's say, 10% of objects are allocated on the stack instead of the heap, that is a speedup with no runtime work needed. Even more, there are guarantees that no synchronization is needed for these variables (which again is a great deal). I would also love to have an interpreter integrated with adaptive compilation, like Java does: simple methods that do not run often enough would not be compiled at all. This can mean a lot more optimization (maybe compilation can happen on a separate thread) but also fast(er) startup times.

At last (you can skip this if you are not interested in CodeRefractor per se), as much as I think that dynamic compilation is the best way for managed runtimes, I also think that full static compilation is great for some usages. When your application occupies, let's say, 10 MB of memory (I know, very little against the many GB of RAM today) and you want a fast start time because you want to do some processing, you will not be so happy to see that your application has a GC memory area adding an extra 5-8 MB, and a runtime occupying another 5 MB, so that your application uses double the memory you expect. Using a solution like CodeRefractor you can reduce the memory requirements of your application (if it can compile your application), because no runtime runs independently alongside your application, and there is no GC either. Memory is less of a constraint today, and we have been told for many years that memory doesn't matter, but if you want a faster application, you should care about the memory compactness of your data.

Wednesday, February 26, 2014

Technical: Which optimizations to expect from CodeRefractor?

This entry is not a status update, but a description of what CR does today: which optimizations you should expect CR to do for you, and which it will not do for you.

I will take the most important ones in terms of performance and usage. I will also try to point out where the problems are.

GC and Escape Analysis

If you look in the code generated by CR, when you declare a reference it will typically use an std::shared_ptr (also known as a smart pointer, or reference-counted pointer). This form of freeing unreferenced variables is very common, and it works fine even for very simple operations. But every time you assign a smart pointer, the CPU has to increment (and later decrement) the reference count.

Escape analysis removes a lot of these unnecessary increments/decrements, but it has some caveats: it works really well for local variables, but it doesn't do well over function boundaries, and especially not for return values.

So how do you make your reference-counted program work fast?

First of all, remove unnecessary assignments of references.

Let's take this simple code:
class Point2D { ... public float X, Y; }
class Vector2D { ... public float X, Y; }

Vector2D ComputeVector(Point2D p1, Point2D p2)
{
    return new Vector2D { X = p1.X - p2.X, Y = p1.Y - p2.Y };
}

var v = ComputeVector(p1, p2);

The easiest fix for this code is simply to define Point2D and Vector2D as structs (value types). Another solution is to recycle the object, as in the following code:
void ComputeVector(Vector2D result, Point2D p1, Point2D p2)
{
    result.X = p1.X - p2.X;
    result.Y = p1.Y - p2.Y;
}

This is good because the usage code can be something like:
var result = new Vector2D();
ComputeVector(result, p1, p2);
Console.WriteLine(result.X);
Console.WriteLine(result.Y);

If this were your final code (or a version like it) and result were not used afterwards, the result variable would be declared on the stack, avoiding not only smart pointers but also memory allocation overhead.

Constants and propagation of values

CR will extensively evaluate every constant and will simplify everything as far as it can (excluding, of course, the occasional bug). This means that if you use constants in your program, they will be propagated all over the body of the function. So, try to parametrize your code using constants when possible. Also, if you use simple types (like int, float, etc.), I recommend creating as many intermediate variables as you want: CR will remove them aggressively, at no cost to the programmer. So, if you don't work with references to heap objects (reference objects), use as many variables as you like. The compiler will also propagate the values as deep as possible into the code (there are still small caveats), so if you keep a value just for debug purposes somewhere, keep it; it makes your code readable, and that is more important.
Constants will not only simplify formulas; they will also remove branches in if or switch statements (when the conditions can be proven constant). This is good to know for another reason: if you have code like if(Debug) ... in your development, you can have peace of mind that this if statement will not be checked anywhere in the whole code.
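
To make the if(Debug) case concrete, here is a minimal sketch (the Config class and the names are mine, not from the CR samples):

using System;

class Config
{
    // A compile-time constant: CR can prove the branch below is never taken.
    public const bool Debug = false;
}

class Program
{
    static int Compute(int x)
    {
        if (Config.Debug)               // provably false: the branch is removed
            Console.WriteLine("x = " + x);
        return x * 2;
    }

    static void Main()
    {
        Console.WriteLine(Compute(21)); // after propagation this is just 42
    }
}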

Purity and other analyses

Without turning this into a functional programming talk: the compiler will check functions for some properties, the most important of them being Purity, ReadOnly, IsGetter and IsSetter.
IsGetter and IsSetter are important for inlining every use of an auto-property.
A pure function is a function that takes just simple types as input (no strings for now, sorry), returns a simple value, and changes nothing global in between. A read-only function is a function that can depend on anything (not only simple types) and also changes nothing global, but returns a simple type.
By writing code that uses pure functions everywhere, you get the luxury that the compiler will do a lot of optimizations with them: when you call them with constants, it will compute the results at compile time.
So Math.Cos(0) will be replaced with 1.0, and even more, the compiler will merge many common subexpressions, even across function calls.
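
As a minimal sketch of a pure function under this definition, and of what the compiler may do with it (the function itself is illustrative):

using System;

class Program
{
    // Pure: simple-type inputs, a simple-type result, no global state touched.
    static double Hypotenuse(double a, double b)
    {
        return Math.Sqrt(a * a + b * b);
    }

    static void Main()
    {
        // Called with constants only, so CR can fold this to 5.0 at compile time.
        Console.WriteLine(Hypotenuse(3.0, 4.0));
    }
}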

Unused arguments (experimental)

When you write your code, you want to make it configurable; because of this, you may add many arguments, and some of them will not be used at all. The latest (git) code removes the unused arguments. Remember the advice about using constants everywhere? If you call a function with a constant, and this is the single call site, the constant is propagated into the call, and after that the argument is removed.
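
As a hedged sketch of the mechanism (the function and the flag are illustrative):

using System;

class Program
{
    // 'verbose' is only ever passed the constant false below; after constant
    // propagation the branch folds away, the argument becomes unused, and the
    // experimental pass can drop it from the signature entirely.
    static int Add(int a, int b, bool verbose)
    {
        if (verbose)
            Console.WriteLine("adding " + a + " and " + b);
        return a + b;
    }

    static void Main()
    {
        Console.WriteLine(Add(2, 3, false)); // the single call site
    }
}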

This is as efficient as an #if directive, but done by the compiler with no work on your side (excluding setting up the environment to enable (or disable) debug, or whatever flags you have in your program).

Conclusions

There are other optimizations, but many of them are low-level and not so actionable. In short, here is a small list of things you should do to improve performance in CodeRefractor-generated code:
- when you need to change a variable and this variable is a result, pass it in as an external parameter
- use constants everywhere for configuring your runtime.
- use local variables of simple types as often as you want.
- try to make functions that do not change too much state; make them small and with one purpose (like a computation). They will be optimized really well. Entire calls may even be removed if you write small functions
- make branches configurable with simple-type parameters. The compiler will speed up the code in some cases if it can prove that your configuration variables are set to specific values.

Thursday, February 20, 2014

Target 0.0.3: Status 3

It seems that you are seeing many status updates in this period, for a reason: I had a fairly light workload at work and I had time to think about some fine things to add to CR. In future I cannot guarantee the same frequency and speed. The best way to make CR develop fast is to contribute ;)

So, what is done and noteworthy:
- type-id support and a trivial/crashing virtual method call implementation. People wanting a working virtual method implementation can go into the code and fix the remaining things (or keep waiting): make it possible to run virtual functions with more than one parameter, take escape analysis into account, and things like that
- type closure is more robust: types are sorted by dependencies, and methods and types are discovered at the same time. It is now possible, at least at the code level, to have multiple entry methods. This can make it possible in future to support DLL/lib code generation, or to make sure that all needed methods are added statically
- added an "unused" escape state for variables; in the past variables could only be: stack, pointer or smart-pointer. This is very important as it can detect (in future) that a function argument is not used, so the compiler can remove the usages at the call sites and make the calls fast(er). More work will definitely happen in this area.
- made a unified kind of key to identify methods: this is as important as type-id for me. Every time a function was called, strange code was written to check whether the method is a runtime method, a generic method, etc. Even worse was the comparison between functions as overloads. With this change I expect that generics will be handled via the original prototype (a clone will be made for every static specialization, and no C++ templates are used).
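
To give an idea of what such a key can look like, here is a hypothetical sketch (MethodKey and its fields are illustrative, not CR's actual API):

using System;
using System.Linq;

// Hypothetical sketch: one value that identifies a method the same way
// whether it is a runtime method, a generic instantiation or an overload.
class MethodKey : IEquatable<MethodKey>
{
    public readonly string DeclaringType;
    public readonly string Name;
    public readonly string[] ParameterTypes; // distinguishes overloads

    public MethodKey(string declaringType, string name, string[] parameterTypes)
    {
        DeclaringType = declaringType;
        Name = name;
        ParameterTypes = parameterTypes;
    }

    public bool Equals(MethodKey other)
    {
        return other != null
            && DeclaringType == other.DeclaringType
            && Name == other.Name
            && ParameterTypes.SequenceEqual(other.ParameterTypes);
    }

    public override bool Equals(object obj) { return Equals(obj as MethodKey); }

    public override int GetHashCode()
    {
        return (DeclaringType + "." + Name).GetHashCode();
    }
}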

Basically, this status means that there is a big base to be improved in CR. And the more fixes there are, the better you can expect your program to work. So try to look into the code and make simple fixes.

What I expect to work on in future:
- try to make a program-wide kind of optimization. Until now, all optimization passes of CR take just one function as input, but some cases are handled better if the visibility of the optimizations is global. Such an optimization could, for example, remove a virtual call if it knows that there is just one implementation globally (see the sketch after this list).
- look into Qt's Core runtime. I worked in the past with Qt 4.x (x being a small number) and my experience was very good. Qt could be a great base for a good implementation of operations like directory navigation, string encodings, and things like that. It would be really ugly to implement them in plain C++: there is no standard multiplatform implementation I'm aware of that handles them so well
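
As a hedged illustration of the virtual call case mentioned in the first item (the types are mine):

using System;

abstract class Shape
{
    public abstract double Area();
}

class Circle : Shape
{
    public double Radius;
    public override double Area() { return Math.PI * Radius * Radius; }
}

class Program
{
    static void Main()
    {
        Shape s = new Circle { Radius = 2.0 };
        // If whole-program analysis proves Circle is the only implementation,
        // this virtual call can become a direct call to Circle.Area.
        Console.WriteLine(s.Area());
    }
}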

Saturday, February 15, 2014

Target 0.0.3: Status 2

The CR backend's C++ code generator functionality has been extracted. This reduces the logical dependencies and simplifies some sections of the code. There are also improvements to make generics generate templates.

Anyway, this cannot be done in all cases, so I will have to find a workaround in future. In short, the problem is the following: let's say we want to implement a class like MyData&lt;T&gt;, where T can be either a struct or a class. If T is a class type, it has to have different logic than the struct case. The reason is that, in future, reference types have to carry extra information to identify their type, a "typeId" integer. MyData&lt;int&gt; needs no typeId, while MyData&lt;T&gt; for a class T may need to keep the typeId of the specific type. Even if we share the typeId and make it a no-op in the struct case, it still doesn't offer a nice solution if MyData&lt;T&gt; has the new() constraint.
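
For concreteness, the shape of the problem in C# (a minimal sketch):

using System;

// With T = int the payload is a plain value and needs no typeId; with
// T = string (a reference type) each instance in the generated C++ would need
// to carry a typeId, so one shared template body does not fit both cases.
class MyData<T>
{
    public T Value;
}

class Program
{
    static void Main()
    {
        var ints = new MyData<int> { Value = 42 };         // no typeId needed
        var strs = new MyData<string> { Value = "hello" }; // typeId needed
        Console.WriteLine(ints.Value + " " + strs.Value);
    }
}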

In short:
- there will be an improvement in future that will make generics work with templates, but today's generics code will generate incomplete types (so you will get compiler errors). In fact many of these errors are of medium complexity, and fixes are always welcome
- the code is separated to make it easier to understand how the C++ code is generated; this component is now a smaller Dll

As a future improvement I expect to try to (incompletely) implement the TypeId system: all classes defined as references will carry an int that uniquely identifies their type. This implementation detail is critical for making some instructions work: is/as, boxing/unboxing and virtual/interface calls.
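
A hedged sketch of how an int typeId can support an is check (the registry and the types are hypothetical, and real inheritance checks need more than a single comparison):

using System;

class TypeRegistry
{
    public const int AnimalId = 1;
    public const int DogId = 2;
}

class Animal
{
    public int TypeId = TypeRegistry.AnimalId;
}

class Dog : Animal
{
    public Dog() { TypeId = TypeRegistry.DogId; }
}

class Program
{
    static void Main()
    {
        Animal a = new Dog();
        // Roughly what 'a is Dog' lowers to once every reference carries a typeId:
        bool isDog = a.TypeId == TypeRegistry.DogId;
        Console.WriteLine(isDog); // True
    }
}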

Update: CodeRefractor works with Linux/Mono (I tested with OpenSuse, but other distros work nicely too). If you use Linux you basically have to do the following:
- make a clone of Git repo
- enable this repository on Suse, Fedora, or any supported distro: http://software.opensuse.org/download/package?project=home:tpokorra:mono&package=monodevelop-opt (or build a recent Mono/MonoDevelop yourself, at least 4.0.x)
- open CodeRefractor.sln and run it under "Debug | Mixed platforms"
If you prefer the command line, after the initial build is done you can use it with:
mono cr.exe <any Mono/.Net exe>
I am really impressed by the quality of the Mono tools. And as for you: if you wanted to work on CR but never knew that it runs on Linux, now you know!

Saturday, February 8, 2014

Target 0.0.3: Status 1

One two three, done!



The compilation step is much faster. Optimizations can work (by uncommenting some lines of code) even using multiple cores. For example, generating the code for the OpenGL application I presented earlier used to take around 600-700 ms on an Intel i5-540M. Right now CR generates the code in around 170-220 ms (the first run is slower, as spinning disks are slow on first access). By all measures it is much faster.

Most optimization steps that CR performs do the following: look for a pattern of code that matches a property; if it matches, try to perform the optimization by rewriting some instructions; then notify the compiler that changes were made. After this, CR retries all optimizations up to the point where no more optimization can be done. In a typical case CR will apply (using the default codebase optimizations) 35+ optimization steps to every function, every instruction, etc.
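
In code, the driving loop is roughly the following (a hypothetical sketch; IOptimizationPass and MethodBody are illustrative names, not CR's actual types):

using System.Collections.Generic;

interface IOptimizationPass
{
    bool TryOptimize(MethodBody body); // returns true if it changed anything
}

class MethodBody { /* instructions, variables, labels, ... */ }

static class Optimizer
{
    // Re-run every pass until a full round makes no change (a fixpoint).
    public static void RunToFixpoint(MethodBody body, List<IOptimizationPass> passes)
    {
        bool changed = true;
        while (changed)
        {
            changed = false;
            foreach (var pass in passes)
                changed |= pass.TryOptimize(body);
        }
    }
}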

In a typical case, let's say there is just one optimization that can be done, for example: a variable is used nowhere, so it can be removed. Before noticing that the variable can be removed, the compiler performs the other optimizations, and right after that it runs another round of pattern matching. Finally, when the unused-variable optimization matches, CR tracks, for every instruction, all declared variables and compares them with all used variables. Whatever remains at the end can be removed.
But as someone may notice, the step that makes that one optimization succeed relies on some common knowledge:
- variable usages per instruction
- the instruction kind: an optimization that removes a declared but never used variable has to take into account whether the instruction is a call to a function or a simple math operation
- jumps: some optimizations work only on sequences of instructions with no branches or jumps, so jumps and labels are important delimiters for these instructions

Based on this, every time an optimization is performed, the optimization framework recalculates this information so it can be reused. So if the first optimization step needs to check variable usages and doesn't perform any optimization, the second optimization can use the same usage data, which stays correct as long as no change is made.
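
As a hedged sketch of that caching (reusing the illustrative MethodBody from the previous sketch; none of these names are CR's actual API):

using System.Collections.Generic;

class UsageCache
{
    private Dictionary<string, int> usages; // e.g. variable name -> use count
    private bool dirty = true;

    // Called by the framework whenever a pass reports a change.
    public void Invalidate() { dirty = true; }

    public Dictionary<string, int> GetUsages(MethodBody body)
    {
        if (dirty)
        {
            usages = Recompute(body);
            dirty = false;
        }
        return usages; // safe to reuse while no pass has changed the code
    }

    private Dictionary<string, int> Recompute(MethodBody body)
    {
        // walk all instructions, counting every variable read and write
        return new Dictionary<string, int>();
    }
}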

Some Q&A:
- even 200 ms (for a small program) is fast, and most of the time will still be spent inside the C++ compiler, so why bother? Because it is not always so clear-cut. Also, with a lower optimization level on GCC, or with a fast(er) C++ compiler, it starts to matter
- 200 ms is still a lot (for a 40K application); can't it be reduced to 50 ms (CR doesn't seem to do that much)? In short, CR does more than optimizations: it tries to understand the code and map it to an intermediate representation; after the optimizations are done (which is what this post is about), CR writes the code to disk. CR also has some inefficiencies in its design: for example, most CIL instructions are compared using string operations (for clarity), not the instruction IDs (see the sketch after this list). Comparing instructions by ID would save some (few) milliseconds
- compilation speed was never an issue for me, so why bother? CR is intended to be an optimizing compiler/VM. Leaving aside the fact that most people have fast computers, some people do not; many people develop on an Atom CPU. Also, a fast compilation speed translates into using those 500 saved milliseconds to add more optimizations in the future.
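
The string-versus-ID point from the second question looks roughly like this (a hedged sketch that borrows System.Reflection.Emit's OpCode purely for illustration):

using System.Reflection.Emit;

static class InstructionMatch
{
    // What CR does today for clarity: compare the opcode by its name string.
    public static bool IsLoadArg0ByName(OpCode op)
    {
        return op.Name == "ldarg.0";
    }

    // The faster alternative: compare the numeric value, no string work at all.
    public static bool IsLoadArg0ByValue(OpCode op)
    {
        return op.Value == OpCodes.Ldarg_0.Value;
    }
}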

Tuesday, February 4, 2014

Skipping release 0.0.2 - jumping to 0.0.3

As for Git commits, you will see some activity in the last week. Anyway, the key features (if you follow the Git repository) are all reached, to some extent, and all were proposed for 0.0.2:
- better field matching
- auto-properties do work
- better delegates and generics (the top of the Git tree is broken now; see why in the 0.0.3 feature set)
- fixes in some cases, like math with various operand sizes
- most optimizations (especially escape analysis) work over more code patterns

As many programmers know, fixing code takes time, and I took this time. But I will not pack an installer for 0.0.2, as I found that some features are important to implement in a different way, and there is no support (or request) yet for making a release package, so why do it?

The changes in 0.0.3 will be:
- visibly faster compilation: use-def data will be computed once and updated on changes, not recomputed for every optimization. The current code is already in the Git repository
- more fixes (of course, expect some breakage at first)
- the Dynamic-Invoke mechanism should be more fleshed out
- generics will fully use templates &lt;T&gt; instead of specializations. This translates into a single compilation of each function (on CR's side). If you define a class List&lt;T&gt;, the old code would create List_Int32, List_String, etc.
- expected: a fully inter-procedural optimization which tracks function calls over the whole program. This will allow some very powerful optimizations of the kind used today in GCC's LTO. This will be done only if it is guaranteed that compilation speed stays improved.

As for this release I would expect a "release" time around May/June 2014.

If you want to get more features, you are encouraged to add every small fix. They really add up and can make users enjoy using CR.

So in short, if you think that CR can match your needs in future, even if you find it a bit overwhelming, try to add some fixes or a new feature, like:
- improve the command line flags: support more compilers/OSes. Many fixes are really one or two days of work (maybe less)
- try to grow an application from simple to complex and see where CR crashes, then try to add support for that specific instruction. It is much easier to follow one instruction than to look at entire use-cases. Try disabling all optimizations first (by commenting them out, before looking at how to set a lower optimization level)
- add support for a .Net class using the Dynamic-invoke support. For example, add File.ReadAllText(...) and make it work at least for some basic code. After that, add File.ReadAllLines(...)
- try to make an OpenGL application and see what is not supported (you have a sample application in the Git repository)
- try to add an optimization step: write a class that checks a condition on the instruction set. For example, if the code divides by 0.5, replace it with a multiplication by 2.0 (see the sketch after this list). Try to add more optimizations like this.
- automate builds and make yourself a packaged installer
- make a MonoDevelop/VS plugin to automate launching of CR against a project
- anything else (which is really welcome)
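
For the divide-by-0.5 idea above, an optimization step could have roughly this shape (a hypothetical sketch; the Instruction type and the pass shape are illustrative, not CR's actual classes):

class Instruction
{
    public string Operator;         // e.g. "div", "mul"
    public double? ConstantOperand; // set when the right operand is a constant
}

class DivByHalfToMulPass
{
    // Replace a division by the constant 0.5 with a multiplication by 2.0.
    public bool TryOptimize(Instruction instr)
    {
        if (instr.Operator == "div" && instr.ConstantOperand == 0.5)
        {
            instr.Operator = "mul";
            instr.ConstantOperand = 2.0;
            return true; // report the change so cached use-def data is refreshed
        }
        return false;
    }
}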

CR is free to use (free as in price and freedom) and you are free to customize it to your future needs.