Code Refractor - Virtual Machines/Compiler performance musings: August 2015

Wednesday, August 12, 2015

Write Your Desktop Application in a VM for Your Security!

Today, when I arrived at my computer at work, I received the Windows Updates. 1 GB... 1 GB! Most of them are, of course, security patches covering Windows, Office and similar products. If you look deeper, you can see that the fixes are not only in regular application code but all over the place. This also happens on Windows 10, as ZDNet confirms.

The updates touch the .Net framework, graphics drivers, device mounting (and Office, as mentioned previously) and so on.

These components are, as we can guess, mostly written in C or C++, and the patches keep coming in part because it is hard to find every buffer overflow in the whole Windows codebase, but also because lower-level languages put a heavier load on the developer's brain, so without very deep code review it is hard to get these things fully right.

I hope that most readers understand this, and I would also expect that most readers write code in .Net (or Java, or JavaScript), but I want to express only one idea: security is hard in itself most of the time, and adding the concerns of low-level bounds checking makes it very hard to achieve. So it is more economical (and logical) to externalize those risks to other companies (like the OS vendor, the VM creator(s) and so on).
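To make the bounds checking point concrete, here is a minimal Java sketch (the class and method names are illustrative, not from any library): the VM checks every array index, so copying too much data into a fixed-size buffer fails with an exception instead of silently overwriting neighbouring memory, as the same loop can do in C.

```java
// Sketch: the JVM checks every array index, so an oversized copy fails
// loudly instead of overwriting neighbouring memory. Names are illustrative.
final class BoundsCheckDemo {
    private BoundsCheckDemo() {}

    /**
     * Copies src into dest element by element. Returns true if everything fit,
     * false if the runtime stopped an out-of-bounds write. Elements copied
     * before the failing index remain in dest.
     */
    static boolean copyInto(byte[] src, byte[] dest) {
        try {
            for (int i = 0; i < src.length; i++) {
                dest[i] = src[i]; // index checked by the VM on every write
            }
            return true;
        } catch (ArrayIndexOutOfBoundsException e) {
            // The same loop in C would keep writing past the end of dest.
            // Real code should compare lengths up front; catching is only for the demo.
            return false;
        }
    }
}
```

The point is not that Java code is bug-free: the copy is still wrong, but the mistake becomes an exception you can log and patch rather than a memory corruption someone can exploit.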

But the last reason why I think it is also important to use a VM is the simple fact that it is visibly easier to patch your code. If it is JavaScript or Flash, you upload the new application to the site and you're already patched: users only have to refresh the browser.

If you run your code in Java or .Net and the vulnerability is at a very low level (in the runtime itself), you ask users to upgrade the runtime; if it is in your application, the update functionality is more or less built in. It is very easy to download files using either Java or .Net, and to extract them if they use the zip format.
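As a rough illustration of how little code the extraction part of such an updater needs, here is a Java sketch using only `java.util.zip` and `java.nio.file`. The class name `UpdateExtractor` is hypothetical, and the download itself is left out: the `InputStream` could come, for example, from `new URL(updateUrl).openStream()`, where `updateUrl` is an assumed address of your update package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class UpdateExtractor {
    private UpdateExtractor() {}

    /** Extracts every entry of a zip stream into targetDir; returns the number of files written. */
    static int extractZip(InputStream in, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../evil.txt" that would escape targetDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Blocked zip entry: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Reads only until the end of the current entry; does not close zip.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
                zip.closeEntry();
            }
        }
        return files;
    }
}
```

The path check is there because an updater is itself an attack surface: a tampered package should not be able to write outside the application folder.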

But if you use C++, you have to compile the application, write an updater that is a bit awkward (there are some Windows APIs that are supposed to help with this from C++), make sure that it supports the right machine (like x86 or x64), and only then "you're good to go".

With the world of AppStores there is an argument that C++ can be deployed just as easily, but I still don't fully agree, for one reason or two: if you deploy your Android Java code, you don't bother with which CPU the tablet has, for example a MIPS32 or a MIPS64 one. For iOS you basically have to support two platforms because the Apple environment is tightly controlled, and for Windows by far the easiest way to work is still C#. Also, there is an argument that the iOS environment is itself like a virtual machine now,

Tuesday, August 4, 2015

Premature Optimization Is (Almost) Mandatory

"Premature optimization is the root of all evil" was said by Donald Knuth, right? Right, but he is usually misquoted. What he said, in full, is that about 97% of the time premature optimization is not necessary. You can read the full paper the quote is taken from here. Even more, he said it in the context of using ugly constructs (he was referring to GOTO statements). He also pointed out that, statistically, the hottest code is concentrated in 3% of the code, and his full statement was: "We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%." So he does not say to stop optimizing (don't forget that "premature" is a loaded word, which already has a negative connotation), but the reverse: optimize the code that is hot.

Based on this, I have found many misconceptions regarding optimization, and this is at least my view on them (from the most important to the weakest):

1 - "You should not optimize the loading time of your game/application; this happens just once, and after that the application runs fast/properly." There is some truth in this statement: for example, if you watch a movie, you should not care whether the movie player starts in 0.1 seconds or 2 seconds. But what if the movie player starts in 30 seconds? Would you use this movie player to watch a 2-minute clip? Many developers have quite powerful machines, with SSDs, 4 cores and at least 8 GB of RAM, but their application will reach users who do not have these nice components, and there it will run visibly slower.

2 - "Redesigning our architecture will make the application 4x faster, while optimizing the current design will give us only a 2x speedup, so there is no need for small optimizations." But very often this 2x speedup would largely carry over to the new design, and architecture redesigns are very expensive (in developer time). Let's say the company switches from an SQL server to NoSQL, but the application logic stays the same. Let's assume that in this scenario jumping from, let's say, MySQL to MongoDB gives a 4x speedup for queries, and that the queries were using 2/3 of the time of the entire system. The final speedup will be exactly 2x (the time drops to 1/3 + 1/6 = 1/2 of the original). But let's say that the company instead optimized only the part which is not related to SQL, with a 2x speedup there: for the entire system the speedup is then 20% (the time drops to 2/3 + 1/6 = 5/6 of the original). Even if it is small, customers get a tangible benefit next week/month, not after some months when maybe they have already quit as potential clients. Even more, when the migration from MySQL to Mongo happens, the system will work 3x faster (1/6 + 1/6 = 1/3 of the original time). But as real life happens, there can be a pathological case where for one customer the queries in fact take 20% longer after the migration to Mongo; because the optimizations were done at the system level, the system would still run about 3% faster, instead of taking about 13% longer. There is a lot of back-of-the-envelope math here, but it simply states that it never hurts to have small optimizations done.

3 - "When I develop I don't need an optimized workflow, my machine is really fast": this is kind of true, but sometimes it is not. Many big applications take a long time to start, which is again kind of normal (for example, they need to get updates from a server), and as a developer you can afford (because you have an SSD and 8 GB of RAM), at least while developing, to wait 10 seconds to get the real-case data. But if you have to reproduce a bug, every second counts. It counts because the waiting annoys you and interrupts development. Especially if your build system takes minutes, you will notice that you go to a blog, read the news and rants (like this blog), and lose focus on which bug you were really working on. This is not necessarily the fault of your organization; this is how the human mind works. This is similar to the first point ("you should not optimize loading time"), but directed at developers. As a developer, you should have a one-step build if possible, and make sure that building and starting the application don't do crazily much work.

4 - "You don't need to optimize your code because the compiler knows better": I am sure that for micro-optimizations, like using the minimal number of variables, a compiler will always do a better job than you, but as a general statement this is blatantly false. A compiler can optimize your code, but most of the time is not spent in your own compiled code: especially in managed languages (C# and the other .Net languages, Java, JavaScript) you will see that a lot of the running code is library code. Most compilers cannot optimize string concatenation: even though Java uses StringBuilder when you use + to concatenate strings, inside a loop every iteration still copies the whole string built so far. The reason is that compilers don't work well with strings. Whenever your code reads from files, the compiler will not know much about your file format, about duplicated data, or about the fact that you could read less data and rebuild the rest of the information. No compiler can know that if you load the same image twice, it should load it once and cache it, and so on. Even worse, even if we assume that your environment is well optimized, that only means your own code is the part that remains slow.

5 - "I should not speed up my web service, I will put it on Azure (replace this word with your cloud solution)." I'm not sure about you, but a faster web service means simpler administration, as you need to spawn fewer instances, and smaller costs, even if the code improvements are a bigger upfront cost.

6 - "You don't need to optimize allocation, the GC does it fast(er)." Did you measure this? A GC definitely has a quicker allocator than, let's say, the C++ one, but every time you do a "new" for a heap object, the object has to be zeroed, the allocation pointer moves, and this makes CPU cache lines "dirty". If you have some code that reads a file line by line with your own "read line" method (especially if you want to improve the loading time, see point 1), you may make a reactive interface, and instead of allocating a new buffer for every line, it looks to me like a fair design to just recycle the buffer. The speedups on the .Net side are fairly significant, and I would expect the same on Java. Allocating these small objects less often also makes the GC run less frequently.
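For point 6, here is one possible shape of such a reactive "read line" method, sketched in Java with hypothetical names. Instead of returning a new String per line, it pushes each line to a callback and reuses one read buffer and one line buffer for the whole file. The trade-off is that the `CharSequence` handed to the callback is only valid during the call.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.function.Consumer;

final class RecyclingLineReader {
    private RecyclingLineReader() {}

    /**
     * Pushes each line to onLine, reusing the same StringBuilder for every line.
     * The CharSequence is overwritten after the call returns: copy it with
     * toString() if the line must be kept.
     */
    static void forEachLine(Reader reader, Consumer<CharSequence> onLine) throws IOException {
        char[] chunk = new char[8192];               // reused read buffer
        StringBuilder line = new StringBuilder(256); // reused line buffer
        int n;
        while ((n = reader.read(chunk, 0, chunk.length)) != -1) {
            for (int i = 0; i < n; i++) {
                char c = chunk[i];
                if (c == '\n') {
                    onLine.accept(line);
                    line.setLength(0); // recycle instead of allocating a new buffer
                } else if (c != '\r') { // simplification: drops every '\r', which handles CRLF files
                    line.append(c);
                }
            }
        }
        if (line.length() > 0) {
            onLine.accept(line); // last line without a trailing newline
        }
    }
}
```

A caller that only parses numbers or counts fields never needs to materialize a String per line, which is where the allocation savings come from.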

A somewhat wider problem is architecture redesign. Today, most companies I work with use the Agile methodology, which is in itself an incremental methodology. This makes big architecture refactors almost impossible in large systems, and even when they are done, they are done by the most senior team members, who know "the core" well. This means that an architecture refactor can take not months but sometimes years, because you cannot risk breaking existing iterations, so the code is prepared in small steps to accomplish the redesign.
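The back-of-the-envelope numbers from point 2 are just Amdahl's law applied part by part. A small Java sketch (the class and method names are illustrative) makes them easy to recheck: each part of the system has a share of the original running time and a speedup factor, where a factor below 1.0 means that part got slower.

```java
final class SpeedupMath {
    private SpeedupMath() {}

    /**
     * New running time relative to the old one (old time = 1.0).
     * fractions[i]: share of the original time spent in part i (should sum to 1).
     * speedups[i]: how many times faster part i becomes (below 1.0 means slower).
     */
    static double relativeTime(double[] fractions, double[] speedups) {
        if (fractions.length != speedups.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double time = 0.0;
        for (int i = 0; i < fractions.length; i++) {
            time += fractions[i] / speedups[i];
        }
        return time;
    }

    /** Overall speedup: old time divided by new time. */
    static double overallSpeedup(double[] fractions, double[] speedups) {
        return 1.0 / relativeTime(fractions, speedups);
    }
}
```

With parts = {2/3 for the SQL queries, 1/3 for everything else}, speedups {4, 1} give 2.0 (migration only), {1, 2} give 1.2 (only the non-SQL optimizations), {4, 2} give 3.0 (both), and {1/1.2, 2} give 30/29, about 1.03, for the customer whose queries take 20% longer. Without the non-SQL work, {1/1.2, 1} gives a relative time of 17/15, about 1.13.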

In conclusion, this post is not a plea to use GOTOs, which both Knuth and I would disagree with, but the idea is that every time you can isolate (in a profiler) some slow code, you should optimize it now, not tomorrow. The later you do it, the more you will suffer from it in testing and from a bad application experience (and users will very often feel it too!).
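As an example of the small, library-level fixes from point 4 that no compiler will make for you, here is a Java sketch with illustrative names. `joinLines` builds a string with one `StringBuilder` instead of `result += part` in a loop, which copies the growing string on every iteration. `LoadOnceCache` loads each key at most once; the loader function (for example something that decodes an image file) is a hypothetical stand-in supplied by the caller.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class LibraryLevelFixes {
    private LibraryLevelFixes() {}

    /** Appends every part plus a newline into one builder: linear instead of quadratic copying. */
    static String joinLines(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part).append('\n');
        }
        return sb.toString();
    }
}

/** Loads each key at most once and serves repeated requests from memory. */
final class LoadOnceCache<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    LoadOnceCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // computeIfAbsent calls the loader only when the key is not cached yet.
        return cache.computeIfAbsent(key, loader);
    }
}
```

This cache never evicts anything, which is fine for a fixed set of assets loaded at startup; for unbounded data you would want a size limit.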

Monday, August 3, 2015

Visual Studio 2015/C# 6/.Net 4.6 (RyuJIT) review

Maybe it is because of the context of the Windows 10 launch, where the impression was of a somewhat buggy release (not to mention that some families of video cards are not even supported, like NVidia 4xx cards or older), but Visual Studio got much less attention.

I think this is right for most users, but on the other hand I feel that this release is extraordinarily... strange. It is outstanding in some features, like the inclusion of profiling tools even in the Community Edition (the profiling tools are limited, but still much better than before, when they were not included at all).

My first impression was positive in many ways:

- C# 6, which looks to me like a streamlined version of the language, something that was basically forgotten since the times of .Net 3.5/VS 2008 (which came with Linq and the var keyword). Making code less repetitive is amazing stuff. If you have time to listen for more than one hour, this presentation is excellent. Please push your company to use C# 6, which, except for some uses of string interpolation, doesn't require any newer .Net runtime. I'm not a VB.Net guy and I cannot comment much, but I expect good stuff there also.

- The Roslyn idea, even if it existed as part of NRefactory for years, is really well implemented: as you type, you can see very reliably whether your code has errors, with no full build needed to see the failures. This is a huge time saver in itself. This "language service", which is exposed as an open API, means that C# will not have strange behaviors in completion, especially if you use future versions of CodeRush or JustCode. I love ReSharper, but it is still great to know that Roslyn will be part of future SharpDevelop and MonoDevelop releases.

- .Net 4.6 comes with awesome improvements: I would expect in the future to see programs like Paint.Net, photo manipulation programs or some entry-level video games use the SIMD libraries. They come for free, but there is a caveat for now: the new JIT still has some obscure bugs (which, to be fair, are to be expected), especially if you run F#. The reason F# appears to be the one affected is in part natural: F# relies on "tail call optimization", which turns recursive calls into loops. Without it, many F# programs either crash with a "stack overflow" or have a very ugly performance profile. So don't rush to run it on your production servers for now, or do it only for your VB.Net/C# code.

- Even though I'm not a C++ developer, it looks like Visual Studio supports the C++ standards very well, which again is a great achievement: you can target basically all platforms (like iOS, Android and Windows) with one C++ codebase, without strange #defines.
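To illustrate what "tail call optimization turns recursive calls into loops" means in the .Net 4.6 point above, here is a small sketch written in Java (the names are illustrative). When the recursive call is the very last thing a method does, a compiler or JIT that supports tail calls can reuse the current stack frame, which is equivalent to updating the parameters and jumping back to the top. The HotSpot JVM does not perform this transformation, so a call like `sumToRecursive(1_000_000, 0)` would most likely end in a `StackOverflowError`, while the loop form runs in constant stack space.

```java
final class TailCalls {
    private TailCalls() {}

    /** Tail-recursive form: the recursive call is the last action of the method. */
    static long sumToRecursive(long n, long acc) {
        if (n == 0) {
            return acc;
        }
        return sumToRecursive(n - 1, acc + n); // tail call
    }

    /** What tail call optimization effectively produces: the call becomes a jump to the top. */
    static long sumToLoop(long n, long acc) {
        while (n != 0) {
            acc += n; // same argument updates as the recursive call, using the old n
            n -= 1;
        }
        return acc;
    }
}
```

This is why a JIT bug in tail call handling hits F# much harder than C#: F# code is full of such recursive loops and relies on them not growing the stack.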

As a .Net developer I am still disappointed with .Net, which today, excluding the web stacks (and even there the solution was mostly a response to NodeJS and the small web servers from the Ruby or Java world), looks incoherent as a desktop tool. I honestly don't know a Microsoft stack with which I can support more than one platform, even within Microsoft's ecosystem. WPF is decent and they patch it, but it looks to me like an MFC which runs on top of DirectX9. Not DirectX12.

Even stranger, when you install Visual Studio it comes with no package to develop with .Net for other platforms (like Mono), so until NRefactory is stable enough, your C# 6 code will run only on Windows, or on Linux with the CoreCLR .Net distribution, but not on Mono. This is kind of a bummer if you ask me.

Even more, and this is in fact not a rant against WPF: since they improved VB and C# (and C++ for that matter, and F#), why didn't they improve Xaml? Xaml is a horrible language, if you can even call it one. It has various framework conventions which are almost always broken. Add to this that the WPF platform, without (and even with) custom controls, runs slowly with more than a few thousand items. The reason is not a lack of GPU acceleration, nor faulty GPU drivers, nor DirectX9 drivers that are not up to snuff: when you profile a WPF application, you will see that the internal layout pass is hogging the CPU.

If you add more and more issues, it looks to me that if you want to write an application that is, for example, cross platform, you mostly have Xamarin solutions (MonoGame, Xwt, Gtk#, Xamarin.Forms, and so on), which is, at least for me, a bit strange.

What I hope VS+1 will support, in no particular order:

- Polish the software more: it looks to me that Microsoft currently has quality issues all over its products. Complex software is hard, but working little by little and releasing with two fewer features would make the environment nicer. Not sure about other users, but at least under Windows 10 I had fairly many freezes and crashes. I definitely had far fewer after the latest updates, but from time to time I still get a "blue screen ;)" in Windows, or VS hangs, especially while debugging.

- Give a clear vision about which frameworks are supported by Microsoft. I'm talking about WPF in particular, but I think that many other frameworks (including WCF, Silverlight, even the original WinRT code) are either not well exposed or it is not clear when or how they are supported. This makes it very hard for developers like myself, if I had an idea for a startup, to start with Windows for a two-year project. With Java, even if it is technically worse (in many ways it is), I know they don't let features freeze, and most of the work happens in the open. Visual Studio comes with tools ranging from HTML editing to C++ coding for Android. It looks to me like a dinosaur, but maybe that is my limited judgement.

- It should not try to put all languages/platforms under one IDE. The reason is that VS is not an open platform like Eclipse: people will not extend it to do CAD modelling. Even though it lets you deselect them, too many things are included by default. Features do not matter only by count, but by giving users a sane experience. Use NuGet for adding language services.

- This is maybe easier said than done: start from TypeScript and make a .Net language that resembles it. Make a very light language, similar to Swift, that works for both the "Desktop" and the "Web" worlds. C# is really better in my view than Java (which it was competing with), but to be fair JetBrains' Kotlin language is definitely more usable. Ruby (except that Ruby is not statically typed) is again more usable than C#. And the statically typed Mozilla Rust looks really promising and is clearly high performance. Maybe the starting point should be Visual Basic.Net with the legacy removed, and similarly a C# without the legacy. For example, it could force iteration to go through IEnumerable, and people who still want the old C# would write separate code for it (similar to what C# developers write today with "unsafe" code).
