Wednesday, January 27, 2016

Java's Flat Collections - what's the deal? (Part I)

I've been moving to a Java environment for my job, and since my long-standing interest in performance is still there, I looked into why Java sometimes runs slowly (even in the typical case, after you wait for your code to warm up), and below I describe some solutions that exist around Java's ecosystem.

There are 3 solutions competing right now to enable high-performance Java code through better data layout or OS integration:
- PackedObjects: a way to remove object headers from collections of objects; sadly it works for now only with IBM JVMs. It is primarily intended for JNI-like code, speeding it up by removing per-element copying. It requires medium changes in the compiler and garbage collector, but no language changes (or minimal ones).
- ObjectLayout: a way to give hints to the JVM so that it may allocate structured arrays contiguously. It requires GC changes and a few annotations, but no language changes.
- Array 2.0 (or Project Panama): the project which basically plans to bring .NET's struct type to Java. This is the most extensive of all because it has to change bytecodes, compiler internals, and the GC.

So I'm here to present a new solution which I find handy. It is at a very early stage and requires no language changes (though to take advantage of it you do need a few code changes of your own). It should work with any Java newer than 5.0 (maybe even Java 1.2, but I'm not 100% sure), and where it doesn't work out of the box it should be very easy to patch.

Enter FlatCollection, a code generator which flattens your collections and makes it easy to get high(er) performance code for many common cases.

How does it work?
- you pick any type whose fields all have a supported type (for now I think the generator handles only primitive fields, as the working prototype uses a Point with x, y integer fields, but by the time you read this it may well generate code for any field type)
- you list all the types, with their full namespaces, in the input.flat file
- you run the project to create two flat classes for each type: an ArrayListOfYourType and a CursorOfYourType (see the sketch after this list)
- you copy the generated files into a "flatcollections" package that you add to your project
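
To make the output concrete, here is a minimal sketch of what the two generated classes might look like for a Point type with int x, y fields. The names and exact API are illustrative guesses, not the generator's literal output:

    // Sketch of a generated flat collection: all Point fields live inline in
    // one int[], as (x, y) pairs, so there are no per-element objects at all.
    public final class ArrayListOfPoint {
        private int[] data = new int[16]; // x,y pairs stored back to back
        private int size;                 // number of logical Point entries

        public void add(int x, int y) {
            if (size * 2 == data.length) {
                data = java.util.Arrays.copyOf(data, data.length * 2);
            }
            data[size * 2] = x;
            data[size * 2 + 1] = y;
            size++;
        }

        public int size() { return size; }

        public void clear() { size = 0; } // keeps the backing array for reuse

        // The cursor reads fields in place; no Point object is ever allocated.
        public final class CursorOfPoint {
            private int pos = -1;
            public boolean moveNext() { return ++pos < size; }
            public int getX() { return data[pos * 2]; }
            public int getY() { return data[pos * 2 + 1]; }
        }

        public CursorOfPoint cursor() { return new CursorOfPoint(); }
    }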

Look at the "realistic" benchmark code in RunBench.java to see the same code written with an array of Point and with this mapped ArrayList.

In this semi-realistic case, the memory consumption of this collection is around half that of a full array of Point objects, and both populating it and reading it are at least 2x-4x faster.

How does it work internally? It merges all fields into one contiguous array of primitives, which basically removes one level of indirection and many allocation costs.

In the future I will extend the samples to show parsing of CSV files and similar operations. If you reuse the collection via its .clear() call, no reallocation is needed, unless the new "per-row" code requires more memory than the previous runs did.
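
For instance, a parsing loop that reuses one collection could look like this (again a sketch, reusing the hypothetical ArrayListOfPoint from above; the lines source and the process consumer are placeholders):

    // Sketch: reuse one flat collection across many rows. After the first rows
    // grow the backing array, later rows of similar size allocate nothing.
    ArrayListOfPoint points = new ArrayListOfPoint();
    for (String line : lines) {              // e.g. rows of a CSV file
        points.clear();                      // resets size, keeps the array
        for (String pair : line.split(";")) {
            String[] xy = pair.split(",");
            points.add(Integer.parseInt(xy[0]), Integer.parseInt(xy[1]));
        }
        process(points);                     // placeholder consumer
    }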

Why is it important to flatten the data layout? Basically, you reduce GC hits, and you can map naturally code that would be ugly otherwise: say, a Tuple class. Also, the full-GC cost (which involves visiting all the small objects in your heap) is very low for these collections. So I would assume that at least for batch processing, or maybe for games written in Java, it could be a really friendly tool of the trade.

What remains to be done:
- it should be tested to work with collections of any type, and it should support specializations
- it should work with POJOs whose fields are not exposed, including Tuple classes
- not mandatory, but it should support iterators or other friendly structures

Sunday, December 20, 2015

Fx2C - small tweaks

The Fx2C transpiler got a bit smarter, and in a meaningful way. If you plan to mix Kotlin with FXML (a combination which I think neither Oracle nor JetBrains planned to support), you will find that there is no (straightforward) way to support @FXML fields.

Do not worry: if you have a Kotlin controller class, you can write something like this inside the fxml file:
<!--   Flags: IsKotlin-->

And the Kotlin code will work seamlessly:
package desktool.views.kotlinView

import javafx.event.ActionEvent
import javafx.fxml.FXML
import javafx.scene.control.Button
import javafx.scene.control.Label

class LineToolControler {
    var thickness = 1

    @FXML
    var label: Label? = null

    @FXML
    var button: Button? = null

    @FXML
    fun handleButtonAction(event: ActionEvent) {
        println("You clicked me!")
        label?.setText("Hello World!")
    }
}

This is not to say that right now it is better to use Kotlin; the idea is that in the future the code may be extended to support various other languages with JavaFX (Scala is another language that comes to mind).

Another important addition: the compiler now generates a preloader class (Fx2CPreloader) which loads all the generated classes into memory, so the second time around the JVM starts the dialogs in a fraction of a second. In my unscientific testing on a slow Atom CPU (Baytrail) machine, loading a medium-sized dialog's classes into the JVM for the first time could take something like 600 ms, but with preloading the time drops to 2-5 ms.

So, is it important to use this preloader? I think so, especially if you have many UI elements: running the preloader under a splash screen warms up the JVM and makes your dialogs appear (close to) instantly from the second run onward.
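
As a rough illustration of that setup (a sketch only: the preload() entry point on Fx2CPreloader is my assumption, and the generated class's real API may differ):

    // Sketch: warm up the generated dialog classes on a background thread
    // while the splash screen is visible.
    Thread warmUp = new Thread(() -> {
        Fx2CPreloader.preload(); // assumed method name: loads generated classes
    });
    warmUp.setDaemon(true);      // do not keep the JVM alive for the warm-up
    warmUp.start();
    // ...show the splash screen; by the time the user opens a dialog,
    // its classes are already in the JVM and it appears in a few ms.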

For me, the Fx2C tool is now mature enough to go into maintenance mode: it works well enough for my taste and usage, and I will likely use it to build more JavaFX applications that simply feel responsive, something I missed after coming from a .NET WPF environment.

Wednesday, December 16, 2015

Vampire Logic - my team's entries

I participated in a zombie-themed competition (hackathon), creating 3 apps using Microsoft's Universal Apps APIs. My ad-hoc team (Vampire Logic) did really well there.

The source code for these 3 (working) apps, in the categories Productivity/Games/Educational, was written in 18 hours combined (around 6 hours per app, give or take).

Productivity: an image editor with camera support, Modern UI, live preview and infinite Undo levels:
https://github.com/ciplogic/zombie_productivity

Game: zombie ships fight in a top-down shooter with an animated background. It uses a Canvas control efficiently, with fluid animations, all in standard Universal Apps code:
https://github.com/ciplogic/zombie_game

Educational: an interactive math game where harder and harder math problems must be solved in limited time. How long will you survive?
https://github.com/ciplogic/zombiedu

The coding practices may be spotty at times, but considering that each application was written in only 6 hours (and this was our first ever Universal Apps coding experience), all of them were free of known bugs and of bad practices like crashes, or errors "solved" with big try-catches that hide them, or similar.

Coding language: C#

Team members: Dāvis Sparinskis, myself, Linda Legzdiņa, Rudolf Petrov

Some photos with my team:





Friday, December 11, 2015

Finding non-duplicates in an array (using Java)

I had a job interview (and I will be doing a lot more Java in my day-to-day coding, yay!), and one part of it was about finding non-duplicated values in an array. There is a textbook solution (which I will not divulge) with O(n) complexity (meaning that at worst it scales linearly with the size of the array), but can it run faster?

Faster than O(n) is, complexity-wise, impossible, because you have to traverse the array of data at least once. But as we've seen in an older post, if you can bring the constant factor down, you get a faster algorithm. So what about scanning the chunk of data naively, with the counts kept in a plain int array?

This is not my best benchmark (and I didn't try all combinations), but up to around 100 elements, a native int[] (int array) structure runs faster than the O(n)-complexity solution.

The code that searches for the first non-repeating value (returning Integer.MIN_VALUE if every value is duplicated) is the following:


    // Finds the first value that appears exactly once, or Integer.MIN_VALUE
    // if every value is duplicated. Uses a flat int[] of (value, count) pairs.
    public static int GetFirstNonRepeatingIntArray(int[] values) {
        if (values == null) {
            throw new NullPointerException("values");
        }
        // Each distinct value occupies two slots: [value, count].
        int[] counters = new int[values.length * 2];
        int counterSize = 0;
        for (int index = 0; index < values.length; index++) {
            int toFind = values[index];
            int foundIndexValue = getIndexOf(counters, counterSize, toFind);
            if (foundIndexValue == -1) {
                // First time we see this value: append a (value, 1) pair.
                counters[counterSize * 2] = toFind;
                counters[counterSize * 2 + 1] = 1;
                counterSize++;
            } else {
                counters[foundIndexValue * 2 + 1]++;
            }
        }

        // Return the first value whose count stayed at 1.
        for (int index = 0; index < counterSize; index++) {
            if (counters[index * 2 + 1] == 1) {
                return counters[index * 2];
            }
        }
        return Integer.MIN_VALUE;
    }

    // Linear scan over the (value, count) pairs; returns the pair index or -1.
    public static int getIndexOf(int[] counters, int counterSize, int toFind) {
        for (int foundIndex = 0; foundIndex < counterSize; foundIndex++) {
            if (counters[foundIndex * 2] == toFind) {
                return foundIndex;
            }
        }
        return -1;
    }

For example, for 10,000 repetitions of the algorithm with arrays of 50 items (randomly generated in the range 1 to 25), the output is:
Total execution time: 100ms
Total IntArray execution time: 31ms

Does this kind of coding have any implication in your day-to-day life? I would say yes: for typical "Settings"-style classes, you are often better off working with arrays/Lists than with dictionaries, counter-intuitive as it is. It is very unlikely you will ever have hundreds of settings, and both the debugging experience and the performance may be better. Memory usage (especially compared with Java's HashMap implementation) is also much better.
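
As a small illustration of the settings idea (my sketch, not code from the interview), a pair of parallel arrays with a linear scan is simple and compact:

    // Sketch: a tiny settings store backed by parallel arrays. For a few dozen
    // entries the linear scan is fast, easy to inspect in a debugger, and uses
    // far less memory than a HashMap with its per-entry node objects.
    final class Settings {
        private String[] keys = new String[8];
        private String[] values = new String[8];
        private int size;

        String get(String key) {
            for (int i = 0; i < size; i++) {
                if (keys[i].equals(key)) return values[i];
            }
            return null;
        }

        void put(String key, String value) {
            for (int i = 0; i < size; i++) {
                if (keys[i].equals(key)) { values[i] = value; return; }
            }
            if (size == keys.length) { // grow both arrays together
                keys = java.util.Arrays.copyOf(keys, size * 2);
                values = java.util.Arrays.copyOf(values, size * 2);
            }
            keys[size] = key;
            values[size] = value;
            size++;
        }
    }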

Monday, November 30, 2015

Fx2C - Bug Fixes

JavaFX seems to be kind of a black sheep for some Java developers, but I don't feel so negative about it. To some extent I accept that many applications are migrating to an online environment, where the server side exposes a JavaScript/HTML5/CSS UI, but at the same time I would expect there to be a desire to use offline applications at least some of the time.

And here I see that JavaFX can shine. A few things still need to be done by the Java runtime in my opinion, but once deployed, Java is a great offline runtime which runs on the 3 major OSes, and that is no small deal. You can literally write one UI once and it will run on all (desktop) platforms with no recompilation, and if you consider JavaFXPorts, you can add Android to the list.

Based on this, I fixed some bugs in the JavaFX type handling and moved the code to Java only (removing the Kotlin parts), so it is easier for non-Kotlin developers to fix.

Code is here:
https://github.com/ciplogic/Fx2C


Monday, November 23, 2015

New useful Kotlin tool of the day: JavaFx 2 FXML Compiler


Do you write Java, by any chance? Do you use a Swing desktop client and avoid JavaFX because it is too slow, or do you use JavaFX and it "feels" slow even for very simple dialogs?

There is a tool now with a demo:




If you want to contribute to it, please install IntelliJ Idea (Community is just fine), clone the repository: https://github.com/ciplogic/Fx2C and after this load the project.

Legal notice: use it at your own risk. I'm not responsible for your problems. But in all seriousness, the code is simple to maintain.

Thursday, November 19, 2015

Tricks of the trade for quick sorting in .Net

I rarely do huge-dataset sorting, but I want to share some tricks for sorting huge datasets. I am surprised how few people learned things like this, even at university.

The reason is that as a student you are taught to think of CPUs as ideal and to reason in big-O notation. Sorting is in most cases O(N log N), meaning that if a list of 1,000,000 items takes, let's say, 1 second, then 4,000,000 items grow that only slightly beyond linearly, to something like 4.1 seconds.

So how to improve the sorting speed? Reduce the constants.

The point of big-O notation is that it captures the "scale" of growth of the running time, but it does not take into account the constant factors.

So, let's say you have 3,000,000 items to sort inside a List<string>. This is hypothetical but not that hypothetical, in the sense that there are huge lists of items you may need to sort within a few milliseconds, and increasing the item count shows more clearly where the speedups are.

Let's say you add those items to a list and use List.Sort(). I generated the items semi-randomly (most of the ASCII characters are chosen at random). On 64-bit .NET (4.6) on an oldish Intel i5-2500K, it runs in a fairly short time: 13067 ms, or about 13 seconds.

Can you speed it up more? The simplest trick is to check the first char separately; only if the first chars are equal do you fall back to full string comparison. This brings it to 10189 ms. Another small improvement is to sort over arrays instead of lists; this is a fairly small speedup, but still quicker (9940 ms).

The comparison key struct would be like the following:

    struct CompareStr : IComparable<CompareStr>
    {
        public char FirstChar;
        public string Text;

        public CompareStr(string text)
        {
            if (string.IsNullOrEmpty(text))
                FirstChar = '\0';
            else
                FirstChar = text[0];
            Text = text;
        }

        public int CompareTo(CompareStr other)
        {
            // Cheap first-character comparison; fall back to the full string
            // comparison only on a tie.
            if (FirstChar != other.FirstChar)
                return FirstChar.CompareTo(other.FirstChar);
            return Text.CompareTo(other.Text);
        }
    }

And the sort routine for those texts is:

    var combinedKeys = texts2.Select(text => new CompareStr(text)).ToArray();
    Array.Sort(combinedKeys);
    var resultList = combinedKeys.Select(combined => combined.Text).ToList();

But can we do better? I think so. Let's change FirstChar to pack the first two chars into a 32-bit unsigned int (a char itself is roughly equivalent to a UInt16). The time improves greatly, to 6220 ms, which is less than half the original:

    struct CompareStr2 : IComparable<CompareStr2>
    {
        public uint FirstChar;
        public string Text;

        public CompareStr2(string text)
        {
            // Pack the first two chars into one uint for a cheap comparison.
            if (string.IsNullOrEmpty(text))
                FirstChar = 0;
            else if (text.Length < 2)
                FirstChar = (uint)(text[0] << 16);
            else
                FirstChar = (uint)((text[0] << 16) + text[1]);
            Text = text;
        }

        public int CompareTo(CompareStr2 other)
        {
            if (FirstChar != other.FirstChar)
                return FirstChar.CompareTo(other.FirstChar);
            return Text.CompareTo(other.Text);
        }
    }
And the sorting routine is very similar to the first one:

    var combinedKeys = new List<CompareStr2>(texts2.Count);
    combinedKeys.AddRange(texts2.Select(text => new CompareStr2(text)));
    var items = combinedKeys.ToArray();
    Array.Sort(items);
    var resultList = new List<string>(texts2.Count);
    resultList.AddRange(items.Select(combined => combined.Text));


Can it be written to be even faster?
   
    struct CompareStrDouble : IComparable<CompareStrDouble>
    {
        double _firstChar;
        public string Text;

        // Folds the string into one double: each char is weighted by
        // (1/2^16)^position, so earlier chars dominate. A double's 52-bit
        // mantissa means only roughly the first three chars are significant;
        // ties are resolved by the full string comparison below.
        static double ComputeKey(string text)
        {
            var basePow = 1.0;
            var powItem = 1.0 / (1 << 16);
            var result = 0.0;
            foreach (char ch in text)
            {
                result += basePow * ch;
                basePow *= powItem;
            }

            return result;
        }

        public CompareStrDouble(string text)
        {
            _firstChar = ComputeKey(text);
            Text = text;
        }

        public int CompareTo(CompareStrDouble other)
        {
            if (_firstChar != other._firstChar)
                return _firstChar.CompareTo(other._firstChar);
            return Text.CompareTo(other.Text);
        }
    }
   
For reference, this is the sorting code:

    List<string> SortSpecialDouble(List<string> texts)
    {
        var combinedKeys = new List<CompareStrDouble>(texts.Count);
        combinedKeys.AddRange(texts.Select(text => new CompareStrDouble(text)));
        var items = combinedKeys.ToArray();
        Array.Sort(items);
        var resultList = new List<string>(texts.Count);
        resultList.AddRange(items.Select(combined => combined.Text));
        return resultList;
    }

This sorting key is really, really fast: 2292 ms, which is over 5 times quicker than the original List.Sort (for strings).

Some things to consider:
- Sorting huge datasets may signal a design flaw in your application: you should always filter data before sorting it. The use case still matters, though, if the data you sort is local. This sorting approach can also be used with LINQ's OrderBy and friends. And if your data comes from a database, the database can sort it for you, so there is sometimes no point in sorting it yourself.
- Sort only when you need it: I am thinking of cases where a grid has many thousands of items and you sort by clicking a column header. For 300,000 items (a huge grid, BTW), sorting with List.Sort takes around 1030 ms, versus 200 ms for the fastest key used here.
- Lastly, this algorithm is not culture-aware, but it is damn quick! This is a tradeoff to consider, as it may break ordering for some people's languages. For instance, in Romanian (my native language) S sorts near the Ș char, but with this algorithm Ș will land after Z. The same happens with Russian and other alphabets. So if you use this kind of sorting, make sure you can afford slightly wrong ordering for non-English words.

Running on Mono (I was really curious, weren't you?) gave really horrendous times (for only 30,000 items):
.Net times:
List.Sort: Time: 51
Sort by first char: Time: 65
Sort by first two chars: Time: 33
Sort by double key: Time: 30

Mono time:
List.Sort: Time: 411
Sort by first char: Time: 329
Sort by first two chars: Time: 106
Sort by double key: Time: 39

The fastest implementation was consistently faster, but things got worse very quickly. For 300,000 items:
.Net times:
List.Sort: Time: 1073
Sort by first char: Time: 709
Sort by first two chars: Time: 420
Sort by double key: Time: 193

Mono times:
List.Sort: Time: 5825
Sort by first char: Time: 4917
Sort by first two chars: Time: 2268
Sort by double key: Time: 409

And for 3,000,000 items the times are really huge:
.Net times:
List.Sort: Time: 12939
Sort by first char: Time: 9851
Sort by first two chars: Time: 6143
Sort by double key: Time: 2259

Mono times:
List.Sort: Time: 81496
Sort by first char: Time: 70736
Sort by first two chars: Time: 38233
Sort by double key: Time: 5886

So, using latest algorithm presented here, on Mono at least seem to be 10 times faster using this primitive comparison between various strings. I can assume that the string comparison under Mono is not well optimized (like to compare two strings in Mono looks to be around 6-8x slower than .Net) and the tricks used in this algorithm can give to you the user a huge speedup. So if you use string comparison on a Xamarin supported platform for sorting purposes, you may really use a similar techniques as described here.