Sunday, September 20, 2015

Optimizations on bigger data sets

Around one year ago I started jHeroes2, intended to be a port of FHeroes2 from C++ to Java. Since I had the code around, I also tried to port part of it to C#. The ugly part is that with a standard toolkit (I'm talking about JavaFX, or WPF in C#) the drawing is fairly slow. It is faster than, say, Windows Forms at displaying multiple bitmaps, but it is still fairly slow (or I simply don't know how to optimize the code for either toolkit).

But one interesting part of the exercise is that I learned some tricks the original Heroes 2 team used when they built their "big archive with all content" file. They stored a full index, and after it the file content is encoded with a simple "painting" algorithm. That painting algorithm is really obscure, and I'm impressed that the FHeroes2 guys managed to decode all the pictures.

So based on this, and also on my experience at work, I thought it would be handy to take all the pictures in the main Heroes 2 Agg file, decode them, and (re)compress them as full bitmap data. Since the result is compressed, I ended up with a full zip containing all the pictures inside Heroes 2, decoded and repacked. I did not use indexed colors (even though they would reduce the bitmap size) and I did not save them as native bitmaps, because I wanted to check some things I will elaborate on in this entry.

So what the compressor does (a sketch follows the list):
- iterates over all pictures in the heroes2.agg file (a 43 MB file)
- extracts them in memory as bitmaps
- takes every bitmap and saves it either in a text format, where every pixel is a hex value, or as a binary integer array
- compresses every text/byte array into a zip entry
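
As a rough sketch, the binary variant of the compressor could look like the code below; the `Sprite` reader is hypothetical, standing in for whatever code decodes the original sprites out of heroes2.agg:

```java
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class AggRepacker {
    // Writes every decoded picture as its own zip entry: width, height, then raw pixels.
    public static void repack(Iterable<Sprite> sprites, String zipPath) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(zipPath))) {
            DataOutputStream out = new DataOutputStream(zip);
            for (Sprite sprite : sprites) {
                zip.putNextEntry(new ZipEntry(sprite.name() + ".bin"));
                out.writeInt(sprite.width());
                out.writeInt(sprite.height());
                for (int pixel : sprite.pixels()) {
                    out.writeInt(pixel); // one full ARGB int per pixel, no indexed colors
                }
                out.flush();
                zip.closeEntry();
            }
        }
    }

    // Hypothetical decoded picture; the real code reads these out of heroes2.agg.
    interface Sprite {
        String name();
        int width();
        int height();
        int[] pixels();
    }
}
```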

What the benchmark test does (again, a sketch follows):
- takes the zip entries one by one
- decodes the width/height first, then creates an integer array and reads/decodes the hex strings/byte arrays into it
- converts this int array into an Image
- outputs the time.
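
A minimal sketch of the binary benchmark loop, assuming each entry stores the width, the height, then one ARGB int per pixel, as described above (`BufferedImage` stands in here for whatever image type the toolkit wants):

```java
import java.awt.image.BufferedImage;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;

public class DecodeBenchmark {
    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(args[0]))) {
            DataInputStream in = new DataInputStream(zip);
            while (zip.getNextEntry() != null) {
                int width = in.readInt();     // header first...
                int height = in.readInt();
                int[] pixels = new int[width * height];
                for (int i = 0; i < pixels.length; i++) {
                    pixels[i] = in.readInt(); // ...then the raw pixel data
                }
                // Convert the raw ARGB array into an image object.
                BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
                image.setRGB(0, 0, width, height, pixels, 0, width);
            }
        }
        System.out.println("Time: " + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```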

In short, the zip file for the binary data contains around 15,000 files (pictures) which, if extracted, would amount to roughly 250 MB of data on disk.

Given this much data, I was interested to see how quickly all these pictures could be decoded.

So first of all, having these two zips, I wanted a baseline. For this I started with .NET code that extracts all the small files from the zip into byte arrays in memory. The timing was the following:
Time: 2604 ms
Time text: 3276 ms
This means, in short, that uncompressing with the latest .NET on a 64-bit machine on a fairly quick laptop takes around 2.6 seconds for the binary compacted data, and around 3.3 seconds just to uncompress the text data.
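
The baseline itself is nothing fancy; a Java equivalent of the extract-only loop looks roughly like this (the .NET version is the same idea over its zip classes), and it is a sketch rather than the exact benchmark code:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;

public class ExtractOnly {
    // Inflates every entry into an in-memory byte array, with no pixel decoding at all.
    public static long extractAll(String zipPath) throws IOException {
        long total = 0;
        byte[] buffer = new byte[64 * 1024];
        try (ZipInputStream zip = new ZipInputStream(new FileInputStream(zipPath))) {
            while (zip.getNextEntry() != null) {
                ByteArrayOutputStream bytes = new ByteArrayOutputStream();
                int read;
                while ((read = zip.read(buffer)) != -1) {
                    bytes.write(buffer, 0, read);
                }
                total += bytes.size();
            }
        }
        return total; // total uncompressed size, to keep the work observable
    }
}
```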

I ran the same extraction code in Java, and it ran in around half the time. So for the hex data the decompression alone would be closer to 1.5 seconds; the full times are as follows.
Time (extract text): 1678 ms
Time with string split: 7012 ms
Time no alloc: 4474 ms

Time (binary data): 1685 ms
Time (extract data): 943 ms

A graph of the timings (shorter bars are better):

So what I eventually found is that in Java you can write a quicker conversion from binary data to images, meaning extracting the 15,000 files in memory, turning them into int arrays, then converting them to pictures, in less than 1.7 seconds on my machine, which is less than the time .NET needs just to extract the files.

This is great, but I also wanted to look at a more real-life use case: if users compress hex text files, for example, the code to extract them runs very slowly, around 7 seconds, even when it is written in a fairly GC-friendly way by splitting the text into lines, then splitting the lines into tokens, and then using Java's string-to-hex parsing. Another interesting result: instead of splitting strings per row, most of the code can be written, even for plain text, with zero allocations on the pixel-decoding path (or close to zero: there are still allocations for the image itself, the integer array, and so on, but none while processing the individual pixels), and with this you can get into the 4.5 second range.
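
To make the zero-allocation idea concrete, here is one way such a parser could look. This is a sketch written for illustration, not the exact benchmark code; it assumes the text stores one hex token per pixel, separated by whitespace or newlines:

```java
public class HexPixelParser {
    // Fills 'pixels' from text like "FF00FF00 FF112233 ...", one hex value per token.
    // Walks the characters once, accumulating each token into an int, so no
    // String objects are allocated per line or per pixel.
    public static int parse(CharSequence text, int[] pixels) {
        int count = 0;
        int value = 0;
        boolean inToken = false;
        for (int i = 0; i < text.length(); i++) {
            int digit = Character.digit(text.charAt(i), 16);
            if (digit >= 0) {
                value = (value << 4) | digit; // still inside a hex token
                inToken = true;
            } else if (inToken) {
                pixels[count++] = value;      // token ended: store the pixel
                value = 0;
                inToken = false;
            }
        }
        if (inToken) {
            pixels[count++] = value;          // last token without a trailing separator
        }
        return count;
    }
}
```

The point is simply that `Character.digit` plus an accumulator replace `split()` and `Integer.parseInt`, so nothing is allocated per pixel.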

So, in the end, was .NET really that slow? Yes and no. In this code Java was faster for many reasons, the simplest of them being that the Java optimizer is more advanced than the .NET one, so in the highly tight loops of the extraction code Java was shining. Also, in the zero-memory-allocation code and in the binary processing code I was using lambdas, knowing that Java 8 was designed with optimizing such code in mind.

Could the time be reduced below 1.7 seconds? Yes, but not by much, except for one thing, and here is the main part: Heroes 2 uses a 256-color palette. Reducing the full bitmap data to palette indices would shrink the 250 MB of data to around 85 MB, meaning the extraction would take around a third of the time, and similarly the decompression would be much friendlier to the memory allocator. I would expect that extracting that 85 MB of data in compressed form (which would very likely be under the 10 MB mark) would take maybe a bit less than one second.
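
As an illustration of the palette idea, the reduction amounts to replacing each 4-byte ARGB int with a single byte index. A sketch, assuming every pixel color really does appear in the 256-entry palette (the names are mine):

```java
import java.util.HashMap;
import java.util.Map;

public class PaletteEncoder {
    // Maps each ARGB pixel to its index in a 256-entry palette: one byte per
    // pixel instead of four, before any zip compression is even applied.
    public static byte[] toIndexed(int[] argbPixels, int[] palette) {
        Map<Integer, Byte> index = new HashMap<>();
        for (int i = 0; i < palette.length; i++) {
            index.put(palette[i], (byte) i);
        }
        byte[] indexed = new byte[argbPixels.length];
        for (int i = 0; i < argbPixels.length; i++) {
            indexed[i] = index.get(argbPixels[i]); // assumes every color is in the palette
        }
        return indexed;
    }
}
```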

So here is what I've learned myself, and also for people curious how to improve their application's performance:
- if your application has to load bigger projects/documents: compact the data into binary. If you need to save plain-text data, save the plain text, but also make a binary copy of it in a temporary folder, together with a hash key file, so you can check that the original file was not modified. If it was not modified, load the cached binary file (see the sketch after this list).
- use Java over .NET if you want to do very big batch processing
- reduce memory allocation/reallocation. This alone can bring even a text-based format to just 2.5 times slower than the full binary format.
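
For the first point, here is a sketch of such a caching scheme; the file layout, the names, and the choice of SHA-256 are all mine, for illustration:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

public class BinaryCache {
    // Keeps a binary copy of a text document plus the hash of the original;
    // the copy is reused only while the hash of the text still matches.
    public static byte[] load(Path textFile, Path cacheFile, Path hashFile)
            throws IOException, NoSuchAlgorithmException {
        byte[] text = Files.readAllBytes(textFile);
        byte[] hash = MessageDigest.getInstance("SHA-256").digest(text);

        if (Files.exists(cacheFile) && Files.exists(hashFile)
                && Arrays.equals(hash, Files.readAllBytes(hashFile))) {
            return Files.readAllBytes(cacheFile); // original unchanged: use cached binary
        }

        byte[] binary = convertToBinary(text);    // the expensive text -> binary conversion
        Files.write(cacheFile, binary);
        Files.write(hashFile, hash);
        return binary;
    }

    private static byte[] convertToBinary(byte[] text) {
        return text; // placeholder for the real conversion
    }
}
```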
