Saturday, June 25, 2016

Performance in Java vs .Net - 2016 edition

The motivation if writing this blog entry started when I commented about my experience with Java and performance and as being published in a .Net forum (in LinkedIn), some people thought that I missed some small or not so small points. Please go to Nota Bene section to see this entry's presumptions.

I found that Java is very often faster than .Net, so how can be this possible, even I work for at least 6 years C#/.Net and I have less experience with Java (even I know it well enough)?

Some reasons why I think this is true:
- .Net has a much weaker compiler. For example, if you have two identical functions, the JVM will likely generate better code. I wrote one identical "optimized" version of calculating prime numbers, taking into account that there is no math which Java can auto-paralelize. The algorithm was to calculate for 100 times the 150.000 prime mumber. (I calculated 100 times to have a clear average)

My computer times are:
.Net: 12719 ms
Java: 11184 ms

 Code for reference (copy-paste it in your IDE and change minimally your code for C#: use Environment.TickCount for taking milliseconds):

This is very likely because that Java has a tiered compiler which have a better register allocator. This code is a lot of math and arrays accesses.

- most common runtime classes seem to be better tunned. At work I have a formatter which is used very often. Replacing "String.format" code with a "hard-coded" formatter, would speed up over 50x the Java Library code. I ported this code to C# (with virtually no modification):
.Net: 3703 ms
Java: 1491 ms

The formatter has less code necessarily, but the code has more conditions to optimize, some casts to be removed and so on. But also, even more, adding few common objects as String show that the GC in Java and working with String types is a bit faster.

- Java tends do do more aggressive inlining: by default, at least in what is public from .Net and Java, it looks like .Net considers as candidate for inlining functions up-to 32 CIL instructions (which have very close semantics with Java bytecode), when Java does have the limit of 55 bytecodes. (the second value I found it on a presentation of Java's JIT, and it was the default for Java 8 timeframe, not sure if any of these values can be changed). This of course it means that on a big enough project more opportunities for inlining are at one place

- Java has quicker by default .Net lambdas: this is true for .Net, not true for Mono (as far as the public presentation goes), but in Java all single method interfaces are compatibles with lambda implementations and if there is only one implementation in one context and it is a small method, it is a candidate for inlining.

- Java does have more optimizations which they run when the code is hot enough. The latest revision of .Net JIT does include some more interesting optimizations, like the option to use SIMD, but Java for now it can do it if code is SIMD-able, automatically. This optimization - of course - requires more time to do analysis, but if it is successful, can do wonders in performance. Similarly, small objects which are allocation in a small enough loop and they do "not escape", are not allocated on heap. Escape Analysis I think it is more viable for a large project with many small intermediate objects

- Java has by default a lot of customization of GC by default: you can choose heaps of gigabytes with no GC call. This can make wonders for some class of applications, and if you are aware how much is allocated per day, you can restart it out of the critical time your application making GC to be not involved.

I could talk many cases when I know that Java has some optimization which .Net doesn't have in particular (because of CHA for instance), but the point can be taken.

So, it looks to me, that the more complex, longer running code is concerned, I can get consistently at least 10% speedup in CPU related tasks, so why developers still consider that .Net is quicker than Java?


I have some options which could make sense:
- They don't compare the same things or on different abstraction levels:
If you compare Dapper (or Massive) SQL minimalist ORMs with full blown Entity Framework, you will likely see a huge loss in performance. Similarly, people do write ArrayList<Integer> (which is stored in Java as a list of object) and they compare with List<int> in .Net (which internally keeps raw intergers in a contiguous array). I wrote in fact a minimalist library which reifies some classes named FlatCollections in Java. I don't recommend using them if you don't care this much about performance, but if you do, you may give it a try
- Java starts slower, so it feels slow. This happens because Java runs initially everything in an interpreter, then compiles the hot code. This is also an important thing to take into account. If you compare full blown applications like Java FX one with a WPF one, the differences feel huge. But the startup lag doesn't make an affirmation about performance, otherwise we would write every program today in MS-DOS not Windows/Linux/MacOS that boots in seconds just with an SSD. I made Fx2C OSS project which reduces JavaFX startup lag, if you are into optimizing the startup time.
- feeling that when developers compare platforms, compare different abstraction levels mistakenly over different platforms. This is a really different point than first. Instead of comparing the most lean, close-to-metal "abstraction", some code would use Java's streams using IntStream (this would not create any dangling types) against Linq with Tuple (the Tuple<> types were defined as Class type, generating a lot of heap pressure and GC). This can be also reversed with List<int> (in .Net) vs ArrayList<Integer> (in Java).

Give feedback and I will be glad to answer to all criticism and corrections.

Nota Bene. Some points about myself:
- I am not paid and I wasn't paid by Microsoft, Oracle and so on. In fact, as a full disclosure, I participated to a Microsoft opened hackathon and I won a small prize (a bluetooth speaker) and if I recall right, I was passing by a Microsoft conference and they gave a "stress ball". I have no animosity against Microsoft per-se, excluding (maybe) that I like free software and opensource. I think that as of today Microsoft works very friendly with OSS community, so nothing to claim here
- I also have no interest in Oracle or any Java vendor (including Google) and as it is concerned, I never receive even a plastic ball-pen or anyhing of this sort
- I have opinions and biases but I try to be honest and direct about them
- I know that no comparison can be made without excluding many other components related with that technology. One of the most important as I see is: licensing. If you have a successful company and you want to scale your software, at least Java tools have higher individual license costs, but virtually zero horizontal costs, when in comparison, Microsoft seem to be a smaller cost per developer but with higher costs if you scale up your software. This is a subject which seems to change (like .Net Core) but as far as I understand, is not a finished software
- technologically, I think that C# is better designed as language, similarly it is the CIL bytecode
- I have around 10 years of working in software industry, covering C++, .Net/C# and kind of little Java (as of my current job).

Code for first example:

public class Program {
    boolean isPrime(int value, int[]divisor, int szDivisor){
        for(int i =0;i<szDivisor; i++){
            int div = divisor[i];
            if(div*div>value) {
                return true;
            }
            if(value%div == 0)
                return false;
        }
        return true;
    }

    int nthPrime(int nth){
        if(nth==1){
            return 2;
        }
        int[] foundPrimes = new int[nth];
        int primeCount = 1;
        foundPrimes[0] = 2;
        int prime = 3;
        while (true){
            if(isPrime(prime, foundPrimes, primeCount)){
                foundPrimes[primeCount] = prime;
                primeCount++;
            }

            if(primeCount==nth)
                break;

            prime+=2;

        }
        return foundPrimes[primeCount-1];
    }


    public static void main(String[] args) {
        Program p = new Program();
        p.nthPrime(500);
        long start = System.currentTimeMillis();
        for(int i= 0;i<100;i++)
        System.out.println("The prime number is: "+p.nthPrime(150000));
        long end = System.currentTimeMillis()-start;
        System.out.println("Time: "+end + " ms");
    }
}

Code for 2nd example:
public class TimeFormatter {
    private char[] digits = new char[7];
    private int _cursor;
    public String formattedTime(int currentPackgeTimeStamp) {
        int secondsPassed = currentPackgeTimeStamp / 1000;

        int minutesPassed = secondsPassed / 60;
        int seconds = secondsPassed % 60;
        int decimals = (currentPackgeTimeStamp % 1000) / 100;
        reset();
        push(minutesPassed / 10);
        push(minutesPassed % 10);
        pushChar(':');
        push(seconds / 10);
        push(seconds % 10);
        pushChar('.');
        push(decimals);

        return new String(digits);
    }

    private void reset() {
        _cursor = 0;
    }

    private void push(int i) {
        pushChar((char) ('0' + i));
    }

    private void pushChar(char c) {
        digits[_cursor] = c;
        _cursor++;
    }
 
  public static void main(String[] args) {
    long start = System.currentTimeMillis();
    int iterations = 100_000_000;
    TimeFormatter timeFormatter = new TimeFormatter();

    int sum = 0;
    for (int i = 0; i < iterations; i++) {
        String t = timeFormatter.formattedTime(125400);
        sum += t.charAt(0);
    }

    long end = System.currentTimeMillis();

    long duration = end - start;
    System.out.println("Duration: " + duration + ", sum: " + sum);
}
}

No comments:

Post a Comment