Friday, December 11, 2015

Finding non-duplicates in an array (using Java)

I had a job interview and I will do a lot more Java stuff in day-to-day coding (yay!) and one part of the job interview was about searching non-duplicated values from an array. There is a technical solution (which I would not divulge) which is with O(n) complexity (meaning that the maximum it should scale up linearly with the size of the array) but can it run faster?

Faster than O(n) complexity wise is kinda impossible, because you have to traverse the once the array of data. But as we've seen in an older post, if you can get the constant down, you can get a faster algorithm. So, what about searching it naively the chunk of the data and put it into an int array?

This is not my best benchmark (and I didn't try all combinations) but up-to 100, an native int[] (int array) structure would run faster than using the O(n) complexity.

The code that would search an index (and it will return Integer.MIN_VALUE if no value is not duplicated) is the following:


    public static int GetFirstNonRepeatingIntArray(int[] values) {
        if (values == null) {
            throw new NullPointerException("values");
        }
        int[] counters = new int[values.length * 2];
        int counterSize = 0;
        for (int index = 0; index < values.length; index++) {
            int toFind = values[index];
            int foundIndexValue = getIndexOf(counters, counterSize, toFind);
            if (foundIndexValue == -1) {
                counters[counterSize * 2] = toFind;
                counters[counterSize * 2 + 1] = 1;
                counterSize++;
            } else {
                counters[foundIndexValue * 2 + 1]++;
            }
        }

        for (int index = 0; index < counterSize; index++) {
            if (counters[index * 2 + 1] == 1) {
                return counters[index * 2];
            }
        }
        return Integer.MIN_VALUE;
    }

    public static int getIndexOf(int[] counters, int counterSize, int toFind) {
        for (int foundIndex = 0; foundIndex < counterSize; foundIndex++) {
            if (counters[foundIndex * 2] == toFind) {
                return foundIndex;
            }
        }
        return -1;
    }

For example for 10,000 repetitions of the algorithm if arrays are of 50 items (which are randomly generated in a range of 1 to 25) it would give the following output:
Total execution time: 100ms
Total IntArray execution time: 31ms

Does it have any implication in your day to day life this kind of coding? I would say yes: when you work typically with "Settings" kind of classes, you will better work using arrays/Lists than dictionaries, even it is counter-intuitive: there is very unlikely you will get hundreds of settings, but both as debugging experience and performance may be better. Memory usage (especially compared with Java's HashMap implementation) is also much better.

No comments:

Post a Comment