Blog-Archiv

Sonntag, 4. Dezember 2016

Java TreeMap Comparator Gotcha

You use the Java Comparator interface when you need a certain sort-order in some container. But Comparator does not only declare compare(o1, o2), it implicitly also declares equality of o1 and o2, which is the case when the comparator returns zero (0).

This will not affect you as long as you don't reuse your comparators with different kinds of containers like ArrayList or TreeMap. But when, you may experience the TreeMap gotcha:

Example Source


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
import java.util.Comparator;
import java.util.TreeMap;

public class TreeMapComparatorTest
{
    public static void main(String[] args) throws Exception {
        final String[] toSort = {
                "start",
                "end",
                "employee.name",
                "job.name",
        };

        final Comparator<String> comparator = new Comparator<String>() {
            @Override
            public int compare(String s1, String s2) {
                return removeAllPrefixes(s1).compareTo(removeAllPrefixes(s2));
            }

            private String removeAllPrefixes(String s) {
                final int i = s.lastIndexOf(".");
                return (i >= 0) ? s.substring(i + 1) : s;
            }
        };

        final TreeMap treeMap = new TreeMap(comparator);

        for (String s : toSort)
            treeMap.put(s, "dummy-value");

        if (treeMap.size() != toSort.length)
            throw new IllegalStateException(
                    "Lost some values (" + toSort.length +
                    ") when using TreeMap (" + treeMap.size() + ") !");
    }
}

What I do here is sort some strings which represent property-names. But I want the "xxx.name" properties to stay together, which would not be the case when using an alphabetical sort order, because then "end" would go between "employee.name" and "job.name". Following is the order I want to achieve (whereby the order of "xxx.name" properties does not matter):

  • "end"
  • "employee.name"
  • "job.name"
  • "start"

So I implemented a Comparator that generates such a sort order. It cuts away anything before the last dot, and only then compares the two elements. And it actually works when used with an ArrayList!

But not with TreeMap. When you try out the source code above, you'll see that it throws the IllegalStateException:

Exception in thread "main" java.lang.IllegalStateException: Lost some values (4) when using TreeMap (3) !
 at TreeMapComparatorTest.main(TreeMapComparatorTest.java:32)

When you output the keys of the map, you'll see that one of the "name" properties is missing:

  • "end"
  • "employee.name"
  • "start"

What's happening here?

Responsibilities

I had in mind that comparators are responsible for sorting. That's true, but not the whole truth.

Let's look at the JavaDoc of the compare(T o1, T o2) method of interface Comparator<T>:

Compares its two arguments for order. Returns a negative integer, zero, or a positive integer as the first argument is less than, equal to, or greater than the second.

In the foregoing description, the notation sgn(expression) designates the mathematical signum function, which is defined to return one of -1, 0, or 1 according to whether the value of expression is negative, zero or positive.

The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y. (This implies that compare(x, y) must throw an exception if and only if compare(y, x) throws an exception.)

The implementor must also ensure that the relation is transitive: ((compare(x, y) > 0) && (compare(y, z) > 0)) implies compare(x, z) > 0.

Finally, the implementor must ensure that compare(x, y) == 0 implies that sgn(compare(x, z)) == sgn(compare(y, z)) for all z.

It is generally the case, but not strictly required that (compare(x, y) == 0) == (x.equals(y)). Generally speaking, any comparator that violates this condition should clearly indicate this fact. The recommended language is "Note: this comparator imposes orderings that are inconsistent with equals."

That's how responsibilities are defined by scientists. Maybe you did as me and stopped reading that complicated text at about 50%. Then you missed the important part that partially explains what happens in the TreeMap gotcha.

It is generally the case, but not strictly required that (compare(x, y) == 0) == (x.equals(y)).

Consequence

What does that mean? You'll find clarity only when you debug TreeMap.put(). Here is an excerpt of its source code:

                int cmp = cpr.compare(key, t.key);
                if (cmp < 0)
                    t = t.left;
                else if (cmp > 0)
                    t = t.right;
                else
                    return t.setValue(value);

You see that this implementation uses the comparator (variable cpr) to detect equal keys. A Map does not allow duplicate keys. Thus it replaces the old value of the key by the new one. And - as a side effect - you lost one element!

As recommended above, you should have added some JavaDoc to your comparator:

"Note: this comparator imposes orderings that are inconsistent with equals."

But would this really have helped :-?

Responsibility Meets Context

You can find a discussion of this gotcha also on stackoverflow. As I said, it will happen when you use Comparator implementations in different contexts:

  • in the context of an ArrayList that allows duplicates, the comparator will work,
  • but in the a context of TreeMap that allows no duplicates, it might not work.

This "might" is the stuff that unreproducible bugs are made of.

Solution

Following is a workaround:

        final Comparator<String> comparator = new Comparator<String>() {
            @Override
            public int compare(String s1, String s2) {
                final int result = removeAllPrefixes(s1).compareTo(removeAllPrefixes(s2));
                return (result == 0 && s1.equals(s2) == false) ? 1 : result;
            }

            .....
        };

This implementation of compare() avoids to return zero when the given arguments are not equal. Thus "job.name" would not be considered to be equal to "employee.name", and not be removed as duplicate. Nevertheless the sort order is sufficient now:

  • "end"
  • "employee.name"
  • "job.name"
  • "start"



Keine Kommentare: