Collections, Sets, Lists, and the Deep Existential Question of Equality March 28, 2006 | 07:05 am

The Java API is badly in need of editting.

I just bumped into Collection.equals(Object) method, which contains this gem:

Thus, a custom equals method for a collection class that implements neither the List nor Set interface must return false when this collection is compared to any list or set. (By the same logic, it is not possible to write a class that correctly implements both the Set and List interfaces.)

So a list that guaranties uniqueness would have to be a Set or a List, but absolutely cannot be both. At least, not without violating the API.

The intuitive API to me is to have the Collection interface require that the Collection.equals(Object) method to return true iff the argument is a collection whose Iterator returns the same elements in the same order as this.iterator(). This would mean that collections with different add/remove behavior could be equals to eachother, but that’s already true — a synchronized ArrayList is equals to a unsynchronized ArrayList is equals to a typed List is equals to a LinkedList by definition, as long as they return the same elements in the same order.

The real fun is that the note in the Collections class is actually wrong. If you look at Set.equals(Object), it requires that the parameter is equals to this iff they are both sets, and if the parameter has the same size as this and if all the elements of this are contained in the parameter. If you look at List.equals(Object), this is equals to the parameter iff they are both lists, and if they return the same elements in the same order. I see nothing in these definitions that says you can’t satisfy both of them at the same time.

  • bhurt-aw

    I have come to the conclusion that equality testing can not be a member function, i.e. it can not be an operation that the class itself provides. That Java’s mistake was to try to provide Object.equals() as a method.

    Here’s the problem with saying that two collections of objects are equal if they hold the same elements. What happens if one collection holds duplicates of the same element? This is possible with lists but not with sets. So we have a set that holds one (and only one) copy of element x, and a list that holds two copies of element x- other than that, they are identical to each other. Are these two collections the same? What happens if I make the set a list that just happens to contain only one x, and not two?

    And then there’s the performance problem. Testing for membership on an array or a list is O(N), which means testing for set equality based on membership can be O(N2). Your iterator definition at least has the advantage of being O(N).

    And then there is the implied behavior. If I have a set and a list which are equal (currently), and add element x to both, can I rely on them to still be equal? The answer, in this case, is no- x could already be a member of the set (in which case adding a second copy wouldn’t change the set, but would change the list), or the list could add x in such a way as to break the ordering requirements- x could show up in a strange location compared to the set. I could argue that this means that a set and a list can never be equal, because equal operations on both may result in them becoming not equal.

    When there are multiple different, legitimate, tactics the user of the class might want, I can flat out gaurentee that no matter which one you pick, you’ll pick the wrong one. If the best choice is right only 40% of the time, and all the other choices are right less than that, that still means that picking the best choice is wrong most of the time.

    At this point any choice is the wrong one- and the only choice is not to choose. There is no one comparison operation which is right often enough that it can be called the comparison operation. Which is why the class- which can only supply one comparison operator- should not supply any comparison operator. Make the user define what they mean by equal. That way it’s right 100% of the time, and not wrong most of the time.

    In other words, don’t use equals() or compareTo(), use a Comparator.

  • whoever

    The equal iff both iterators return .equal() values is wrong because some collections don’t guarantee any iteration ordering. In fact, some tree-based collections with path shortening may even return different orderings without any external modification, so such collections wouldn’t even .equal() themselves!

    Another thing to notice about Set.equals() is that the API says implementing classes should return true if compared against any other class that also implements Set. Same for Lists. The Set and List interfaces impose additional implementation constraints, which, if contradictory, would therefore make the comment in Collection.equals() sensible.

    However what I don’t understand is that if you ignore duplicate elements List.equals() seems to be a logical subset of Set.equals(). We can safely ignore these elements because any hypothetical class that implements Set cannot contain duplicate elements. This brings us back to the original question. What implementation details make these definitions mutually excusive?

  • whoever

    oh duh. I got it now. because Set.equals() requires you to return true when compared against any Set with the right elements, and List.equals() requires you to return true when compared against any List with the right elements, either no class can ever exist that implements only one, or no class can exist that implements both.

  • whoever
    public abstract class ListSet implements List, Set {
            @Override
            public boolean equals(Object o) {
                if (o == this)
                    return true;
                if (o instanceof Collection) {
                    if (this.size() != ((Collection)o).size())
                        return false;
                    if (o instanceof List) {
                        List l = (List)o;
                        for (int i = 0; i < size(); i++)
                            if ((l.get(i) == null ^ get(i) == null) || !l.get(i).equals(get(i)))
                                    return false;
                        return true;
                    } else if (o instanceof Set) {
                        Set s = (Set)o;
                        return containsAll(s);
                    }
                }
                return false;
            }
        }