Commons-Lang and the equals()/compareTo() Debacle January 30, 2006 | 10:22 am

Like most traumatic realizations, I’ve been having trouble getting the fundamental equals/compareTo brokeness of Java out of my head.

I decided to take a look at the Commons-Lang library to see how they deal with it. They’ve got a class called EqualsBuilder which is supposed to take care of this stuff for you. So I was wondering how it did that.

It gives you two options:

  1. Reflection
  2. Manual

The manual process has all the problems that I’ve talked about here and here, so we’re going to punt on that.

For the manual process, if that was the way it was decided to go, my deisgn would be a bit different. First and foremost, the isEquals private member variable should really be a constant.

I would go a step further and make there be only two instances of an EqualsBuilder: the one where it’s not false yet, and the one where it’s been determined to be false. These two could be implemented as private inner classes implementing the common abstract base class, so the one where it’s been determined to be false literally does no work and the one where it’s not false yet contains all the tricky thinking code. That’d save some branching and a lot of object burn. Of course, that’s arranging the deck chairs on the Titanic, so let’s not spend any more time there.

The reflection option has much more promise. It (in its most precise form) accepts the following information:

  1. Two objects to compare
  2. Whether or not to test transients
  3. Class to test up to

I’m not entirely sure about the value of the class to test up to, because the scenarios where that’s anything other than “Object” seem to be making some really strong assumptions about the internal behavior of the code. Similarly, I’d be prone to always have transients tests turned off, but I’m pretty morally opposed to transients as a general statement.

Once the two objects are passed in, it calls one the “left” object and one the “right” object. Then here are the steps it takes:

       
  1. If the two sides are the same object, return true.
  2.    

  3. If either side is null, return false. (Personally, I’d consider null to be equal to null.)
  4.    

  5. Determine the class to use to test as follows.
          
               
    1. If the left object is an instance of the right object’s class, but the right object is not an instance of the left object’s class, then use the right object’s class. This is the scenario where the right hand object is an instance of a child class of the left hand object’s class.
    2.          

    3. If the right object is an instance of the left object’s class, but the left object is not an instance of the right object’s class, then use the left object’s class. This is the scenario where the left hand object is an instance of a child class of the right hand object’s class.
    4.          

    5. If neither object’s class is an instance of the other, then return false. This is the case where two classes are different — although they may share a common ancestor.
    6.          

    7. Else, arbitrarily use the left object’s class, since they are equal.
    8.       

       

  6. Now, for the class to test and all of its super classes, determine all the fields of that class through reflection and test them.

This is better, but there are still some concerns.

First, there are a lot of technical difficulties inherent in reflection. Private classes are notoriously picky, and the handling of them in the various reflection-based methods are pretty inconsistant. To fix this, the Commons-Lang code sets the accessability of the fields to true, which is a start, but the maintenance on all of this is kind of ugly. There is also significant overhead to reflection, which you’re now hitting on every time you try to walk a list, insert into a tree, etc., etc. — some caching might be able to help this, as long as the number of different classes you’re encountering within a window stays reasonably small.

There’s another couple of issues which I would probably go so far as to call “bugs”.

First, variables belonging to inner classes aren’t checked. Inner classes which implement interfaces and contain data are exceedingly common — the most common examples are the Map.Entry and Iterator interfaces. I also have many times when I will have an inner class which provides the “doIt”-ness of the class, while the class itself does clean-up and maintenance on that “doIt”-ness. Similarly, anonymous inner classes that inline an object reference or value (see “closures”) could generate false positives, because the “value equality” and “functional equality” of those classes are drastically different.

Second, two classes that are sibling classes but value-equivalent cannot have their equality properly determined through this method — in fact, they’ll always be false. If two classes both implement a base abstract class, and are to be considered equal in some cases, then we’re in trouble. Think about the Number class and its various descendents — in this case, a Float with a value of 1 and an Integer with the value of 1 would be identified as not being equal. This case is so weird and identifying when it’s supposed to be equal and when it’s not is so hard that I would generally punt back to the user to have to address this case. I’m not sure how the user isn’t going to handle it, but I don’t fault the Commons-Lang developers for leaving this bug in play.

That said, it’s easier to use the reflection mode in the EqualsBuilder class than to try to actually generate a sensical work-around for the equals() problem. It’s not a perfect solution, but it’s a reasonable hack that sacrifices performance for correctness…at least, correctness in more cases.

Tags: ,

  • TheHawk

    I think the fundamental problem with the whole equals/compareTo mess is that it trys to handle everything. You end up having to handle a number of relatively nonsensical situations which in most cases will never come up. How many times are you actually going to hit the situation where you’re asking “Is this Orange equal to the number 5?” CompareTo is even worse for nonsensical arrangements. In reality, any object really only has a certain set of other objects where comparison makes sense. Unfortunately, there’s no good way to implement this, especially because that set can change without the objects knowledge.

  • TheHawk

    I think my “standard” default solution is to have the equals try to cast the other object into its own class and compare then. If it can’t, merely return false. Actually, I’d extend the “cast” portion of that past merely the casting itself and also use reflection on the other object to try and see if the other object has a method to turns itself into an object of the other type. I think this should work for “most” situations, although you can still run into situations where you lose the reflexiveness. The problem is your dealing with multiple frames of comparison, and two objects that are equla in one, may not be equal in the other. I.E. 2 is equal to 2.343 as Integers, but not as Floats. Admittedly, in these situations to “cast” up or down you are either losing information or making stuff up which is whats skewing the equality.

  • http://enfranchisedmind.com Candide

    Actually, it’s not that ridiculous at all. Your question should return false naturally — any Orange and the number 5 are different. Beyond which, Java requires you to handle such a situation sensically because . But consider different implementations of the Number class, or different implementations of the Map.Entry interface — it’s natural that I might want to collect a Set or Map of them. If I do that, however, I need their equals(Object) method to work, because there’s no way to provide a Comparator to most Set implementations — and even the ones that do could drop to using equals(Object) without warning.

    The fundamental problem is that it makes no sense for “equals” to be defined as an instance method, because implicit to the question of “equals” is “equals in terms of what”?

    Again, consider the 2D and 3D point example, which is literally a textbook example of polymorphism and reuse — the 2D point, however, will hapilly return “true” if it is asked if it is equal to the 3D point. The 3D point, being aware of more state, will return “false” if it asked if it is equal to the 2D point. This example trivially shows that one of the fundamental guarantees requried of the equals(Object) method — that a.equals(b) == b.equals(a) is true — cannot be defined in a simple way when polymorphism is in effect. You need to use the EqualsBuilder.reflectionEquals(this, that) style of coding, but then you’re carrying forward those issues we saw before.

    And you still don’t have a good case where two classes both implement an abstract base class or interface. For instance, it would be reasonable to see code where you have maps of maps (any Perl-turned-Java programmer has written thos code). If you accept abritrary maps, however, it’s very possible that the a.equals(b) == b.equals(a) requirement could b violated given different implementations of a and b — for one thing, any Map relying on EqualsBuilder would fail when presented with a non-related implementation of Map.

    All of which throws the problem back into the user’s lap.

    The fundamental problem isn’t that nonsensical things are being compared — it’s that whether two things are “sensical” for being compared is an abstract phrase. The problem is that Java isn’t functional enough to realize that it needs to be able to define “Equals in terms of…” for basically any data structure.

    s/equals/compare to/g for another fun argument.

  • http://enfranchisedmind.com Candide

    Didn’t see your second post.

    Your original solution — which has been (and will probably continue to be) my own default solution — was what I was working with when I first encountered this problem. It’s the standard solution I’ve seen kicking around in Dietel/Deitel/whatever and other places. I just looked into my “Java Cookbook”, and it’s the solution they model right there (p226).

    The problem I encountered was simplified to the 2D/3D case. That case illustrates the asymmetric return values that pop up even in a trivial case of polymorphism, resulting in arbitrary behavior by the Set class.

    So far, my only conclusion has been that one should never try to use equals(Object) if the objects aren’t of precisely the same class. This means that you can’t have Map keys or Set values made up of different classes, and being the same “abstract class” or implementation the same interface aren’t good enough.

  • bhurt-aw

    The worst-case problem is when you have a base class, and two different derived classes, and you want to compare the two different derived classes. Even ignoring problems of the two different derived classes being in different packages, and wanting to compare private or package visible members, we still have a hard problem.

    I think that, at the end of the day, comparison can not be a member function. Certainly in closed languages like Java, C++, and Ocaml, where the entire implementation of the class has to be present when initially declared, the information needed to write a comparison class simply isn’t available when you’re writting any given class. Note that both derived classes need to know about the other derived class, because you can’t tell if equals will be called as derived1.compareTo(derived2), or derived2.compareTo(derived1).

    But even in open languages like Ruby and Smalltalk, which allow you to add member functions to a class later, I still don’t think compare can be a member function, because you need too much flexibility. Say we have a base class and three derived classes, and want to have three different data structures- one holding derived classes one and two, one holding derived classes one and three, and one holding derived classes two and three. Now you need three different comparison routines, each copied into two different classes!

  • http://enfranchisedmind.com/blog Candide

    I actually thought of a solution for this in Ruby. It’s horrible and ugly, but it works.

    Whenever you load a class that could be equal to other sibling classes, re-define the comparison function by wrapping the old comparison function in a new function definition that handles the sibling cases, and calls the old function for all other cases.

    I told you it was horrible and ugly.

  • Pingback: The Cheap Sitcom Clip Scene Blog Post | Enfranchised Mind