Posts Tagged ‘object oriented programming’
Java Posse, .Equals(), Inheritance, and the Liskov Substitution Principle October 29, 2007 | 06:38 am

Continuing the conversation among Raganwald, me, and a lot of the rest of the blogosphere:

The Java Posse Responds to My #Equals Issue
(If you don’t know the Java Posse, check out their website/blog/podcast. It’s good stuff.)

The answer they gave was an interesting one, which I’m still sitting on and turning over in my head. Here is Martin OConnor’s response, for those too lazy to read the (very interesting) thread:

Your class hierarchy is flawed. ThreeDeePoint does not naturally extend TwoDeePoint, because a Three Dimensional point is not a kind of Two Dimensional point. They represent two very distinct concepts. Essentially, when you extend TwoDeePoint with ThreeDeePoint, you are saying that ThreeDeePoint is a special kind of TwoDeePoint. I would submit, therefore, that the problems you are observing are less due to the implementation of Object.equals(), and more a result of your chosen class hierarchy design.
[...]
What I was trying to explain (although poorly), or more precisely hinting at, is the Liskov Substitution Principle. This is an Object Oriented design principle that states that any reference to a base class should be replaceable with a reference to a derived class without affecting functionality. What my previous post had attempted to say is that since the TwoDeePoint and ThreeDeePoint classes do not adhere to the Liskov Substitution Principle, the design, in this particular case, is flawed.

(For those that don’t remember my “TwoDeePoint” and “ThreeDeePoint” example, check my post, “A Java Gotcha”, where I first get into the #equals/#compareTo issue.)

So, is it invalid for a 3DPoint to inherit from a 2DPoint? The general way this is phrased is “is it legitimate to say 3DPoint is-a 2DPoint”? Well, that depends on what your definition of “is-a” is.

After all, the semantics behind a 3DPoint imply it is a 2DPoint, as it has all the same qualities as a 2DPoint: it simply adds the idea of a third dimension to the existing definition. Semantically, a three-dimensional point is a two-dimensional point residing at a particular position along the third axis, and it has all the capabilities of a two-dimensional point on that particular plane.

On the other hand, the Liskov Substitution Principle gives a much stricter definition of “is-a”. I’d like to look at a colloquial definition of the principle, and then a formal definition, because there is a huge gulf between these things.

Colloquially, it basically says that Foo should not inherit from Bar unless Foo is-a-specialization-of Bar: that is, Foo is a limitation upon Bar. So the inheritance in that example is invalid: 3DPoint is not taking 2DPoint and limiting it, but it’s taking 2DPoint and extending it.

Adopting this approach to inheritance will substantially limit its use, and is going to lead to some repetitive code — in my example, all the 2DPoint methods would have to be re-implemented as 3DPoint methods. But it does side-step the #equals/#compareTo problem, in that a specialization will have to rely on the same #equals/#compareTo definition as its parent class.
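For concreteness, here is a minimal sketch of what the non-inheriting version of my earlier example would look like (hashCode is elided here, just as it was in the original posts):

    // ThreeDeePoint no longer extends TwoDeePoint, so the x/y handling has to
    // be repeated (or delegated to a contained TwoDeePoint).
    public class ThreeDeePoint implements Comparable {
        public int x = 0;
        public int y = 0;
        public int z = 0;

        public boolean equals(final Object obj) {
            if(!(obj instanceof ThreeDeePoint)) {
                return false;
            }
            final ThreeDeePoint them = (ThreeDeePoint)obj;
            // duplicated from TwoDeePoint#equals, plus the z comparison
            return them.x == this.x && them.y == this.y && them.z == this.z;
        }

        public int compareTo(final Object o) {
            // the x/y comparisons have to be duplicated here as well
            final ThreeDeePoint them = (ThreeDeePoint)o;
            final int xCmp = new Integer(this.x).compareTo(new Integer(them.x));
            final int yCmp = new Integer(this.y).compareTo(new Integer(them.y));
            final int zCmp = new Integer(this.z).compareTo(new Integer(them.z));
            return xCmp != 0 ? xCmp : (yCmp != 0 ? yCmp : zCmp);
        }
    }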

And that’s where the rub is. Using the citation by way of Wikipedia, let’s look at the formal definition of this principle:

Let q(x) be a property provable about objects x of type T. Then q(y) should be true for objects y of type S where S is a subtype of T.

My rub is with the phrase “a property provable about”. Consider a matrix type (M) and a diagonally symmetric matrix type (S). Specifically, there exists M#setXY, which sets the item at [X,Y] to a value. In S#setXY, we override it so that it also sets the item at [Y,X] to the same value that [X,Y] was set to (thereby maintaining symmetry). Now, S is a colloquial specialization of M, but it could be a violation of the substitution principle, since it changes the postcondition of M#setXY: does the fact that the rest of the matrix remains unchanged by M#setXY constitute “a provable property”?
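To pin the example down in code, here is a minimal sketch of the two types (the class names and the array representation are mine, purely for illustration):

    class Matrix {
        protected final double[][] cells;

        Matrix(final int size) {
            this.cells = new double[size][size];
        }

        // Declared postcondition: cells[x][y] == value afterwards. Whether
        // "and nothing else changed" is also part of the contract is the question.
        public void setXY(final int x, final int y, final double value) {
            cells[x][y] = value;
        }
    }

    class SymmetricMatrix extends Matrix {
        SymmetricMatrix(final int size) {
            super(size);
        }

        // Also writes cells[y][x], which the parent's postcondition never mentioned.
        public void setXY(final int x, final int y, final double value) {
            super.setXY(x, y, value);
            super.setXY(y, x, value);
        }
    }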

Assuming that “a provable property” is basically analogous to the precondition/postcondition properties, I suppose there is no reason that M#setXY has to specify that the rest of the matrix remains unchanged. What we would basically be arguing for in this case is a very liberal interpretation of the role of preconditions/postconditions: the postconditions of M#setXY need not specify everything that M#setXY does, and the preconditions of M#setXY might not all need to hold.

If this is the case, then calling a method becomes a very, very scary thing. It is now legitimate for me to implement M#setXY to set every item in the matrix to NaN, delete your hard drive, send out 3000 spam mails a second, and make some coffee. As long as M#setXY ends by setting [X,Y] to the given value, it is an acceptable implementation of inheritance.

So, if we deny this liberal interpretation, then S is not a legitimate case for inheritance from M, because S#setXY does something which M#setXY doesn’t specify. The specialization we’re talking about by way of the substitution principle is therefore extremely limiting: basically, all you can do is throw subclasses of already-declared exceptions, return subclasses of already-declared return values, and add methods which do not intersect with or interfere with the provable properties of any other method, including #equals/#compareTo.
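In code, the sort of override that survives this strict reading looks something like this (the classes here are hypothetical, purely for illustration):

    import java.io.FileNotFoundException;
    import java.io.IOException;

    class Source {
        Number read() throws IOException {
            return Integer.valueOf(0);
        }
    }

    class IntSource extends Source {
        // A narrower declared exception and a covariant (subclass) return type:
        // code written against Source keeps working.
        Integer read() throws FileNotFoundException {
            return Integer.valueOf(0);
        }

        // Adding a new, non-interfering method is also fine.
        int readAsInt() throws FileNotFoundException {
            return read().intValue();
        }
    }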

This limits inheritance to very few cases. Certainly, implementations of abstract classes are allowed, assuming that the abstract class leaves enough breathing room (e.g. vague enough declared exceptions) for an underlying implementation. This can be seen, for instance, in Java’s AbstractCollection. Anything which tacks on a piece of non-identifying information or non-interfering functionality works just fine, too, although I’m having trouble coming up with a good example of either of those.
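To make the AbstractCollection case concrete, a minimal implementation only has to supply iterator() and size(); everything else is defined in terms of those two, so there is very little room to disturb a provable property (the class name here is hypothetical):

    import java.util.AbstractCollection;
    import java.util.Arrays;
    import java.util.Iterator;

    class FixedIntegers extends AbstractCollection<Integer> {
        private final Integer[] values;

        FixedIntegers(final Integer... values) {
            this.values = values.clone();
        }

        public Iterator<Integer> iterator() {
            return Arrays.asList(values).iterator();
        }

        public int size() {
            return values.length;
        }
    }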

Some people older and wiser than me once asserted that no inheritance should be done on anything other than classes explicitly defined for inheritance (i.e. abstract classes). At the time, I thought that was extreme, and so I discounted the position. I’m increasingly starting to understand it, though, and I think I may be signing on.

Commons-Lang and the equals()/compareTo() Debacle January 30, 2006 | 10:22 am

Like most traumatic realizations, the fundamental equals/compareTo brokenness of Java has been hard to get out of my head.

I decided to take a look at the Commons-Lang library to see how they deal with it. They’ve got a class called EqualsBuilder which is supposed to take care of this stuff for you. So I was wondering how it did that.

It gives you two options:

  1. Reflection
  2. Manual

The manual process has all the problems that I’ve talked about here and here, so we’re going to punt on that.
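For reference, the manual mode looks roughly like this inside a class like TwoDeePoint (assuming Commons-Lang 2.x is on the classpath); you still enumerate every field yourself, so it inherits all of the instanceof problems discussed in those posts:

    import org.apache.commons.lang.builder.EqualsBuilder;

    public boolean equals(final Object obj) {
        if(!(obj instanceof TwoDeePoint)) {
            return false;
        }
        final TwoDeePoint them = (TwoDeePoint)obj;
        return new EqualsBuilder()
            .append(this.x, them.x)
            .append(this.y, them.y)
            .isEquals();
    }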

For the manual process, if that were the way you decided to go, my design would be a bit different. First and foremost, the isEquals private member variable should really be a constant.

I would go a step further and have only two instances of an EqualsBuilder: the one where it’s not false yet, and the one where it’s been determined to be false. These two could be implemented as private inner classes extending a common abstract base class, so the one where it’s been determined to be false literally does no work and the one where it’s not false yet contains all the tricky thinking code. That’d save some branching and a lot of object burn. Of course, that’s rearranging the deck chairs on the Titanic, so let’s not spend any more time there.
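Still, for what it’s worth, here is a rough sketch of the two-instance idea (the names here are my own invention, not Commons-Lang’s):

    // Two immutable instances: one that is "not false yet" and does the work,
    // and one that is "already false" and does nothing at all.
    abstract class ConstantEqualsBuilder {
        abstract ConstantEqualsBuilder append(Object lhs, Object rhs);
        abstract boolean isEquals();

        static final ConstantEqualsBuilder NOT_FALSE_YET = new ConstantEqualsBuilder() {
            ConstantEqualsBuilder append(final Object lhs, final Object rhs) {
                if(lhs == rhs) {
                    return this;
                }
                return (lhs != null && lhs.equals(rhs)) ? this : FALSE;
            }
            boolean isEquals() {
                return true;
            }
        };

        static final ConstantEqualsBuilder FALSE = new ConstantEqualsBuilder() {
            ConstantEqualsBuilder append(final Object lhs, final Object rhs) {
                return this; // already false: no comparison work at all
            }
            boolean isEquals() {
                return false;
            }
        };
    }

Usage would chain the same way as the real builder (NOT_FALSE_YET.append(this.x, them.x).append(this.y, them.y).isEquals()), with no builder object allocated per comparison.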

The reflection option has much more promise. It (in its most precise form) accepts the following information:

  1. Two objects to compare
  2. Whether or not to test transients
  3. Class to test up to

I’m not entirely sure about the value of the class to test up to, because the scenarios where that’s anything other than “Object” seem to be making some really strong assumptions about the internal behavior of the code. Similarly, I’d be prone to always leave the transient test turned off, but then I’m pretty morally opposed to transients as a general statement.
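Using the reflective mode from inside a class is about a one-liner (again assuming Commons-Lang 2.x; the two-argument form skips transient fields and walks the whole class hierarchy):

    import org.apache.commons.lang.builder.EqualsBuilder;

    public boolean equals(final Object obj) {
        // compares every non-static, non-transient field, including inherited ones
        return EqualsBuilder.reflectionEquals(this, obj);
    }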

Once the two objects are passed in, it calls one the “left” object and one the “right” object. Then here are the steps it takes:

       
  1. If the two sides are the same object, return true.
  2. If either side is null, return false. (Personally, I’d consider null to be equal to null.)
  3. Determine the class to use for the test, as follows (see the sketch in code after this list):
     1. If the right object is an instance of the left object’s class, but the left object is not an instance of the right object’s class, then use the right object’s class. This is the scenario where the right-hand object is an instance of a child class of the left-hand object’s class.
     2. If the left object is an instance of the right object’s class, but the right object is not an instance of the left object’s class, then use the left object’s class. This is the scenario where the left-hand object is an instance of a child class of the right-hand object’s class.
     3. If neither object is an instance of the other’s class, then return false. This is the case where the two classes are different, although they may share a common ancestor.
     4. Otherwise, arbitrarily use the left object’s class, since the two classes are the same.
  4. Now, for the class to test and all of its superclasses, determine all the fields of that class through reflection and test them.
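Paraphrasing the class-selection step in code, since it is the subtle part (this is my restatement of the logic above, not the actual Commons-Lang source):

    // Returns the class whose fields should be compared, or null if the two
    // objects' classes are unrelated and the comparison should just be false.
    static Class<?> classToTest(final Object left, final Object right) {
        final Class<?> leftClass = left.getClass();
        final Class<?> rightClass = right.getClass();
        final boolean rightIsALeft = leftClass.isInstance(right);
        final boolean leftIsARight = rightClass.isInstance(left);
        if(rightIsALeft && !leftIsARight) {
            return rightClass;  // right is an instance of a child class of leftClass
        } else if(leftIsARight && !rightIsALeft) {
            return leftClass;   // left is an instance of a child class of rightClass
        } else if(!rightIsALeft && !leftIsARight) {
            return null;        // unrelated classes (they may still share an ancestor)
        } else {
            return leftClass;   // same class: either one will do
        }
    }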

This is better, but there are still some concerns.

First, there are a lot of technical difficulties inherent in reflection. Private classes are notoriously picky, and the handling of them in the various reflection-based methods is pretty inconsistent. To work around this, the Commons-Lang code sets the accessibility of the fields to true, which is a start, but the maintenance on all of this is kind of ugly. There is also significant overhead to reflection, which you’re now hitting every time you try to walk a list, insert into a tree, and so on. Some caching might be able to help this, as long as the number of different classes you’re encountering within a window stays reasonably small.
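The sort of caching I have in mind might look like this (my own sketch; Commons-Lang does not do this for you):

    import java.lang.reflect.AccessibleObject;
    import java.lang.reflect.Field;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;

    // Cache the (already-made-accessible) declared fields per class, so the
    // reflection cost is paid once per class instead of once per comparison.
    final class FieldCache {
        private static final Map<Class<?>, Field[]> CACHE =
                Collections.synchronizedMap(new HashMap<Class<?>, Field[]>());

        static Field[] declaredFieldsOf(final Class<?> clazz) {
            Field[] fields = CACHE.get(clazz);
            if(fields == null) {
                fields = clazz.getDeclaredFields();
                AccessibleObject.setAccessible(fields, true);
                CACHE.put(clazz, fields);
            }
            return fields;
        }

        private FieldCache() {
        }
    }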

There are another couple of issues which I would probably go so far as to call “bugs”.

First, variables belonging to inner classes aren’t checked. Inner classes which implement interfaces and contain data are exceedingly common: the most obvious examples are the Map.Entry and Iterator interfaces. I also often have an inner class which provides the “doIt”-ness of the class, while the class itself does clean-up and maintenance on that “doIt”-ness. Similarly, anonymous inner classes that inline an object reference or value (see “closures”) could generate false positives, because the “value equality” and “functional equality” of those classes are drastically different.

Second, two classes that are sibling classes but value-equivalent cannot have their equality properly determined through this method; in fact, the result will always be false. If two classes both extend a common abstract base class, and are to be considered equal in some cases, then we’re in trouble. Think about the Number class and its various descendants: in this case, a Float with a value of 1 and an Integer with a value of 1 would be identified as not being equal. This case is so weird, and identifying when two such objects are supposed to be equal is so hard, that I would generally punt it back to the user to address. I’m not sure how the user is going to handle it, but I don’t fault the Commons-Lang developers for leaving this bug in play.

That said, it’s easier to use the reflection mode in the EqualsBuilder class than to try to actually generate a sensible work-around for the equals() problem. It’s not a perfect solution, but it’s a reasonable hack that sacrifices performance for correctness…at least, correctness in more cases.

A Java Gotcha October 28, 2005 | 10:31 am

Java has a universal “equals” method, as well as a “Comparable” interface that defines “natural comparison”. I always had a distaste for that design decision, and that distaste has just been further validated.

So, I have a class which looks like this:

    public class TwoDeePoint implements Comparable {
        public int x = 0;
        public int y = 0;

        public boolean equals(final Object obj) {
            if(obj == null || !(obj instanceof TwoDeePoint)) {
                return false;
            } else {
                TwoDeePoint them = (TwoDeePoint)obj;
                return them.x == this.x && them.y == this.y;
            }
        }

        public int compareTo(final Object o) {
            if(o == null) {
                throw new NullPointerException();
            } else if(!(o instanceof TwoDeePoint)) {
                throw new ClassCastException();
            } else {
                final TwoDeePoint them = (TwoDeePoint)o;
                final int xCmp = new Integer(this.x).compareTo(new Integer(them.x));
                final int yCmp = new Integer(this.y).compareTo(new Integer(them.y));
                return xCmp != 0 ? xCmp : yCmp;
            }
        }
    }

Even ignoring certain religious/performance fights (like whether the ternary operator is evil, why I declare so much crap “final”, and whether the “x” member variable should be “X”, or even visible at all…), there is a major error in this code. Specifically, consider this class:

    public class ThreeDeePoint extends TwoDeePoint implements Comparable {
        public int z = 0;

        public boolean equals(final Object obj) {
            if(obj == null || !(obj instanceof ThreeDeePoint)) {
                return false;
            } else {
                final ThreeDeePoint them = (ThreeDeePoint)obj;
                return super.equals(obj) && them.z == this.z;
            }
        }

        public int compareTo(final Object o) {
            if(o == null) {
                throw new NullPointerException();
            } else if(!(o instanceof ThreeDeePoint)) {
                throw new ClassCastException();
            } else {
                final ThreeDeePoint them = (ThreeDeePoint)o;
                final int supCmp = super.compareTo(them);
                final int zCmp = new Integer(this.z).compareTo(new Integer(them.z));
                return supCmp != 0 ? supCmp : zCmp;
            }
        }
    }

See the problem yet? Guess the result of the following snippet of code:

    final TwoDeePoint a = new TwoDeePoint();
    a.x = 1;
    a.y = 2;
    final ThreeDeePoint b = new ThreeDeePoint();
    b.x = a.x;
    b.y = a.y;
    b.z = 3;
    System.out.println(a.equals(b));
    System.out.println(b.equals(a));

The answer is that the snippet will print out:

          true
          false

And there’s the problem: the equals method, which is defined on every class and is so critical to the operation of so many standard collections, has a very common and glaring issue. Given this implementation, a Set of TwoDeePoint and ThreeDeePoint objects may or may not insert a new ThreeDeePoint whose x and y are the same as an existing TwoDeePoint’s, and the deciding factor is which object’s equals method ends up being called. Since equals is supposed to be symmetric by contract, it shouldn’t make a difference…but it does.
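To see the collection-level fallout with the a and b from the snippet above, even something as simple as ArrayList#contains gives inconsistent answers depending on which object ends up on the querying side (a sketch; note that these classes also don’t override hashCode, so hash-based collections have their own, separate problems):

    // assuming java.util.ArrayList and java.util.List are imported
    final List<TwoDeePoint> holdsA = new ArrayList<TwoDeePoint>();
    holdsA.add(a);
    final List<TwoDeePoint> holdsB = new ArrayList<TwoDeePoint>();
    holdsB.add(b);

    System.out.println(holdsA.contains(b)); // false: evaluates b.equals(a)
    System.out.println(holdsB.contains(a)); // true:  evaluates a.equals(b)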

The kludge that all Java programmers need to burn permanently into their brain can be modelled after the following change to the TwoDeePoint class:

    public boolean equals(final Object obj) {
        if(obj == null || !(obj instanceof TwoDeePoint)) {
            return false;
        } else {
            TwoDeePoint them = (TwoDeePoint)obj;
            if(them.x == this.x && them.y == this.y) {
                if(them.getClass() == this.getClass()) {
                    return true;
                } else {
                    // the other object is an instance of a child class: defer to its equals
                    return them.equals(this);
                }
            }
            return false; // x or y differ
        }
    }

At the point when we do the class check, we know that they are an instance of our class. If they are not precisely our class, then they are an instance of a child class, and we should make sure that they are equal to us. A similar change also needs to be made to compareTo.

Of course, if I’ve got a Collection of arbitrary TwoDeePoint objects, I now know that two of them may not be equal even though their x and y values are equal. So if I only care about the x and y values, I need to write my own comparison method external to the TwoDeePoint class, which really raises the question of whether that equals method is worth anything at all…

EDIT: I just realized that my code has a possible infinite recursion case. Specifically, if two ThreeDeePoint objects compare to each other, they’ll end up spinning infinitely into the abyss. Crap…I don’t know if there is a solution to this problem without breaking its object-oriented nature (e.g.: explicitly checking x and y in ThreeDeePoint)…