Michael Bolin's 6.170 Recitation

www.bolinfest.com 
This web site is for students in the recitation that I am teaching for 6.170 Fall 2004. The recitation meets Thursday afternoons from 2-3pm in 34-302 and my office hours are after recitation from 3-5pm in 32-044A.
Recitation Notes
9/16 Recitation 2
9/23 Recitation 3
10/7 Recitation 5
10/14 Recitation 6
10/28 Recitation 8
Recommended Readings
Athena Pocket Reference
The Java IAQ
Programming by Coincidence
Writing a Code Generator
in Java
The Story of Mel
A Comparative Overview of C#
Beating the Averages
J2SE 5.0 New Features

10/14 Recitation 6 - Equality

  1. The equals() Contract
  2. Why equals(null) Returns False
  3. Template for Implementing equals()
  4. Overriding equals() in a Subclass
  5. equals() and hashCode(): Best Friends Forever
  6. Spelling Counts!
  7. Creating a Good Hash Code
  8. Equivalence

1. The equals() Contract

The equals() method has the following specified requirements for non-null references:
  • Never equal to null: x.equals(null) is always false.
  • Reflexive: x.equals(x) is always true.
  • Symmetric: If x.equals(y), then it must be that y.equals(x)
  • Transitive: If x.equals(y) and y.equals(z), then it must be that x.equals(z)
  • Consistent: Multiple invocations of x.equals(y) return the same value unless one of the objects is mutated.
Make sure that any equals() method that you write meets this contract.

Make sure that if you override equals(), then you override hashCode() so that for any two objects for which equals() is true that their hashCode() method returns the same int.

Failure to do these two things will lead to insidious bugs.

2. Why equals(null) Returns False

Given that the method signature is:
public boolean equals(Object o)
the invocation equals(null) can do one of the following things:
  1. return true
  2. throw an exception
  3. return false

Option 1: Return true

If equals(null) were to return true, then intuitively, this would not make any sense because a non-null object would be considered equal to something that is not even an object. This is inconsistent with our idea of equality.

Option 2: Throw an Exception

Because equals() is a method of Object, it is a fundamental method in Java that is used often, and in a variety of ways. Thus, it would be inconvenient if it threw an exception when passed a null reference when there is a more reasonable response, namely:

Option 3: Return false

This consistent with our idea of equality because it considers objects and non-objects unequal, so this appears to be the most reasonable result for equals(null).

3. Template for Implementing equals()

Suppose you are overriding equals() in a class called MyClass; use the following as a template for your implementation:
public boolean equals(Object obj) {
  if (!(obj instanceof MyClass)) {
    return false;
  }
  MyClass myClass = (MyClass)obj; // myClass is a non-null reference

  // add the code that is
  // specific to comparing
  // this and myClass here

}
Because (obj instanceof Object) will return false if obj is null, the lines:
  if (!(obj instanceof MyClass)) {
    return false;
  }
will ensure that this implementation of equals() fulfills the part of the contract the requires it to return false when passed a null.

The instanceof keyword is used to test to see if the reference on the left either implements or is a subclass of the type on the right. Performing an instanceof test is a relatively expensive operation in Java, so it should be be avoided whenever possible, as it improves performance and is also better style.

For example, if you had a List of Student and Professor objects, then you would have to do something like this when iterating over the list:

for (Iterator iter = list.iterator(); iter.hasNext(); ) {
  Object obj = iter.next();
  if (obj instanceof Student) {
    Student student = (Student)obj;
    System.out.println("Name: " + student.getStudentName());
  } else if (obj instanceof Professor) {
    Professor professor = (Professor)obj;
    System.out.println("Name: " + professor.getProfessorName());
  }
}
It would be better style (and better performance) to create an interface Person with a method getName() that both Student and Professor implement:
class Student implements Person {
  public String getName() { return getStudentName(); }
}

class Professor implement Person {
  public String getName() { return getProfessorName());
}
Then the previous code could be replaced with:
for (Iterator iter = list.iterator(); iter.hasNext(); ) {
  Person person = (Person)iter.next();
  System.out.println("Name: " + person.getName());
}
This is much more elegant. Unfortunately, a similar refactoring cannot be done with equals() to eliminate the call to instanceof. However, Bloch (in Item 7 of Effective Java) proposes making the first line of equals() the following:
if (obj == this) { return true; }
As this is an inexpensive comparison to perform, this may save a lot of cycles if there is a fair chance that the object will be compared to itself via equals().

You may think that you could solve this problem by doing something like this:

public class Point3D {

  private int x, y, z;

  public Point3D(int x, int y, int z) {
    this.x = x;
    this.y = y;
    this.z = z;
  }

  public boolean equals(Point3D p) {
    if (p == null) return false;
    return this.x == p.x && this.y == p.y && this.z == p.z;
  }

}
Unfortunately, this will not work. For example, if you try to run the following code:
java.util.Set pointSet = new java.util.HashSet();
pointSet.add(new Point3D(1, 2, 3));
pointSet.add(new Point3D(1, 2, 3));
System.out.println(pointSet.size());
the number 2 will be printed to the console even though it seems that addition of the second point should be ignored, as it is a duplicate. The catch is that Set tests for equality by invoking the equals(Object) method. Since Point3D does not override equals(Object), it uses its default implementation:
public boolean equals(Object obj) {
  return obj == this;
}
Thus, the equals(Point3D) method is ignored, and unused, by Set. This method will only be used when the compiler can match the types of the parameters with the method. Examine the results of the following equals() comparisions for a better understanding of how this works:
Point3D a = new Point3D(3, 14, 159);
Point3D b = new Point3D(3, 14, 159);
Object c = a;
Object d = b;

a.equals(b);          // true
a.equals(c);          // true, uses equals(Object) but a == c in this case
a.equals(d);          // false
a.equals((Point3D)d); // true
c.equals(b);          // false
c.equals(d);          // false

Thus, make sure that you override public boolean equals(Object obj) and not public boolean equals(MyType type).

4. Overriding equals() in a Subclass

Consider a class Duration and a class NanoDuration that extends it:
public class Duration {

  protected int min;
  protected int sec;

  public Duration(int min, int sec) {
    this.min = min; this.sec = sec;
  }
}

public class NanoDuration extends Duration {

  private int nano;

  public NanoDuration(int min, int sec, int nano) {
    super(min, sec);
    this.nano = nano;
  }
  
}
Now let's explore some implementations of equals():

Try #1

// Duration
public boolean equals(Object obj) {
  if (!(obj instanceof Duration) return false;
  Duration d = (Duration)obj;
  return this.min == d.min && this.sec == d.sec;
}

// NanoDuration
public boolean equals(Object obj) {
  if (!(obj instanceof NanoDuration) return false;
  NanoDuration nd = (NanoDuration)obj;
  return this.min == nd.min && this.sec == d.sec && this.nano == nd.nano;
}
At first, this seems reasonable; however, there is a problem because this violates the symmetry requirement of the equals() contract:
Duration d = new Duration(2, 71);
NanoDuration nd = new NanoDuration(2, 71, 82);

d.equals(nd); // true
nd.equals(d); // false

Try #2

// Duration
public boolean equals(Object obj) {
  if (obj == null || !obj.getClass().equals(Duration.class)) {
    return false;
  }
  Duration d = (Duration)o;
  return this.min == d.min && this.sec == d.sec;
}

// NanoDuration
//suppose it inherits equals() from Duration
This seems like it might be an improvement; however, it is worse because it still violates symmetry, and what's worse is that it violates reflexivity in the subclass:
NanoDuration nd = new NanoDuration(0, 6, 170);
nd.equals(nd); // false: nd.getClass().equals(NanoDuration.class)

Try #3

// Duration
public boolean equals(Object obj) {
  if (obj == null || !obj.getClass().equals(getClass()) {
    return false;
  }
  Duration d = (Duration)o;
  return this.min == d.min && this.sec == d.sec;
}

// NanoDuration
public boolean equals(Object obj) {
  if (obj == null || !obj.getClass().equals(getClass()) {
    return false;
  }
  NanoDuration nd = (NanoDuration)o;
  return this.min == nd.min && this.sec == nd.sec && this.nano == nd.nano; 
}
This is a better solution because it satisfies the equals() contract. Unfortunately, it is impossible to compare a Duration with a NanoDuration under this implementation; however, it is difficult to define how such a comparision should be made, anyway. The larger problem with this solution is that "harmless" subclasses cannot be compared. For example, consider the following:
public class ArithmeticDuration extends Duration {

  public Duration addDuration(Duration d) {
    return new Duration(this.min + d.min, this.sec + d.sec);
  }

}
This leads to the following problem:
Duration d = new Duration(4, 20);
Duration a = new ArithmeticDuration(4, 20);
d.getMinutes() == a.getMinutes(); // true
d.getSeconds() == a.getSeconds(); // true
d.equals(a);                      // false
Thus, two objects that appear to be the same are not considered equal when compared with the equals() method. (Note that this problem would also exist if we had used NanoDuration in place of ArithmeticDuration in this example). Even though ArithmeticDuration is not fundamentally different than its superclass, the implementation of equals() in Duration makes the two types incomparable. Thus, this is not an ideal solution to our problem.

Try #4

As Bloch advises in Item 14 of Effective Java: "Favor composition over inheritance." Instead of extending Duration, NanoDuration could reuse Duration by composing it as a field, and delegating calls to it internally, as necessary:
public class NanoDuration {

  private Duration d;
  private int nano;

  public NanoDuration(int min, int sec, int nano) {
    d = new Duration(min, sec);
    this.nano = nano;
  }

}
Note that the client of NanoDuration need not know that NanoDuration uses Duration.

Because Duration and NanoDuration do not extend one another, there are no problems involving inheriting a broken equals() method. However, we do lose something: previously, we could add both Duration and NanoDuration objects to a Duration[] array. Because NanoDuration is no longer a subclass, we can no longer do this. The solution is to create an interface that both classes implement:

public interface TimePeriod {

  public int getMinutes();

  public int getSeconds();

}
Thus, we could create a TimePeriod[] array to which we could add both Duration and NanoDuration objects. Note that great care must be exercised when using the equals() method of objects that are referred to through their interface, as an interface rarely specifies how equals() must be implemented. For example, if we have the following:
TimePeriod tp1, tp2; // suppose these were initialized outside this block of code
java.util.Set set = new java.util.HashSet();
tp1.getMinutes() == tp2.getMinutes(); // true
tp1.getSeconds() == tp2.getSeconds(); // true
set.add(tp1);
set.add(tp2);
System.out.println(set.size());
What gets printed out? From this code, there is no way to know because it is unclear what happens when add(tp2) is invoked and set tests if tp1.equals(tp2). Thus, be conscious of the use of equals() in scenarios like this.

5. equals() and hashCode(): Best Friends Forever

Both the equals(Object obj) and the hashCode() method are defined by Object, so every Java object inherits these two methods. According to the specification for hashCode():

The general contract of hashCode is:

  • Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
  • If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
  • It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Therefore, the rule of thumb is:

If you override equals(), then you should override hashCode().

Why is this so important? Because failure to meet this specification can introduce insidious bugs. For example, when you do a put into a Hashtable, the Hashtable takes the hashCode() of the key and puts the value into a "bucket" with all keys in the same range. Thus, when you use get(Object key) to extract a value from a Hashtable, the table takes the hashCode of the key and then looks through the bucket to which the key maps for another key with (i) an identical hashCode and (ii) whose equals method returns true when applied to the provided key. Thus, if the hashCode for the key changes over the course of execution of the program, a call to get will not be able to find the desired Object because the Hashtable will look in the wrong bucket. Similarly, if the hashCode method is not updated so that it returns the same value for two objects that are the same according to the equals method, then a call to get() will fail to return the desired value because it will not recognize the key as it fails condition (i) stated above. For a concrete example with fewer run-on sentences, see Item 8 in Effective Java.

Thus, it is important to honor the following contract:

If a.equals(b), then a.hashCode() == b.hashCode().

Students often ask if the converse must also hold: if the hashcodes are equal, must the objects be equal, too? The answer is no, and the answer follows from the pigeonhole principle. For example, if you create 232+1 new objects in Java, then at least two of them must have the same hash code. Thus, equal hash codes do not have to guarantee equal objects; however, attempting to adhere to this rule will often improve the performance of your hash tables.

6. Spelling Counts!

When you override a method in Java, make sure that you spell it correctly. Unfortunately, the Java compiler will not signal an error if you declare either of the following methods:
public boolean Equals(Object obj);

public int hashcode();
Instead, it will let you implement them, but will never invoke them because the spelling of the name does not exactly match that of the method it is intended to override. In the unfortunate event that you make this error, you will find that it is an extremely difficult bug to track down. Thus, one way to avoid this problem is to use the Override/Implement Methods... command in Eclipse to create these method signatures for you so that you can be sure that there will not be any typos.

7. Creating a Good Hash Code

Consider the following implementations for Duration.hashCode():

Try #1

return 1;
This is a terrible hash code because it would make everything hash to the same bucket, making the hash code function as a linked list (eliminating the benefit of constant-time-lookup that a hash table should provide).

Try #2

return min;
This is a better hash code; however, it will still result in a lot of collisions because every Duration with the same value for min will end up in the same bucket.

Try #3

return min + sec;
Again, this is better, but there will still be collisions for a number of different durations. For example, when (min,sec) = (3,4) and (min,sec) = (4,3), both will hash to the same bucket.

Try #4

return sec << 16 + min;
This is a bigger improvement because now only equal durations have equal hash codes (assuming minutes and seconds take on values between 0 and 60). However, if only the low order bits are used in the hash (which happens in techniques such as extensible hashing) then there will still be collisions when durations have the same minute. Note that this shows that just because you have unique hash codes for unique values, this does not guarantee a good hash.

Try #5

int result = 17;
result = 37 * result + sec;
result = 37 * result + min;
return result;
"Whoah -- where did this come from?" you ask. Well, the truth is that number theory people have discovered that multiplying by prime numbers helps spread out hash code values. In fact, they have discovered a lot of neat things about how to produce better hash codes, but they are often quite complicated, so it's probably easiest to look at the guidelines that Bloch has in Item 8 and apply them to your code rather than devise your own algorithms to acheive good hashing (unless you're into that sort of thing).

Try #6

Actually, there was nothing wrong with Try #5, but if the hashCode() method is going to be invoked often, and is always going to return the same number, then it will help to cache the hash code and to declare the hashCode() method final so that the cost of method lookup and computation will be reduced. For a large number of elements in a hashed data structure (like nodes in the graph of the Stata Center), this will have a noticeable impact on performance:
/**
 * <code>Duration</code> is immutable.
 */
public class Duration {

  protected final int min;
  protected final int sec;

  private final int hash;

  public Duration(int min, int sec) {
    this.min = min;
    this.sec = sec;

    int h = 37;
    h = 37 * h + sec;
    h = 37 * h + min;
    hash = h;
  }

  public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof Duration)) return false;
    Duration d = (Duration)obj;
    return min == d.min && sec == d.sec;
  }

  public final int hashCode() {
    return hash;
  }

}
Even if Duration were a mutable object, we could still cache the hash code by updating the cached value every time a Duration were mutated. Alternatively, we could set a flag whenever it is mutated, and then check that flag before returning the cached value: if it has been mutated, then we consider the cached value stale and then recalculate and cache the hash before returning it.

The number of optimizations like this that you want to encode depends on the nature of the problem – if hashCode() is inexpensive to compute, and is not accessed often, then these mechanisms will just make your code more difficult to maintain without providing any real benefit to the client. The important thing is to think carefully about how likely hashCode() is to cause a collision, and to make sure that it is in agreement with equals().

8. Equivalence

In 6.170, we talk about two types of equivalence:
  • behavioral equivalence is when two objects cannot be distinguished by any sequence of operations, including mutators and observers
  • observational equivalence is when two objects cannot be distinguished by any sequence of observer operations

In Java, == tests references for behavioral equivalence because if two references refer to the same object, then any operation performed using one reference must have the same effect as performing the operation on the other reference because they're being applied to the same object!

This may lead you to believe that only references for which == returns true are behaviorially equivalent; however, this is not true. Consider the following:

String a = new String("foo");
String b = new String("foo");
System.out.println(a == b); // prints false
In the example above, a and b are not referentially equivalent; however, they are behaviorally equivalent. The general reason why this is true is because String is an immutable type. An immutable type doesn't have any mutators, so if two immutable objects are observationally equivalent, then they must also be behaviorally equivalent.

Let's also look at an example of how to show that two things are not behaviorally equivalent:

StringBuffer b1 = new StringBuffer(), b2 = new StringBuffer();
System.out.println(b1.length() == b2.length()); // prints true
b1.append("foo");
System.out.println(b1.length() == b2.length()); // prints false
Aha! We found a sequence of mutators (a sequence of one element: b1.append("foo")) that can distinguish between b1 and b2. Thus, b1 and b2 are not behaviorally equivalent.


©2004 Michael Bolinwww.bolinfest.com