Archive for February, 2013

Answer to the “School” challenge

I’m happy to see mostly correct answers to the School coding challenge.

To make things more interesting, let’s start by looking at a naïve (and wrong) solution:

  @Override
  public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    result = prime * result + ((nickname == null) ? 0 : nickname.hashCode());
    return result;
  }
  @Override
  public boolean equals(Object obj) {
    // ... details omitted
    return other.nickname.equals(nickname) || other.name.equals(name);
  }

Let’s create two School objects, make sure they are equal, add them to a set and then display this set:

    School school1 = new School("University of Pennsylvania", "upenn");
    School school2 = new School("University of Pennsylvania", "Penn State");
    System.out.println("Equals: " + school1.equals(school2));
    Set<School> schools = Sets.newHashSet();
    schools.add(school1);
    schools.add(school2);
    System.out.println("Set: " + schools);

The output:

Equals: true
Set: [com.beust.School@d8e612c6, com.beust.School@7086780a]

This is obviously wrong and caused by the fact that this implementation breaks several laws, among which: “if two objects are equal, their hashcode should be equal as well”. However, this is clearly not the case here since the two objects instantiated in the main method are equal but their hash code are different, since it’s calculated from both the name and nickname. The consequences of such an implementation are disastrous in more ways than one, starting with the fact that your objects cannot reliably be stored in nor retrieved from identity-based collections (e.g. sets and maps).

The only way out of this would be to return a variable hashcode, but this doesn’t make sense since we wouldn’t know what field to base that hash code on.

Most commenters came up with the idea of always returning the same value for hashCode(), regardless of the name and nickname values. This approach is actually correct but it comes at the cost of losing all running time benefits offered by sets and maps, such as constant time retrievals. Instead, Hashmap#get and Set#contains are now linear operations. That’s a pretty bad prank to pull on your coworkers when their collections can contain millions of objects.

So, what’s the best way to solve this problem?

I’m not quite sure. There are several approaches, all with their pros and cons.

The first idea would be to not override equals and hashCode, using their default implementation defined on Object. This implementation (simple reference equality test) is guaranteed to be wrong in most cases, but since you can’t write a correct one, maybe the best approach is to do nothing.

Another approach would be to throw an OperationNotSupportedException in equals/hashCode. The implementation is still wrong but at least, your program will crash loudly if someone attempts to put these objects in an identity-based collection.

Ultimately, you really want to get rid of the fuzzy matching performed in the equals method, so the best approach is probably to perform a first pass on all your School objects and unify all those that are equal under the same id. After this, you can implement equals/hashCode based on this id field.

Coding challenge (“light” edition)

This coding challenge is a bit different from the previous one ([1] [2]) because it is shorter and also a bit more closely tied to Java.

A School has either a name (“Stanford”) or a nickname (“UCSD”) or both.

Your task is to write equals() and hashCode() methods for the School class that satisfy the following requirement: two Schools are identical if they either have the same name or the same nickname.