I’m happy to see mostly correct answers to the School coding challenge.
To make things more interesting, let’s start by looking at a naïve (and wrong) solution:
@Override public int hashCode() { final int prime = 31; int result = 1; result = prime * result + ((name == null) ? 0 : name.hashCode()); result = prime * result + ((nickname == null) ? 0 : nickname.hashCode()); return result; } @Override public boolean equals(Object obj) { // ... details omitted return other.nickname.equals(nickname) || other.name.equals(name); }
Let’s create two School objects, make sure they are equal, add them to a set and then display this set:
School school1 = new School("University of Pennsylvania", "upenn"); School school2 = new School("University of Pennsylvania", "Penn State"); System.out.println("Equals: " + school1.equals(school2)); Set<School> schools = Sets.newHashSet(); schools.add(school1); schools.add(school2); System.out.println("Set: " + schools);
The output:
Equals: true Set: [com.beust.School@d8e612c6, com.beust.School@7086780a]
This is obviously wrong and caused by the fact that this implementation breaks several laws, among which: “if two objects are equal, their hashcode should be equal as well”. However, this is clearly not the case here since the two objects instantiated in the main method are equal but their hash code are different, since it’s calculated from both the name and nickname. The consequences of such an implementation are disastrous in more ways than one, starting with the fact that your objects cannot reliably be stored in nor retrieved from identity-based collections (e.g. sets and maps).
The only way out of this would be to return a variable hashcode, but this doesn’t make sense since we wouldn’t know what field to base that hash code on.
Most commenters came up with the idea of always returning the same value for hashCode(), regardless of the name and nickname values. This approach is actually correct but it comes at the cost of losing all running time benefits offered by sets and maps, such as constant time retrievals. Instead, Hashmap#get and Set#contains are now linear operations. That’s a pretty bad prank to pull on your coworkers when their collections can contain millions of objects.
So, what’s the best way to solve this problem?
I’m not quite sure. There are several approaches, all with their pros and cons.
The first idea would be to not override equals and hashCode, using their default implementation defined on Object. This implementation (simple reference equality test) is guaranteed to be wrong in most cases, but since you can’t write a correct one, maybe the best approach is to do nothing.
Another approach would be to throw an OperationNotSupportedException in equals/hashCode. The implementation is still wrong but at least, your program will crash loudly if someone attempts to put these objects in an identity-based collection.
Ultimately, you really want to get rid of the fuzzy matching performed in the equals method, so the best approach is probably to perform a first pass on all your School objects and unify all those that are equal under the same id. After this, you can implement equals/hashCode based on this id field.