Thursday, August 30, 2012

The Cost of Apache Commons HashCode

I had a discussion at work today about the cost of object creation in Java. It was about the way you are supposed to use Apache Commons HashCodeBuilder and EqualsBuilder creating a new object on every single call to hashcode() or equals():

@Override
public int hashCode(){
    return new HashCodeBuilder()
        .append(name)
        .append(length)
        .append(children)
        .toHashCode();
}

While I was pretty sure that the cost of creating these short lived objects on a modern JVM was pretty negligible in most use cases, I had no numbers to prove it.

Time for a biased benchmark of the simplest kind like the following:

package org.blogspot.pbrc.benchmarks;

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class Runner {

 private static final int NUM_ELEMS = 100000;
 private static final int NUM_RUNS = 100;

 public static void main(String[] args) {
  String[] keys = createKeys();
  Integer[] numerics = createNumericValues();

  // priming the jvm
  for (int i = 0; i < 5; i++) {
   manualEqualsTest(i, keys, numerics);
   autoEqualsTest(i, keys, numerics);
  }

  long time = 0l;
  for (int i = 0; i < NUM_RUNS; i++) {
   time += manualEqualsTest(i, keys, numerics);
  }
  System.out.format("%s %.4fsecs\n", "Manual avg:",
    (time / Double.valueOf(NUM_RUNS)) / 1000000000.0);

  long autotime = 0l;
  for (int i = 0; i < NUM_RUNS; i++) {
   autotime += autoEqualsTest(i, keys, numerics);
  }
  System.out.format("%s %.4fsecs\n", "Apache commons avg:",
    (autotime / Double.valueOf(NUM_RUNS)) / 1000000000.0);

 }

 private static long autoEqualsTest(int runNumber, String[] keys,
   Integer[] numerics) {
  Set<CommonsEqualsAndHashcode> autoObjs = new HashSet<CommonsEqualsAndHashcode>(
    NUM_ELEMS);

  long start = System.nanoTime();
  for (int i = 0; i < NUM_ELEMS; i++) {
   autoObjs.add(new CommonsEqualsAndHashcode(keys[i], numerics[i]));
  }

  return System.nanoTime() - start;
 }

 private static long manualEqualsTest(int runNumber, String[] keys,
   Integer[] numerics) {
  Set<ManualEqualsAndHashCode> valueObjs = new HashSet<ManualEqualsAndHashCode>(
    NUM_ELEMS);
  long start = System.nanoTime();

  for (int i = 0; i < NUM_ELEMS; i++) {
   valueObjs.add(new ManualEqualsAndHashCode(keys[i], numerics[i]));
  }

  return System.nanoTime() - start;
 }

 private static Integer[] createNumericValues() {
  Integer[] result = new Integer[NUM_ELEMS];
  Random rand = new Random();
  for (int i = 0; i < NUM_ELEMS; i++) {
   result[i] = rand.nextInt();
  }
  return result;
 }

 private static String[] createKeys() {
  String[] result = new String[NUM_ELEMS];
  RandomString rand = new RandomString(32);
  for (int i = 0; i < NUM_ELEMS; i++) {
   result[i] = rand.nextString();
  }
  return result;
 }

}

I used two—otherwise identical—immutable value types: one with hand-crafted equals() and hashcode(), the other using Apache Commons HashCodeBuilder and EqualsBuilder.

package org.blogspot.pbrc.benchmarks;
public class ManualEqualsAndHashCode {

 public final String val;
 public final Integer numeric;

 public ManualEqualsAndHashCode(String val, Integer numeric) {
  this.val = val;
  this.numeric = numeric;

 }

 @Override
 public boolean equals(Object obj) {
  if (obj == this) {
   return true;
  }
  if (!(obj instanceof ManualEqualsAndHashCode)) {
   return false;
  }
  ManualEqualsAndHashCode other = (ManualEqualsAndHashCode) obj;
  return other.val.equals(this.val) && other.numeric.equals(this.numeric);
 }

 @Override
 public int hashCode() {
  int result = 17;
  result = 31 * result + val.hashCode();
  result = 31 * result + numeric;
  return result;
 }

}

I got the following results on a 2 GHz Intel Core 2 Duo with 4 GB 1067 MHz DDR3 running OS X 10.8.1 (12B19) with Java 1.7.06


Manual avg: 0.4052secs
Apache commons avg: 0.4508secs

In this particular scenario the overhead—when using the Apache Commons classes—amounts to about ten percent. Depending on where you're coming from, this might be a price worth paying. But, of course, this test scenario of just putting items into a collection is highly synthetic so your mileage may vary.

Now go and find more flaws in the benchmark!

No comments: