p.brc: The Cost of Apache Commons HashCode

I had a discussion at work today about the cost of object creation in Java. It was about the way you are supposed to use Apache Commons HashCodeBuilder and EqualsBuilder creating a new object on every single call to hashcode() or equals():

@Override
public int hashCode(){
    return new HashCodeBuilder()
        .append(name)
        .append(length)
        .append(children)
        .toHashCode();
}

While I was pretty sure that the cost of creating these short lived objects on a modern JVM was pretty negligible in most use cases, I had no numbers to prove it.

Time for a biased benchmark of the simplest kind like the following:

package org.blogspot.pbrc.benchmarks;

import java.util.HashSet;
import java.util.Random;
import java.util.Set;

public class Runner {

 private static final int NUM_ELEMS = 100000;
 private static final int NUM_RUNS = 100;

 public static void main(String[] args) {
  String[] keys = createKeys();
  Integer[] numerics = createNumericValues();

  // priming the jvm
  for (int i = 0; i < 5; i++) {
   manualEqualsTest(i, keys, numerics);
   autoEqualsTest(i, keys, numerics);
  }

  long time = 0l;
  for (int i = 0; i < NUM_RUNS; i++) {
   time += manualEqualsTest(i, keys, numerics);
  }
  System.out.format("%s %.4fsecs\n", "Manual avg:",
    (time / Double.valueOf(NUM_RUNS)) / 1000000000.0);

  long autotime = 0l;
  for (int i = 0; i < NUM_RUNS; i++) {
   autotime += autoEqualsTest(i, keys, numerics);
  }
  System.out.format("%s %.4fsecs\n", "Apache commons avg:",
    (autotime / Double.valueOf(NUM_RUNS)) / 1000000000.0);

 }

 private static long autoEqualsTest(int runNumber, String[] keys,
   Integer[] numerics) {
  Set<CommonsEqualsAndHashcode> autoObjs = new HashSet<CommonsEqualsAndHashcode>(
    NUM_ELEMS);

  long start = System.nanoTime();
  for (int i = 0; i < NUM_ELEMS; i++) {
   autoObjs.add(new CommonsEqualsAndHashcode(keys[i], numerics[i]));
  }

  return System.nanoTime() - start;
 }

 private static long manualEqualsTest(int runNumber, String[] keys,
   Integer[] numerics) {
  Set<ManualEqualsAndHashCode> valueObjs = new HashSet<ManualEqualsAndHashCode>(
    NUM_ELEMS);
  long start = System.nanoTime();

  for (int i = 0; i < NUM_ELEMS; i++) {
   valueObjs.add(new ManualEqualsAndHashCode(keys[i], numerics[i]));
  }

  return System.nanoTime() - start;
 }

 private static Integer[] createNumericValues() {
  Integer[] result = new Integer[NUM_ELEMS];
  Random rand = new Random();
  for (int i = 0; i < NUM_ELEMS; i++) {
   result[i] = rand.nextInt();
  }
  return result;
 }

 private static String[] createKeys() {
  String[] result = new String[NUM_ELEMS];
  RandomString rand = new RandomString(32);
  for (int i = 0; i < NUM_ELEMS; i++) {
   result[i] = rand.nextString();
  }
  return result;
 }

}

I used two—otherwise identical—immutable value types: one with hand-crafted equals() and hashcode(), the other using Apache Commons HashCodeBuilder and EqualsBuilder.

package org.blogspot.pbrc.benchmarks;
public class ManualEqualsAndHashCode {

 public final String val;
 public final Integer numeric;

 public ManualEqualsAndHashCode(String val, Integer numeric) {
  this.val = val;
  this.numeric = numeric;

 }

 @Override
 public boolean equals(Object obj) {
  if (obj == this) {
   return true;
  }
  if (!(obj instanceof ManualEqualsAndHashCode)) {
   return false;
  }
  ManualEqualsAndHashCode other = (ManualEqualsAndHashCode) obj;
  return other.val.equals(this.val) && other.numeric.equals(this.numeric);
 }

 @Override
 public int hashCode() {
  int result = 17;
  result = 31 * result + val.hashCode();
  result = 31 * result + numeric;
  return result;
 }

}

I got the following results on a 2 GHz Intel Core 2 Duo with 4 GB 1067 MHz DDR3 running OS X 10.8.1 (12B19) with Java 1.7.06


Manual avg: 0.4052secs
Apache commons avg: 0.4508secs

In this particular scenario the overhead—when using the Apache Commons classes—amounts to about ten percent. Depending on where you're coming from, this might be a price worth paying. But, of course, this test scenario of just putting items into a collection is highly synthetic so your mileage may vary.

Now go and find more flaws in the benchmark!

p.brc

Thursday, August 30, 2012

The Cost of Apache Commons HashCode

No comments: