Sunday, July 27, 2014

Unit Testing With Hamcrest Matchers: verifying the contents of a collection


Unit Testing With Hamcrest Matchers: verifying the contents of a collection

Some Different Approaches To Verifying A Collection Result
Here's one of my favorite techniques that seems to be currently under utilized in our team. It uses Hamcrest Matchers to verify the contents of a collection in a unit test.
APPROACH 1
1
2
3
4
import static org.hamcrest.Matchers.containsInAnyOrder;
...
   List<String> retrievedNames = ...
   assertThat(retrievedNames, containsInAnyOrder("name1", "name2", "name3"));
Without using Matchers, a typical approach could be something like this:
APPROACH 2
1
2
3
4
List<String> expectedNames = Arrays.asList("name1", "name2", "name3");
List<String> retrievedNames = ...
assertEquals("expectedNames and retrievedNames must match in size.", expectedNames.size(), retrievedNames.size());
assertTrue("expectedNames and retrievedNames must match in contents.", expectedNames.containsAll(retrievedNames));
Hamcrest Matchers: The Clarity, Expressiveness and Flexibility of It
If you ask me, using Hamcrest Matchers as in APPROACH 1 is much clearer. Not only that, it is more flexible. Among the pre-built Matchers that come with Hamcrest are the following:
·       contains(E...): all elements in both sets must match exactly AND with same order
·       containsInAnyOrder(E...): all elements in both sets must match exactly, but iteration order may differ
·       hasItems(E... elems): the other set being matched against is a superset of elems.
·       isIn(E... elems): the other set being matched against is a subset of elems.
You can even combine matchers in a very readable manner, like this:
assertThat(mutantSequence, either(hasItem("AAAGCGAAA")).or(hasItem("TTTCGCTTT")));
Hamcrest Matchers: The Precision and Detail of Its Error Messages
A second major advantage with APPROACH 1 is the msg that is generated when the assertion fails. The message will tell you exactly which element(s) failed to match. E.g.
A Hamcrest Matcher Error Message
Given
 // place.uniqueAiTaxonomyIds() actually returns 20,21,22,23
 assertThat(place.uniqueAiTaxonomyIds(), containsInAnyOrder(20L,21L,23L));
Assertion failure yields
 java.lang.AssertionError:
 Expected: iterable over [<20L>, <21L>, <23L>] in any order
      but: Not matched: <22L>
     at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
     at org.junit.Assert.assertThat(Assert.java:865)
     at org.junit.Assert.assertThat(Assert.java:832)
     at com.millennialmedia.ai.core.model.places.PlaceTest.testUniqueAiTaxonomyIds(PlaceTest.java:25)
How is that for a life-saving diagnostic msg, esp if you have more than just a handful of elements to compare!!

The equals() Conundrum
The worst approach of all (but sadly one that I see all too often) is the following:
APPROACH 3
1
2
3
4
5
List<String> expectedResults = ...
List<String> retrievedResults = ...
assertNotNull(retrievedResults);
assertTrue(retrievedResults.size() > 0);
Notice how we limited the test to verifying that the result is not empty. We don't bother (don't dare?) to verify if the contents are as expected. The test is essentially incomplete. But for some reason this seems to be a very common (if not the most common) approach that you'll encounter in our test code base.
Another gripe (albeit a much smaller one) that I have with the previous example is in the way it invokes assertNotNull and assertTrue without providing a message. If the assertion fails, we get a stack trace like the below.
Failed Assertion With No Msg
java.lang.AssertionError
   at org.junit.Assert.fail(Assert.java:86)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.junit.Assert.assertTrue(Assert.java:52)
   ...
The stack trace would look less mysterious and be a lot more helpful if a well thought out failure msg were added to the assertion. This is especially apt given our team's propensity to write tests that assert multiple things, sometimes up to 10+ asserts, in a single test method (which is an issue in and of itself, but topic for another blog).
Going back to my main complaint about APPROACH 3, why are we so timid about verifying the contents of the retrievedResults? A possible reason is that we could not (or do not want to) depend on the result objects' equals() method. Maybe the equals() method has not been overridden (and it is, arguably, dubious to implement the equals() method just to satisfy some testing need). Or maybe the equals() method as implemented does not lend itself to the needs of the test, e.g. it does not compare all the object's properties that we need for the test.
If we could depend on the equals() method, we could have easily used APPROACH 1 or 2. The simplest approach is usually best. But what do we do if it so happens that we could not depend on the equals() method?
I've seen places in our test code (at least 4 or 5 places, but perhaps more) where we address the issue using the following approach:
APPROACH 4
avs = intel.getAudienceValues();
AssertJUnit.assertNotNull(avs);
AssertJUnit.assertEquals(4, avs.size());
for (AudienceValue av : avs)
{
   if (av.getAudienceValueId() == parentExplicitAudienceValueId)
   {
       AssertJUnit.assertEquals(1.0, av.getReferenceValue(), 0);
       AssertJUnit.assertEquals(1, av.getFactors().size());
       AssertJUnit.assertEquals("value1", av.getFactors().get(0).getValues());
       AssertJUnit.assertEquals(10.0, av.getFactors().get(0).getWeight());
   }
   else if (av.getAudienceValueId() == genderAudienceValueId)
   {
       AssertJUnit.assertEquals(1.0, av.getReferenceValue(), 0);
       AssertJUnit.assertEquals(1, av.getFactors().size());
       AssertJUnit.assertEquals("value2", av.getFactors().get(0).getValues());
       AssertJUnit.assertEquals(20.0, av.getFactors().get(0).getWeight());
   }
   else if (av.getAudienceValueId() == parentAudienceValueId)
   {
       AssertJUnit.assertEquals(1.0, av.getReferenceValue(), 0);
       AssertJUnit.assertEquals(1, av.getFactors().size());
       AssertJUnit.assertEquals("PARENT_EXP=PROUD_PARENT", av.getFactors().get(0).getValues());
       AssertJUnit.assertEquals(1.0, av.getFactors().get(0).getWeight());
   }
   else if (av.getAudienceValueId() == femaleAudienceValueId)
   {
       AssertJUnit.assertEquals(1.0, av.getReferenceValue(), 0);
       AssertJUnit.assertEquals(1, av.getFactors().size());
       AssertJUnit.assertEquals("GENDER=FEMALE", av.getFactors().get(0).getValues());
       AssertJUnit.assertEquals(1.0, av.getFactors().get(0).getWeight());
   }
   else
   {
       AssertJUnit.fail("unknown audience for deviceId1 with avId = [" + av.getAudienceValueId() + "]");
   }
So this approach seems to be comparing only selected properties within the AudienceValue objects. However, this approach suffers from a couple of issues. First, is readability, maintainability, and the repetition of similar comparison logic across each type of AV object (which also makes it more error prone).
The second issue is that this code may actually contain a (not so subtle) bug. Examining the code, it seems one assumption is that the collection "avs" contains exactly 4 elements (hence the assertEquals(4, avs.size()). That's ok. Another assumption seems to be that the elements in the "avs" collection may be in any particular order, hence the use of the for loop. So far so good. Lastly, it seems that the intent is to test that the "avs" collection contains exactly one instance of each of the 4 types of AV objects. But if avs contained 4 duplicate instances of the same type of AV object, the test would still pass. So the test is not precise enough.
Custom Matchers to the Rescue
What's a better approach? Roll your own Matcher! I was going to provide my own example on how to do this, but seeing that these articles already did such an excellent job, I think I'll just refer you to them:
·       Hamcrest Tutorial
·       How Hamcrest Can Save Your Soul : excellent article explaining what a custom Hamcrest Matcher can do for you, with a full example.
As illustrated by the last of the above articles, by writing a custom Matcher, you gain a couple of valuable things that the equals() method can't buy you. First, when comparing objects defined in 3rd party libraries, you won't be able to override equals() to meet your testing needs. Second, a custom Matcher can provide a very detailed diff msg of exactly which properties do not match. For example, if you depend on the equals() method you get something like this:
Expected: <BIG GLOB OF OBJECT AND FIELD ATTRIBUTES GENERATED WITH ToStringBuilder>
but was: <OTHER BIG GLOB OF OBJECT AND FIELD ATTRIBUTES WITH GENERATED WITH ToStringBuilder>
OTOH, by writing your own custom Matcher, you can control the error msg to pinpoint exactly which properties in the two objects differ:
Given
   assertThat(lukesFirstLightsaber, is(equalTo(maceWindusLightsaber)));
We get:
   Expected: is {singleBladed is true, color is PURPLE, hilt is {size is LARGE, ...}}
   but: is {color is GREEN, hilt is {size is MEDIUM}}
Way cool, and potentially a great time-saver, esp if the custom Matcher you write is used in several tests and if the object being validated contains lots of fields or is a complex hierarchy of other objects. (Caveat: Keep in mind though that the simplest approach would be to leverage the equals() method if you can, and I suspect that in most cases you should be able to. There's no need to go around writing custom Matchers if the equals() method already serves your need).
In Closing
Finally, I'll end with a few recommendations and tips.
1.     I'd recommend using hamcrest-all-1.3, which is the latest version as of this writing (Feb 2013)
2.     Use a version of JUnit that matches up to the version of Hamcrest that you are using. JUnit 4.4+ comes with hamcrest-core bundled (classes in hamcrest-core are a subset of those in hamcrest-all). Using incompatible versions of JUnit and hamcrest-all can lead to funky errors, depending on which jar happens to come first on your classpath. If using hamcrest-all-1.3, I suggest using a version of JUnit that corresponds to hamcrest 1.3, such as JUnit 4.10 or 4.11. The other alternative is to use junit-dep-xxx.jar which does not include hamcrest.
3.     Hamcrest 1.3 API javadoc can be found here. Surprisingly, it's not that easy to search for it online. (www.jarvana.com also provides the javadocs, but instead of a consolidated set of javadocs, it's split between hamcrest-core and hamcrest-library).
4.     Hamcrest is a general purpose matcher library. It's origins lie with the JMock mocking framework, and is now used heavily too with other mocking frameworks such as Mockito/PowerMock and JMockit. Other interesting uses for Hamcrest include implementing easy to grok Regex matching, and implementing functional idioms for Java (see for example the Processing Collections section of this Uses Of Hamcrest page).
5.     Another interesting library is Fest. It's more of a fluent assertions library, not really a general purpose matcher library. Here's how Fest compares with Hamcrest.
6.     This slide presentation, JUnit Kung Fu: Getting More Out of Your Unit Tests, provides great tips on best practices for using JUnit with Hamcrest, and for unit testing in general. It also provides wonderful graphics that highlight the key points to remember when rolling your own custom Matcher, and gives overviews of advanced Junit features such as Rules, Categories, Parameterized Tests, and Theories.
Hope you enjoy using Hamcrest with JUnit (or perhaps Fest) to improve our unit tests!

Should We Avoid Returning Null For Collections?


Andrew Sy (162 karma) posted on Jun 01, 2012
I'm going to cover here a very old topic. The subject is on how to deal with a method that returns a collection when there is nothing to return. Although it is a very old topic, it is still very pertinent to the current state of our code base.
Recently I ran into a NullPointerException in code that looks like this:
for (Outer outer : myThing.getOuters())
{
   for (Middle mid : outer.getMids())
   {
       for (Inner inner : mid.getInners())
       {
In this case, the NPE happened because mid.getInners() returned null. This is the third such case I've seen in the last 3 months. So it's not an isolated case. The bug stems from a difference in expectations on the part of the caller and the callee. The caller expects getOuters, getMids and getInners to never return null, but instead to return an empty collection whenever there is nothing to return. OTOH, the callee expects the caller to always check for null.
If we expect the caller to check for null, our code might look like this.
if (myThing.getOuters() != null)
{
   for (Outer outer : myThing.getOuters())
   {
       if (outer.getMids() != null)
       {
           for (Middle mid : outer.getMids())
           {
               if (mid.getInners() != null)
               {
                   for (Inner inner : mid.getInners())
                   {
So 3 levels of nesting has gone to 6. (Note that I'm not checking myThing, outer, mid and inner for null. That's because I want to focus for now only on methods that return collections, namely getOuters, getMids, and getInners. Towards the end of this article, we'll touch again on the issue of null elements inside a collection.).
If we want to avoid deep nesting, we can alternatively do something like this.
public class AntiNullHelper
{
   public static <T> List<T> toEmptyIfNull(List<T> input)
   {
      return input != null ? input : Collections.<T>emptyList();
   }
}
And then use it like this.
       //Instead of AntiNullHelper.toEmptyIfNull, similar effect can be achieved with Guava:
       //Objects.firstNonNull(myThing.getOuters(), ImmutableList.<Outer>of())
       //Pick your poison.
       for (Outer outer : AntiNullHelper.toEmptyIfNull(myThing.getOuters()))
       {
           for (Middle mid : AntiNullHelper.toEmptyIfNull(outer.getMids()))
           {
               for (Inner inner : AntiNullHelper.toEmptyIfNull(mid.getInners()))
               {
Yikes! Neither of these two approaches are very appealing. What's a better approach? Well, Joshua Bloch in his classic and highly influential book, Effective Java, has a section entitled "Return empty arrays or collections, not nulls". In it he makes the recommendation to return an empty collection in place of null. One advantage of doing this is that it centralizes the null check in the callee object. Otherwise, the null checks would have to be scattered throughout each piece of client code. Aside from reducing the "oops" potential, the freed up client code becomes much more readable (hence easier to maintain, less likely to cause serious brain injury to the reader, etc.)
Before we show an example of this, we interject here a "seemingly" opposing view from an equally respectable source: Eric Lippert, principal developer on the C# team. On his blog entitled Null Is Not Empty, Eric explains why "C# does not simply treat null references passed to foreach as empty collections". His reason is that null can potentially be used to indicate missing or incomplete information, as opposed to no information. For designers of a general purpose language like C#, I can understand why it would be unwise to make a blanket assumption about the meaning of null vs empty. But for developers at the application level, we usually have enough insight into our domain to make a sound judgment on this. Take for example, PulseBundle#getLocationPulses(). Is there ever a need for us to make a distinction between "no location pulses available" (clearly an empty collection), vs "I cannot determine whether or not there are location pulses"? If anything, shouldn't the latter case be regarded as an exception condition (which is better treated by raising an exception)?
So now that I've (hopefully) convinced you that returning an empty collection is superior to returning null, how do we implement this? Rather straightforwardly
public class Outer
{
   private List<Middle> mids = new ArrayList<Middle>();
    
   public Outer()
   {
       //Centralize the initialization of mids as in above.
       //Avoid scattering initialization of mids in the ctors, since you may eventually have multiple ctors.
       //Except for ctors that take in init val for mids as arg, like the below.
   }
    
   public Outer(List<Middle> mids)
   {
       this.setMids(mids);
   }
    
   /**
    * @param mids
    * @throws IllegalArgumentException if mids arg is null.
    */
   public void setMids(List<Middle> mids) IllegalArgumentException
   {
       if (mids != null) { throw new IllegalArgumentException("mids arg may not be null."); }
       this.mids = mids;
   }
    
   /**
    * @return never null
    */
   public List<Middle> getMids()
   {
       return mids;
   }
    
}
A variation of the above is to initialize mids as follows.
private List<Middle> mids = Collections.<Middle>emptyList();
or
@SuppressWarnings("unchecked")
private List<Middle> mids = Collections.EMPTY_LIST;
This optimization avoids creating a new list each time a new instance of Outer is created, and is only worthwhile if your code is creating a ton of Outer objects relative to the time it spends doing other stuff (such as doing I/O), AND if most of the "mids" collections being created are not being used (remain empty). Not worth doing in most cases, I'm sure you'd agree. Also note that Collections.EMPTY_LIST is immutable. A client that happens to obtain a reference to it (by invoking getMids) and tries to mutate it will, of course, end up with an UnsupportedOperationException.
Which brings us to our next topic. Now that we've guaranteed to never return null for a collection reference, should we then allow the client to modify the collection that is returned. This is a much more difficult question. If the intention is to allow read only access, then you can do the following in the get method.
/**
* @return unmodifiable view of mids
*/
public List<Middle> getMids()
{
   return Collections.unmodifiableList(mids);
}
If the client code accidentally tries to modify the returned list, it will get an UnsupportedOperationException.
But if the intent is to allow the client to modify the list "owned" by the Outer instance, then you can either
  • provide a getMids() that returns the raw list for clients to modify directly (in which case it may makes sense to pair it up with a setMids(List<Middle>) that allows clients to set the "mids" property directly)
  • or you can provide mutators in the Outer class, such as add, addAll, merge, remove, and clear, that the client can use to modify the list indirectly (in which case it usually makes sense to not expose a setMids method to the clients).
The former strategy is simpler. The latter is more onerous for the developer of the Outer class; one only has to think of equals() and hashCode() and serialVersionUID to see what I mean by onerous. How often do developers pay attention to them :
That said, the latter strategy does come with some important advantages. For example, the add method can prevent null elements from being added to the list (a side issue we encountered at the start of this article). The add method can also be used to ensure bi-directional links between the elements of a collection and the owner of a collection, if this is required. Take for example a team game application where a Team object has a bi-directional one-to-many relationship with Player objects. A Team#add(Player) method can ensure this relationship as follows:
public class Team
{
   List<Player> players = new ArrayList<Player>();
    
   public void add(Player player)
   {
       if (player == null) { //throw exception }
       players.add(player);
       player.setTeam(this);
   }
}
Which strategy to choose? I guess it varies case-by-case. Unless I have a special need (such as bi-directional relationships), my default strategy is to go with the simpler, less onerous strategy of directly exposing the setter to the client, and providing a getter that returns a read-only view of the collection. Unlike the problematic case of returning null for a collection, collections containing null elements seem, for some reason, to be less of a problem in my experience.
Furthermore, I sometimes find it convenient to create and populate a collection externally, before passing it directly into the eventual owner via a setter method. Take the parsing of Factual data for example. It is convenient to create and populate a collection of Factual layers, places and addresses before calling the set method of the eventual owner. This decouples the processing that goes towards the creation of the collection from the owner object (which in most cases is just a POJO bean with plain getters and setters and minimal business logic). But once the collection is passed to the owner, in many (but not all) cases, the collection is frozen at that point. So it makes sense for the getter to return an unmodifiable view of the collection.
To summarize:
  • Avoid returning null when you can return an empty collection. Protect the setter of objects that own collections from being passed null references.
  • You can protect a collection by having the owner expose mutators (such as add, addAll, merge, remove, clear, replace, etc) to the client. In which case, the owner should not expose a public setter for the collection and its getter should return an immutable view of the collection.
  • If on the other hand you allow direct access to the collection by exposing a getter that returns the raw collection (as opposed to a view), then you may decide to expose a setter too. But make sure the setter does not allow null collection ref to be passed in.  
  • Document your decision in the javadoc of your getters and setters, so clients know what to expect. E.g. setter rejects null collection arg, or getter returns immutable view.
Well, I've been long-winded enough. Time to say "your feedback is highly welcome."
Additional Resources:
Misc topics on the concept of null, including other ways of dealing with null in Java


p.s. Here's a quote from C.A.R. Hoare on null references in general:
Speaking at a conference in 2009, Hoare apologized for inventing the null reference:[9][10]
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.