Sunday, July 27, 2014

Should We Avoid Returning Null For Collections?


Andrew Sy (162 karma) posted on Jun 01, 2012
I'm going to cover here a very old topic. The subject is on how to deal with a method that returns a collection when there is nothing to return. Although it is a very old topic, it is still very pertinent to the current state of our code base.
Recently I ran into a NullPointerException in code that looks like this:
for (Outer outer : myThing.getOuters())
{
   for (Middle mid : outer.getMids())
   {
       for (Inner inner : mid.getInners())
       {
In this case, the NPE happened because mid.getInners() returned null. This is the third such case I've seen in the last 3 months. So it's not an isolated case. The bug stems from a difference in expectations on the part of the caller and the callee. The caller expects getOuters, getMids and getInners to never return null, but instead to return an empty collection whenever there is nothing to return. OTOH, the callee expects the caller to always check for null.
If we expect the caller to check for null, our code might look like this.
if (myThing.getOuters() != null)
{
   for (Outer outer : myThing.getOuters())
   {
       if (outer.getMids() != null)
       {
           for (Middle mid : outer.getMids())
           {
               if (mid.getInners() != null)
               {
                   for (Inner inner : mid.getInners())
                   {
So 3 levels of nesting has gone to 6. (Note that I'm not checking myThing, outer, mid and inner for null. That's because I want to focus for now only on methods that return collections, namely getOuters, getMids, and getInners. Towards the end of this article, we'll touch again on the issue of null elements inside a collection.).
If we want to avoid deep nesting, we can alternatively do something like this.
public class AntiNullHelper
{
   public static <T> List<T> toEmptyIfNull(List<T> input)
   {
      return input != null ? input : Collections.<T>emptyList();
   }
}
And then use it like this.
       //Instead of AntiNullHelper.toEmptyIfNull, similar effect can be achieved with Guava:
       //Objects.firstNonNull(myThing.getOuters(), ImmutableList.<Outer>of())
       //Pick your poison.
       for (Outer outer : AntiNullHelper.toEmptyIfNull(myThing.getOuters()))
       {
           for (Middle mid : AntiNullHelper.toEmptyIfNull(outer.getMids()))
           {
               for (Inner inner : AntiNullHelper.toEmptyIfNull(mid.getInners()))
               {
Yikes! Neither of these two approaches are very appealing. What's a better approach? Well, Joshua Bloch in his classic and highly influential book, Effective Java, has a section entitled "Return empty arrays or collections, not nulls". In it he makes the recommendation to return an empty collection in place of null. One advantage of doing this is that it centralizes the null check in the callee object. Otherwise, the null checks would have to be scattered throughout each piece of client code. Aside from reducing the "oops" potential, the freed up client code becomes much more readable (hence easier to maintain, less likely to cause serious brain injury to the reader, etc.)
Before we show an example of this, we interject here a "seemingly" opposing view from an equally respectable source: Eric Lippert, principal developer on the C# team. On his blog entitled Null Is Not Empty, Eric explains why "C# does not simply treat null references passed to foreach as empty collections". His reason is that null can potentially be used to indicate missing or incomplete information, as opposed to no information. For designers of a general purpose language like C#, I can understand why it would be unwise to make a blanket assumption about the meaning of null vs empty. But for developers at the application level, we usually have enough insight into our domain to make a sound judgment on this. Take for example, PulseBundle#getLocationPulses(). Is there ever a need for us to make a distinction between "no location pulses available" (clearly an empty collection), vs "I cannot determine whether or not there are location pulses"? If anything, shouldn't the latter case be regarded as an exception condition (which is better treated by raising an exception)?
So now that I've (hopefully) convinced you that returning an empty collection is superior to returning null, how do we implement this? Rather straightforwardly
public class Outer
{
   private List<Middle> mids = new ArrayList<Middle>();
    
   public Outer()
   {
       //Centralize the initialization of mids as in above.
       //Avoid scattering initialization of mids in the ctors, since you may eventually have multiple ctors.
       //Except for ctors that take in init val for mids as arg, like the below.
   }
    
   public Outer(List<Middle> mids)
   {
       this.setMids(mids);
   }
    
   /**
    * @param mids
    * @throws IllegalArgumentException if mids arg is null.
    */
   public void setMids(List<Middle> mids) IllegalArgumentException
   {
       if (mids != null) { throw new IllegalArgumentException("mids arg may not be null."); }
       this.mids = mids;
   }
    
   /**
    * @return never null
    */
   public List<Middle> getMids()
   {
       return mids;
   }
    
}
A variation of the above is to initialize mids as follows.
private List<Middle> mids = Collections.<Middle>emptyList();
or
@SuppressWarnings("unchecked")
private List<Middle> mids = Collections.EMPTY_LIST;
This optimization avoids creating a new list each time a new instance of Outer is created, and is only worthwhile if your code is creating a ton of Outer objects relative to the time it spends doing other stuff (such as doing I/O), AND if most of the "mids" collections being created are not being used (remain empty). Not worth doing in most cases, I'm sure you'd agree. Also note that Collections.EMPTY_LIST is immutable. A client that happens to obtain a reference to it (by invoking getMids) and tries to mutate it will, of course, end up with an UnsupportedOperationException.
Which brings us to our next topic. Now that we've guaranteed to never return null for a collection reference, should we then allow the client to modify the collection that is returned. This is a much more difficult question. If the intention is to allow read only access, then you can do the following in the get method.
/**
* @return unmodifiable view of mids
*/
public List<Middle> getMids()
{
   return Collections.unmodifiableList(mids);
}
If the client code accidentally tries to modify the returned list, it will get an UnsupportedOperationException.
But if the intent is to allow the client to modify the list "owned" by the Outer instance, then you can either
  • provide a getMids() that returns the raw list for clients to modify directly (in which case it may makes sense to pair it up with a setMids(List<Middle>) that allows clients to set the "mids" property directly)
  • or you can provide mutators in the Outer class, such as add, addAll, merge, remove, and clear, that the client can use to modify the list indirectly (in which case it usually makes sense to not expose a setMids method to the clients).
The former strategy is simpler. The latter is more onerous for the developer of the Outer class; one only has to think of equals() and hashCode() and serialVersionUID to see what I mean by onerous. How often do developers pay attention to them :
That said, the latter strategy does come with some important advantages. For example, the add method can prevent null elements from being added to the list (a side issue we encountered at the start of this article). The add method can also be used to ensure bi-directional links between the elements of a collection and the owner of a collection, if this is required. Take for example a team game application where a Team object has a bi-directional one-to-many relationship with Player objects. A Team#add(Player) method can ensure this relationship as follows:
public class Team
{
   List<Player> players = new ArrayList<Player>();
    
   public void add(Player player)
   {
       if (player == null) { //throw exception }
       players.add(player);
       player.setTeam(this);
   }
}
Which strategy to choose? I guess it varies case-by-case. Unless I have a special need (such as bi-directional relationships), my default strategy is to go with the simpler, less onerous strategy of directly exposing the setter to the client, and providing a getter that returns a read-only view of the collection. Unlike the problematic case of returning null for a collection, collections containing null elements seem, for some reason, to be less of a problem in my experience.
Furthermore, I sometimes find it convenient to create and populate a collection externally, before passing it directly into the eventual owner via a setter method. Take the parsing of Factual data for example. It is convenient to create and populate a collection of Factual layers, places and addresses before calling the set method of the eventual owner. This decouples the processing that goes towards the creation of the collection from the owner object (which in most cases is just a POJO bean with plain getters and setters and minimal business logic). But once the collection is passed to the owner, in many (but not all) cases, the collection is frozen at that point. So it makes sense for the getter to return an unmodifiable view of the collection.
To summarize:
  • Avoid returning null when you can return an empty collection. Protect the setter of objects that own collections from being passed null references.
  • You can protect a collection by having the owner expose mutators (such as add, addAll, merge, remove, clear, replace, etc) to the client. In which case, the owner should not expose a public setter for the collection and its getter should return an immutable view of the collection.
  • If on the other hand you allow direct access to the collection by exposing a getter that returns the raw collection (as opposed to a view), then you may decide to expose a setter too. But make sure the setter does not allow null collection ref to be passed in.  
  • Document your decision in the javadoc of your getters and setters, so clients know what to expect. E.g. setter rejects null collection arg, or getter returns immutable view.
Well, I've been long-winded enough. Time to say "your feedback is highly welcome."
Additional Resources:
Misc topics on the concept of null, including other ways of dealing with null in Java


p.s. Here's a quote from C.A.R. Hoare on null references in general:
Speaking at a conference in 2009, Hoare apologized for inventing the null reference:[9][10]
I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

No comments:

Post a Comment