Sunday, July 27, 2014

Guava Anyone?


Andrew Sy (162 karma) posted on Apr 27, 2012
Guava is a library from Google that, among other things, supports manipulating collections in a functional style reminiscent of languages like Ruby, Python, Groovy, and Scala. This leads to code that is more concise, more "fluent", and easier to understand, which in turn makes it easier to maintain and less error-prone.
It could, for example, help you initialize a Set in your unit test like this:
Set<Integer> mySet = ImmutableSet.of(1, 2, 3, 4, 5, 6, 7);
Or filter a set of strings like this:
//filter for strings starting with "J"
final Set<String> strings = ...
final Collection<String> filteredStrings =
          Collections2.filter(strings, Predicates.containsPattern("^J"));
Or transform a collection like this:
//convert strings to upper case (UpperCaseFunction is a user-defined Function, not a class shipped with Guava)
final Set<String> strings = ...
final Collection<String> transformedStrings =  
          Collections2.transform(strings, new UpperCaseFunction<String, String>());
* The above examples were stolen from Code Munchies and Inspired by Actual Events.
Look Ma, no loops
The Guava library loops through collections for you, so you can avoid explicit looping constructs where they tend to clutter your code. (Don't go overboard, though. Read this caveat.)
Here's a demonstration of how Guava can help us, using a very common pattern from our code base. We use it wherever we have to cut a big batch into smaller chunks for processing (there are at least 4 or 5 places in our code where we use this pattern). Notice the heavy code duplication, not to mention how hard it is to read.
// We want to lookup the data in batches - the maximum size of the batch is important here, because there are limits on how large this can be.
for (Device d : devices)
{
   count++;
   batch.add(d.getDeviceId());
   if (count % maxInClauseCount == 0)
   {
       bundles = namedParameterJdbcTemplate.query(queries.get("SQL_QUERY_AUDIENCE_MULTIPLE_DEVICES"), Collections.singletonMap("mmids", batch), deviceAudienceValueExtractor);
       for (DeviceAudienceBundle bundle : bundles)
       {
           List<AudienceValue> audiencesForDevice = map.get(bundle.getDevice());
           if (audiencesForDevice == null)
           {
               audiencesForDevice = new ArrayList<AudienceValue>();
               map.put(bundle.getDevice(), audiencesForDevice);
           }
           audiencesForDevice.add(bundle.getAudience());
       }
       batch.clear();
   }
}
if (batch.size() > 0)
{
   bundles = namedParameterJdbcTemplate.query(queries.get("SQL_QUERY_AUDIENCE_MULTIPLE_DEVICES"), Collections.singletonMap("mmids", batch), deviceAudienceValueExtractor);
   for (DeviceAudienceBundle bundle : bundles)
   {
       List<AudienceValue> audiencesForDevice = map.get(bundle.getDevice());
       if (audiencesForDevice == null)
       {
           audiencesForDevice = new ArrayList<AudienceValue>();
           map.put(bundle.getDevice(), audiencesForDevice);
       }
       audiencesForDevice.add(bundle.getAudience());
   }
}
Compare that to the following code, which leverages Guava (batchSize plays the role of maxInClauseCount above). So much more intuitive IMHO, and no code duplication! It is also less error-prone, because there is no need for bookkeeping code such as keeping track of "count" and calling batch.add() and batch.clear().
Iterable<List<Device>> deviceBatches = Iterables.partition(devices, batchSize);
for (List<Device> deviceBatch : deviceBatches)
{
    // ... process one batch ...
}
* see javadoc for Iterables#partition
Performance-wise, Google Collections has some pretty good benchmarks. For the most part, the library avoids copying data structures and instead creates "views" of the backing data structure, which is one way it keeps performance overhead low.
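To make the "view" idea concrete, here is a minimal sketch of the difference between a view and a copy. It deliberately uses only the JDK (Collections.unmodifiableList as a stand-in for a view), so it is an illustration of the concept rather than of Guava's own implementation. The class name ViewVersusCopy and its helper methods are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: contrasts a live view with a copy, using only the JDK.
final class ViewVersusCopy {
    private ViewVersusCopy() {}

    // A view wraps the backing list; no elements are copied.
    static List<String> viewOf(List<String> backing) {
        return Collections.unmodifiableList(backing);
    }

    // A copy duplicates every element up front.
    static List<String> copyOf(List<String> backing) {
        return new ArrayList<>(backing);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("Java", "Jython"));
        List<String> view = viewOf(names);
        List<String> copy = copyOf(names);

        names.add("Scala");

        System.out.println(view.size()); // 3: the view sees the new element
        System.out.println(copy.size()); // 2: the copy is a snapshot
    }
}
```

The flip side of a view is that later changes to the backing collection show through it, so take an explicit copy when you need a stable snapshot.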
Actually, we could easily have rolled something like the Guava Iterables#partition method on our own (wonder why we didn't do that earlier :). But then again, why do that when Guava already provides a cohesive and coherent library with predicates, functions, filters, and other facilities? Not to mention other gems: new data structures such as tables, multimaps and bidirectional maps, special hashes such as Bloom filters, and so on.
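For the curious, here is roughly what rolling our own might look like. This is a minimal sketch for lists only, not Guava's implementation: the class name BatchPartitioner and its partition method are made up for this example, and each batch is a subList view of the source rather than a copy.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: a hand-rolled, list-only take on the partition idea.
final class BatchPartitioner {
    private BatchPartitioner() {}

    // Splits source into consecutive batches of at most batchSize elements.
    // Each batch is a subList view of source, so no elements are copied.
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < source.size(); start += batchSize) {
            int end = Math.min(start + batchSize, source.size());
            batches.add(source.subList(start, end));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> deviceIds = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        for (List<Integer> batch : partition(deviceIds, 3)) {
            System.out.println(batch); // [1, 2, 3] then [4, 5, 6] then [7]
        }
    }
}
```

The last, shorter batch falls out of the Math.min call, which is exactly the leftover case that the original loop handled with a duplicated block after the loop.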
Other articles that demonstrate the power of guava are:
- Collection of great Guava articles http://www.tfnico.com/presentations/google-guava
- Nice example of using Multimap to do "group by" http://i-proving.ca/space/Ken+Stevens/blog/2011-02-17_1?showComments=true
Go ahead and give Guava a try. I won't be surprised if you start falling in love with it.
