Dispatch dissected

Tuesday August 7, 2012

The more difficult something is to achieve, the more people like it
-- Susan M. Weinschenk

On APIs

In my more cynical moods, I wonder if this explains why there is so much traction behind some Scala libraries. Don't get me wrong: I use them too, and eventually came to appreciate them, but many of them violate at least a couple of the rules from Joshua Bloch's API Design by Bumper Stickers talk:

APIs should be self-documenting: It should rarely require documentation to read code written to a good API. In fact, it should rarely require documentation to write it.
Obey the principle of least astonishment: Every method should do the least surprising thing it could, given its name. If a method doesn't do what users think it will, bugs will result.
Names matter: Strive for intelligibility, consistency, and symmetry. Every API is a little language, and people must learn to read and write it. If you get an API right, code will read like prose.

I remember Barbara Liskov saying that APIs should be optimized for readability rather than for compactness. Every Scala programmer should have that printed in capitals right next to his computer.

Anyway, if an API is not very self-documenting, is sometimes astonishing, and has names that appear to have been chosen at random, documentation does help. It helps you get into that elusive group of people who have mastered the API, and being part of that group will, according to the theory of cognitive dissonance, be motivation enough to plough through the documentation (like this article) and tell yourself that it was completely worth the effort.

Use a Java API?

Some people would argue that you might as well use a non-Scala API, but hold on: that violates another principle coined by Josh Bloch:

When in Rome, do as the Romans do: APIs must coexist peacefully with the platform, so do what is customary. It is almost always wrong to transliterate an API from one platform to another.

Dispatch dissected

Dispatch underpinnings

Dispatch is bolted on top of HttpClient, from Apache's HttpComponents project. That is important to remember, since everything under the hood deals with the raw primitives of that library. It also means that if you ever consider extending Dispatch yourself, you will quickly run into this library.

HttpExecutor, Request and Handler, the golden braid

From the outside, there are three important abstractions:

HttpExecutor

The HttpExecutor is the trait that defines the contract for objects responsible for sending requests over the network and passing the responses back to whatever handles them: the Handler, which I will discuss in a bit.

There are four methods you would have to implement in order to turn the trait into something useful: execute, executeWithCallback, consumeContent and shutdownClient. The trait itself just declares those operations, and expresses a couple of higher-level operations in terms of these primitives. In fact, you will almost never call execute, executeWithCallback or consumeContent directly.
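To make the split between primitives and derived operations concrete, here is a deliberately simplified, self-contained model. It is not Dispatch's code: Executor, Handler, Response, StatusCode and FakeExecutor are hypothetical names, HttpPackage and callbacks are left out, and a single execute primitive remains. It shows how apply, x and when can all be expressed in terms of that one method, with apply accepting the 200 to 204 range mentioned in the examples further on.

```scala
// Simplified, hypothetical model of HttpExecutor's structure; not Dispatch code.
final case class Response(status: Int, body: String)
final case class StatusCode(code: Int) extends RuntimeException(s"Unexpected status code $code")
final case class Handler[T](path: String, block: Response => T)

trait Executor {
  // The primitive: the only thing a concrete executor has to implement.
  def execute(path: String): Response

  // Run the handler only if chk accepts the status code; otherwise fail.
  final def when[T](chk: Int => Boolean)(hand: Handler[T]): T = {
    val res = execute(hand.path)
    if (chk(res.status)) hand.block(res) else throw StatusCode(res.status)
  }

  // Accept only status codes 200 to 204.
  final def apply[T](hand: Handler[T]): T = when(code => (200 to 204).contains(code))(hand)

  // Accept any status code.
  final def x[T](hand: Handler[T]): T = when(_ => true)(hand)
}

// Serves canned responses instead of touching the network.
object FakeExecutor extends Executor {
  def execute(path: String): Response = path match {
    case "/ok"    => Response(200, "hello")
    case "/error" => Response(500, "oops")
    case _        => Response(404, "not found")
  }
}

// A handler that returns the response body as a String.
def asStr(path: String): Handler[String] = Handler(path, _.body)

val ok = FakeExecutor(asStr("/ok"))          // handled: status 200
val forced = FakeExecutor.x(asStr("/error")) // handled anyway: status ignored
val onlyOk = scala.util.Try(FakeExecutor.when(_ == 200)(asStr("/missing")))
```

Because the final operations only ever go through execute, a subclass that changes execute changes the behaviour of all three, which is the property the retry and error-reporting traits at the end of this post rely on.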

The operations that you do call directly are these:

```scala
def apply[T](hand: Handler[T]): HttpPackage[T]
def x[T](hand: Handler[T]): HttpPackage[T]
def when[T](chk: Int => Boolean)(hand: Handler[T]): HttpPackage[T]
def apply[T](callback: Callback[T]): HttpPackage[T]
```

More on the mysterious HttpPackage[T] in a minute. For now, it's important to remember that, under normal circumstances, these four methods are the ones you actually use, and they all delegate to execute and executeWithCallback. Let's look at some examples. In all of these examples, assume that executor is an instance of a concrete subclass of HttpExecutor.

```scala
// Get the content at www.google.com as a String, but only if the
// status code is between 200 and 204.
executor(url("http://www.google.com") as_str)

// Identical, but a little more verbose, and not very idiomatic
executor.apply(url("http://www.google.com") as_str)

// The same, but ignore the status code, which means the response is
// also handled in case of a 500 error, etc.
executor.x(url("http://www.google.com") as_str)

// The same, but with specific conditions for when the response
// should be handled (in this case, only if the status code is 200)
executor.when(_ == 200)(url("http://www.google.com") as_str)
```

Handler

All of the methods listed above, except the Callback variant of apply, accept a single type of argument: a so-called Handler.

A Handler[T] is a case class combining the definition of the request with a function acting upon the response (returning a value of type T) and a listener defining what should be done in case of exceptions.

Since it's a case class, it's not meant to be extended through inheritance. However, there are a bunch of operators that allow you to derive a new Handler by specializing either the definition of the request, the function acting upon the response, or the exception listener.
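A minimal sketch of that idea, assuming a hypothetical Handler with the three parts described above (request, response function, exception listener). The method names withRequest, map and onError are made up for illustration and are not Dispatch operators; the point is that each one returns a new, modified copy of the case class instead of relying on subclassing.

```scala
// Hypothetical request/response types, kept tiny on purpose.
final case class Req(host: String, path: String = "/")
final case class Resp(status: Int, body: String)

// A hypothetical handler: request definition, response function, exception listener.
final case class Handler[T](
    request: Req,
    block: Resp => T,
    listener: PartialFunction[Throwable, Unit] = PartialFunction.empty
) {
  // Specialise the request; the response function and listener stay the same.
  def withRequest(f: Req => Req): Handler[T] = copy(request = f(request))

  // Specialise the response function by post-processing its result.
  def map[U](f: T => U): Handler[U] = Handler(request, block.andThen(f), listener)

  // Specialise the exception listener.
  def onError(l: PartialFunction[Throwable, Unit]): Handler[T] = copy(listener = l)
}

val asBody = Handler(Req("example.com"), (r: Resp) => r.body)
val lengthOfAbout = asBody.withRequest(_.copy(path = "/about")).map(_.length)
val quiet = asBody.onError { case _: java.io.IOException => () }

// Apply the response function to a canned response instead of doing real I/O.
val result = lengthOfAbout.block(Resp(200, "hello"))
```

Note that asBody itself is left untouched: every specialization produces a separate Handler value.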

Request

As I said, a Handler combines a number of things, including the definition of the request. In fact, normally the Handler is built by calling methods on the Request object, the object defining just the request details.

Under the hood, the Request object carries all the data Apache's HTTP client needs to create the request: the headers to be sent, the host and port number, the path, the query parameters, etc. On top of that, it supports the operators that turn this request definition into a full-blown Handler.
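The following sketch models a Request as an immutable value that carries host, port, path, headers and query parameters, and that can be turned into a handler. The names Request, Handler, addPath, addHeaders, addQuery, uri and asStr are hypothetical; Dispatch's real operators are the ones listed in its periodic table of operators.

```scala
import java.net.URLEncoder
import java.nio.charset.StandardCharsets

// Hypothetical handler: the request plus a function applied to the body.
final case class Handler[T](request: Request, block: String => T)

// Hypothetical immutable request definition; every operator returns a new copy.
final case class Request(
    host: String,
    port: Int = 80,
    path: String = "",
    headers: Map[String, String] = Map.empty,
    query: List[(String, String)] = Nil
) {
  def addPath(segment: String): Request = copy(path = s"$path/$segment")
  def addHeaders(hs: (String, String)*): Request = copy(headers = headers ++ hs)
  def addQuery(ps: (String, String)*): Request = copy(query = query ++ ps)

  // The target URI an HTTP client would need, with form-encoded query parameters.
  def uri: String = {
    def enc(s: String) = URLEncoder.encode(s, StandardCharsets.UTF_8.name)
    val qs =
      if (query.isEmpty) ""
      else query.map { case (k, v) => s"${enc(k)}=${enc(v)}" }.mkString("?", "&", "")
    s"http://$host:$port$path$qs"
  }

  // Turn the request definition into a handler that returns the body as a String.
  def asStr: Handler[String] = Handler(this, (body: String) => body)
}

val search = Request("example.com")
  .addPath("search")
  .addQuery("q" -> "scala dispatch")
  .addHeaders("Accept" -> "text/html")
val handler = search.asStr
```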

Back to the big picture

So, just to clarify the way you normally use Dispatch:

You create a Request object using one of the available factory methods, such as url(...).
You create a Handler from that Request object by calling one of its operators, for example as_str, which grabs the contents of the response as a String and returns it. (Check the periodic table of dispatch operators for all other operators.)
You pass this Handler to the apply, x or when operation of an appropriate HttpExecutor. One of these implementations is the dispatch.Http object, but there are others, and it's important to understand the differences between them. More on that later.

[Figure: the big picture]

Intermezzo: the mysterious HttpPackage[T]

So, what is that mysterious HttpPackage[T] that appears as the return type of almost all HttpExecutor operations? The answer is easy, but also a little disappointing: it's undefined. HttpPackage[T] is just an abstract type alias that subclasses of HttpExecutor still have to define. If that sounds weird, it may help to consider the different ways in which your client could deal with the requests it needs to send.

In some cases, a client might be able to send the request and then forget about the response.
In other cases, a client might be able to send out a request, and then continue doing a bunch of things, checking for the response at some later point in time.
In some cases, there might simply not be anything else left to do. The client should just wait for the results to be returned, then wake up and continue processing the results.

The HttpExecutor trait aims to cater for all of these cases. However, the ideal shape of the API is different in each of them.

Introducing the HttpPackage[T] type alias allows Dispatch to specialize the return type based on the specific subtype of HttpExecutor. The HttpExecutor implementation for blocking calls defines HttpPackage[T] to be just T. A thread-safe, asynchronous implementation of HttpExecutor defines HttpPackage[T] to be a Future[T]. That leaves the clients of that HttpExecutor the choice to continue working on other things or to block until the results arrive.
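Here is a small, self-contained sketch of the technique: an abstract type member that each concrete executor fixes in its own way. Executor, BlockingExecutor, FutureExecutor and fetch are hypothetical, and fetch returns a canned string instead of doing network I/O, so this illustrates the type-level trick rather than Dispatch's actual classes.

```scala
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

trait Executor {
  // Left abstract: each concrete executor decides what a "package" is.
  type HttpPackage[T]

  // Stand-in for the network call; a real executor would use an HTTP client.
  protected def fetch(path: String): String = s"body of $path"

  def apply[T](path: String)(block: String => T): HttpPackage[T]
}

// Blocking: the package is just the value itself.
object BlockingExecutor extends Executor {
  type HttpPackage[T] = T
  def apply[T](path: String)(block: String => T): T = block(fetch(path))
}

// Asynchronous: the package is a Future of the value.
object FutureExecutor extends Executor {
  type HttpPackage[T] = Future[T]
  private implicit val ec: ExecutionContext = ExecutionContext.global
  def apply[T](path: String)(block: String => T): Future[T] = Future(block(fetch(path)))
}

val now: String = BlockingExecutor("/a")(_.toUpperCase)
val later: Future[Int] = FutureExecutor("/b")(_.length)
val waited: Int = Await.result(later, 5.seconds)
```

Callers of BlockingExecutor get a String directly, whereas callers of FutureExecutor get a Future[Int] and decide for themselves when, or whether, to block with Await.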

HttpExecutor Class Hierarchy

For educational purposes, let's take a look at the genealogy of the HttpExecutor family:

[Figure: class diagram of the HttpExecutor hierarchy]

BlockingCallback: I mentioned the executeWithCallback method before, and counted it among the methods you normally would not call directly. In fact, it's questionable whether your code will ever hit this method in everyday use of HTTP. That method accepts a Callback implementation that will not only asynchronously fire the request, but also incrementally handle the results. (That is, as HTTP content comes in, it calls your Callback implementation to handle it: instead of waiting for the entire response to arrive, it processes chunks of the response as soon as they become available.)
BlockingHttp is the subtype of HttpExecutor from which almost all other traits inherit. It executes the request using a non-thread-safe HTTP client, and blocks until the result of the handler becomes available.
dispatch.Http is a concrete subclass of BlockingHttp. BlockingHttp itself is abstract, and its definition of HttpPackage[T] is still undefined. dispatch.Http makes it concrete by defining HttpPackage[T] to mean simply T. If that sounds a little too abstract for your taste, simply remember that you can create an instance of this class, and then have an executor that acts in a blocking way, without being thread-safe.
Safety is the trait that allows you to mix thread safety into subtypes of BlockingHttp.
dispatch.Http is also an object: it extends the class dispatch.Http and mixes in Safety. That means you have something you can use throughout your entire application without running into too much trouble, unless the number of concurrent threads exceeds the limit defined by Safety.
Future is a subtype of Safety that extends thread safety with asynchronicity. Effectively, this means that subclasses of BlockingHttp extended with Future have apply, when and x operations that return a Future instead of the result of the function defined by the Handler. In other words, calling Http(url(...) as_str) on the Http object returns a String. However, if you use an instance of dispatch.thread.Http instead, that same call immediately returns a Future[String].
dispatch.thread.Http is a concrete subclass of BlockingHttp that mixes in Future (and therefore Safety), offering the behaviour I just outlined.
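The incremental handling described under BlockingCallback can be illustrated with a sketch that reads a response body in fixed-size chunks and passes each chunk to a callback as soon as it has been read. ChunkedReader and its executeWithCallback method are hypothetical stand-ins, not Dispatch's Callback API, and an in-memory stream replaces the network.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.util.Arrays
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for incremental response handling; not Dispatch's Callback API.
object ChunkedReader {
  // Reads the body in chunks of at most chunkSize bytes and hands each chunk
  // to the callback as soon as it has been read, instead of buffering it all.
  def executeWithCallback(body: InputStream, chunkSize: Int)(callback: Array[Byte] => Unit): Unit = {
    val buffer = new Array[Byte](chunkSize)
    var n = body.read(buffer)
    while (n != -1) {
      if (n > 0) callback(Arrays.copyOf(buffer, n))
      n = body.read(buffer)
    }
  }
}

// An in-memory stream plays the role of the response arriving over the network.
val response = new ByteArrayInputStream("line one\nline two\n".getBytes(StandardCharsets.UTF_8))
val chunks = ListBuffer.empty[String]
ChunkedReader.executeWithCallback(response, chunkSize = 5) { bytes =>
  chunks += new String(bytes, StandardCharsets.UTF_8)
}
```

A real implementation would also have to deal with multi-byte characters split across chunk boundaries; the ASCII input here sidesteps that.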

So where does this lead us?

Hopefully, this will give you some clues how to extend Dispatch to mix in the behaviour you want. At Udini, we needed a couple of things in addition to what Dispatch normally provides:

A retry policy: sometimes the remote services we call fail. In those cases, we want to retry the call. We implemented that with a new trait called RetryPolicy that extends dispatch.Http, overriding the execute operation with something that has the retry behaviour. (You cannot override apply, x or when, since they are final. And even if they were not final, overriding them would have meant repeating code in all of these methods. By putting the behaviour into execute, it works for all of these operations.)
Failover: for some services, we know that replicas are available. (This mostly applies to services within our own EC2 environment, in which we don't have an internal load balancer offering failover capabilities.) In those cases, we need to implement failover using replica awareness inside the client; however, we want to do it in a DRY way. Adding another trait called ReplicaAwareness allowed us to do just that.

I have to say that in that particular case, rather than implementing execute in an alternative way, we decided to change the client and have replica awareness inside a DefaultHttpClient subclass returned by the make_client call on BlockingHttp subtypes. That doesn't mean it couldn't or shouldn't have been done differently; it just reflects our understanding of Dispatch at that time.

Detailed error logging: the default implementation of HttpExecutor throws StatusCode exceptions in case the status code is different from what your calls expect. Unfortunately, these StatusCode exceptions don't carry much detail about the original request, which means our logs missed important details. By implementing another trait, DetailedStatusCodeReporting, that overrides the definition of the execute operation, we were able to replace the StatusCode exception with a more detailed version of that exception, carrying information on the request that failed.
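The common pattern behind the retry and error-reporting traits is Scala's stackable trait: a trait that uses abstract override to wrap execute and delegate to super. The sketch below is self-contained and uses hypothetical names (Executor, FlakyExecutor, Request, Response, StatusCode, DetailedStatusCode); it mimics the Udini traits in spirit only and is not their actual code.

```scala
import scala.util.Try

final case class Request(method: String, uri: String)
final case class Response(status: Int, body: String)

// Hypothetical exceptions: a bare status code, and one that also carries the request.
class StatusCode(val code: Int) extends RuntimeException(s"Unexpected status code $code")
class DetailedStatusCode(status: Int, val request: Request) extends StatusCode(status) {
  override def getMessage: String = s"Unexpected status code $code for ${request.method} ${request.uri}"
}

trait Executor {
  def execute(req: Request): Response
}

// Retries the wrapped execute on a StatusCode until it succeeds or maxAttempts is reached.
trait RetryPolicy extends Executor {
  def maxAttempts: Int = 3
  abstract override def execute(req: Request): Response = {
    var attempt = 1
    var result: Option[Response] = None
    while (result.isEmpty) {
      try result = Some(super.execute(req))
      catch { case _: StatusCode if attempt < maxAttempts => attempt += 1 }
    }
    result.get
  }
}

// Replaces a bare StatusCode with one that knows which request failed.
trait DetailedStatusCodeReporting extends Executor {
  abstract override def execute(req: Request): Response =
    try super.execute(req)
    catch {
      case e: DetailedStatusCode => throw e
      case e: StatusCode         => throw new DetailedStatusCode(e.code, req)
    }
}

// Hypothetical base executor whose server only succeeds on the third call.
class FlakyExecutor extends Executor {
  var calls = 0
  def execute(req: Request): Response = {
    calls += 1
    if (calls < 3) throw new StatusCode(503) else Response(200, "ok")
  }
}

val request = Request("GET", "http://example.com/api")

val patient = new FlakyExecutor with DetailedStatusCodeReporting with RetryPolicy
val recovered = patient.execute(request) // succeeds on the third attempt

val impatient = new FlakyExecutor with RetryPolicy with DetailedStatusCodeReporting {
  override def maxAttempts = 2
}
val failure = Try(impatient.execute(request)).failed.get
```

The mixin order decides which wrapper runs first: in patient, RetryPolicy is outermost and retries through the reporting layer, while in impatient the reporting layer wraps the exhausted retries and attaches the request to the final exception.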

In summary, we could eventually implement all of the additional executor behaviour we needed by having a trait override the definition of HttpExecutor's execute operation.
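For failover, Udini put replica awareness into the HTTP client instead, while noting that it could have been done differently. As a hedged sketch of that alternative, the same abstract override pattern can try each known replica in turn. ReplicaFailover, FakeNetwork, the replicas map and the host names are all hypothetical, and only an IOException (such as a refused connection) triggers a failover.

```scala
import java.io.IOException
import java.net.ConnectException
import scala.util.Try

final case class Request(host: String, path: String)
final case class Response(status: Int, body: String)

trait Executor {
  def execute(req: Request): Response
}

// On an I/O failure, retry the same request against each known replica in turn.
trait ReplicaFailover extends Executor {
  // Hypothetical configuration: primary host -> replicas that can stand in for it.
  def replicas: Map[String, List[String]]

  abstract override def execute(req: Request): Response = {
    var remaining = req.host :: replicas.getOrElse(req.host, Nil)
    var result: Option[Response] = None
    var lastError: Option[Throwable] = None
    while (result.isEmpty && remaining.nonEmpty) {
      val host = remaining.head
      remaining = remaining.tail
      try result = Some(super.execute(req.copy(host = host)))
      catch { case e: IOException => lastError = Some(e) }
    }
    // Every host failed: rethrow the last error seen.
    result.getOrElse(throw lastError.get)
  }
}

// Hypothetical network in which only "db-2" is up.
class FakeNetwork extends Executor {
  var attempted = List.empty[String]
  def execute(req: Request): Response = {
    attempted = attempted :+ req.host
    if (req.host == "db-2") Response(200, s"served by ${req.host}")
    else throw new ConnectException(s"${req.host} is down")
  }
}

val client = new FakeNetwork with ReplicaFailover {
  def replicas = Map("db-1" -> List("db-2", "db-3"))
}
val response = client.execute(Request("db-1", "/status"))

val isolated = new FakeNetwork with ReplicaFailover {
  def replicas = Map.empty[String, List[String]]
}
val noReplica = Try(isolated.execute(Request("db-1", "/status")))
```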

Let me know if you found this explanation useful. Some of what I wrote here will end up in a new edition of the little Dispatch book I wrote. Let me know what you're missing, and I will stick that in as well.

(Expect a more detailed explanation of the retry policy, failover and detailed error logging traits in some future blog posts.)