Retry

There is always a second chance

Say what?

With HTTP, if you don't get a response, it's impossible to tell whether anything happened on the server. It might have processed the request and then failed to inform you about it, or the request might never have reached its destination. There is no way of telling which.

Reliable messaging protocols typically include a retry mechanism, to guarantee the message has been processed at least once. (And then typically also include mechanisms to guarantee that the message was processed at most once and to preserve message ordering. That's out of scope for this post though.)

Ideally, you design your endpoints in such a way that they're able to handle the same message multiple times. In that case, if something fails, and you don't know in which state you ended up, you can always try again.
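
To make that a bit more concrete, here is a minimal sketch of an endpoint that tolerates duplicates. It is not from our project: the Deposit message, its messageId field and the IdempotentAccount class are all made up for illustration, and it assumes the sender attaches a unique id to every message. A real endpoint would keep the seen ids in durable storage rather than in memory.

```scala
import scala.collection.mutable

// Hypothetical message type; assumes the sender puts a unique id on every message.
final case class Deposit(messageId: String, amount: BigDecimal)

// In-memory sketch only: the set of processed ids is lost on restart.
class IdempotentAccount {
  private val processed = mutable.Set.empty[String]
  private var total = BigDecimal(0)

  // Applies the deposit only the first time a given messageId is seen.
  // Set.add returns false if the id was already present, so a retried
  // message leaves the balance untouched.
  def handle(msg: Deposit): BigDecimal = synchronized {
    if (processed.add(msg.messageId)) total += msg.amount
    total
  }
}
```

Sending the same Deposit twice changes the balance once, so a client that never saw the first response can simply send it again.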

In our current project, we have a lot of third-party APIs failing quite often. So we have learned by now that we need to be aware of that, and that in some cases retrying is better than throwing the failure back in the face of our users.

However, we call a lot of things all over the place. If we had to add retry code everywhere we set up a call to another service, it would muddy the waters. So we ended up building retry capabilities into our client.

A word of caution

Before you read on, you need to understand that what we did is not always the way to go. You can only safely retry an HTTP request if the receiving endpoint has no side effects, or if it is idempotent. If you cannot guarantee that, then it would not be wise to do what I'm going to show you below. (Like, if the request you're sending increases the money in a bank account by $20, you don't want to call that multiple times - unless you are the owner of that bank account.)
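
If you cannot vouch for every endpoint, one option is to retry only requests whose method the HTTP specification defines as idempotent. The sketch below is an assumption about how you could do that, not something we built: RetrySafety and isIdempotent are hypothetical names, and the check only tells you what the method promises, not whether the server on the other side actually honours it.

```scala
object RetrySafety {
  // Methods that the HTTP specification defines as idempotent.
  // POST and PATCH are deliberately absent.
  private val idempotentMethods =
    Set("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE")

  // HTTP method names are case-sensitive, so this is an exact match.
  def isIdempotent(method: String): Boolean =
    idempotentMethods.contains(method)
}
```

A retry policy could consult RetrySafety.isIdempotent before trying again and pass any other request straight through.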

RetryPolicy

We are using Scala's Dispatch library for making HTTP requests. It's the old version, so this post will be dated soon ;-), which is why I figured I'd put it out here now, before it's irrelevant.

trait RetryPolicy extends Http with Logging {

  val maxAttempts = 3
  val retryCodes = List(
    500, // Internal server error; might succeed the next time.
    502, // Bad gateway, invalid response from upstream server, might succeed next time.
    503 // Service unavailable. It might be available at some later stage.
  )

  /**
   * Checks whether the response code is one of the retryCodes, and if it
   * is, executes the request again, up to maxAttempts times in total. No
   * exponential back-off for blocking calls.
   */
  override def execute[T](host: HttpHost,
                          credsopt: Option[Credentials],
                          req: HttpRequestBase,
                          block: (HttpResponse) => T,
                          listener: dispatch.ExceptionListener) = {

    // Not tail recursive, but not necessarily a problem because 
    // of the number of attempts. Making this a huge number
    // would be a bad idea anyway, with a blocking call.

    def attempt(attemptsLeft: Int)(resp: HttpResponse): T = {
      if (retryCodes.contains(resp.getStatusLine.getStatusCode)) {
        if (attemptsLeft > 0) {
          warn("Failed to call %s after %s (of max %s) attempt(s); retrying".format(
            req.getRequestLine, maxAttempts - attemptsLeft, maxAttempts
          ))
          consumeContent(Option(resp.getEntity))
          super.execute[T](
            host = host,
            credsopt = credsopt,
            req = req,
            listener = listener,
            block = attempt(attemptsLeft - 1)
          )
        } else {
          //no more attempts left to try
          //just hand the last response to the block
          warn("Giving up calling %s after %s (of max %s) attempt(s)".format(
            req.getRequestLine, maxAttempts - attemptsLeft, maxAttempts
          ))
          block(resp)
        }
      } else {
        //no problem at all
        block(resp)
      }
    }
    super.execute(
      host = host,
      credsopt = credsopt,
      req = req,
      listener = listener,
      block = attempt(maxAttempts - 1)
    )
  }

}
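
If you want to see the retry logic without the Dispatch plumbing around it, here is a stripped-down sketch. It is an illustration only: Retry and withRetry are hypothetical names, and the request is reduced to a function that returns a status code. Unlike the trait above, this version is tail recursive, because it doesn't have to thread the retry through a response callback.

```scala
import scala.annotation.tailrec

object Retry {
  // Same status codes as the RetryPolicy trait.
  val retryCodes = Set(500, 502, 503)

  // Runs `request` until it returns a status that is not in retryCodes,
  // or until it has been run maxAttempts times in total.
  // Returns the last status code seen. No back-off between attempts.
  @tailrec
  def withRetry(maxAttempts: Int)(request: () => Int): Int = {
    val status = request()
    if (retryCodes.contains(status) && maxAttempts > 1)
      withRetry(maxAttempts - 1)(request)
    else
      status
  }
}
```

With maxAttempts set to 3, a request that keeps failing is executed three times in total: the original call plus two retries.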

How do I use it?

Using it is pretty easy. In fact, you don't have to worry about its existence at all. You just mix it in with another Http implementation, and Bob's your uncle.

val http = new Http with RetryPolicy
http(url(…).as_str) // Makes at most 3 attempts (so 2 retries)

Limitations

In our case, this global policy mixed in with an Http implementation works because we know our requests are idempotent. That's one of the reasons it works so well.

It also works well because in all of our cases we actually need blocking requests. We cannot afford to have the request handled asynchronously. If you can afford to send your requests asynchronously, then this is not the class you need. (And speaking of Dispatch, you wouldn't even be able to use it, since it extends dispatch.Http, which is blocking.)