Scripting ElasticSearch

On scripting ElasticSearch using CoffeeScript, elasticsearch.js and Node

Quickly throwing some thoughts together on using ElasticSearch with CoffeeScript. First of all, the good stuff: ElasticSearch has a great REST API. That means you can use any HTTP client to get stuff done. Nice!

But, really: any HTTP client? I mean, all the books have examples that use curl. Is that really the best we can get?

If HTTP is the uniform interface, then curl seems to be the uniform client. And of course curl is fine, but it's not:

  • the most compact notation ever, nor
  • the most fluent notation ever, nor
  • the most expressive notation ever, nor
  • supporting JSON as a first class literal, nor
  • offering any control flow (first this, then that, but only if, etc.)

… and since it doesn't support any control flow, variables and all of that, you need a scripting language around it, which seems to default to Bash -- not really the greatest scripting language ever.

RESULT=$(curl -s -X PUT http://localhost:9200/foo)
# Hmmm, so how do I check if the outcome is {"acknowledged":true}?
if [ "$RESULT" = '{"acknowledged":true}' ]; then
    echo "index created"
fi

Node and JavaScript to the rescue?

It gets better, even though it's not great yet:

  • Not all that compact
  • Not all that fluent
  • Not all that expressive
  • But it does support JSON as a first class literal
  • And it does support control flow

Using http

One of the things you could do is use the http module. It certainly is not any more compact than what I showed before using curl. And this is just scratching the surface. For scripting, this doesn't make any sense.

var http = require('http');
var req = http.request({
    hostname: 'localhost',
    port: 9200,
    path: '/foo',
    method: 'PUT'
}, function(res) {
    res.on('data', function(chunk) {
       …
    });
});
req.end(); // nothing goes over the wire until you call end()

Using request

You could also use the request module, which is definitely a lot easier to use:

var request = require('request');
request({
  url: 'http://localhost:9200/foo',
  method: 'PUT',
  json: true
}, function(error, response, body) {
  if (!error && response.statusCode == 200) {
    var json = body;
    …
  }
});

Now we're getting somewhere, but the signal-to-noise ratio is still out of whack, if you ask me.

Pyramid of doom

Everything in node is based on callbacks, and for good reason. However, it doesn't make your code any easier on the eyes. If this is what you want:

  • get the water boiling
  • add rice
  • cook it until it's done
  • eat it

Then this is the way you need to express it in node:

range.on("preheat", function() {
    pot.on("boil", function() {
        rice.on("cooked", function() {
            dinner.serve(rice);
        });
    });
});

This is the problem affectionately referred to as 'the Pyramid of Doom' (aka Christmas tree syndrome). If you script ElasticSearch with the request module, you run into exactly the same problem: you can only start adding things to an index once the index is there, and you can only query the index once your data has been indexed. As a consequence, your code would look like this:

request(…, function(…) {
   request(…, function(…) {
      request(…, function(…) {
         …
      });
   });
});

… which clearly doesn't make your code any more readable. The good news is that it can be solved by switching to a promise-based API. With a promise-based API, the same code would look like this:

request(…)
   .then(function(…) { return request(…) })
   .then(function(…) { return request(…) })
   .then(function(…) { return request(…) })
   …
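
To make that skeleton concrete without a running ElasticSearch, here is a minimal sketch using plain promises; the three functions are hypothetical stand-ins for the real requests:

```javascript
// Hypothetical stand-ins for three dependent requests. Each returns a
// promise, so the next step only runs after the previous one resolves.
function createIndex()   { return Promise.resolve({ acknowledged: true }); }
function indexDocument() { return Promise.resolve({ created: true }); }
function search()        { return Promise.resolve({ hits: { total: 1 } }); }

createIndex()
  .then(function(res) {
    // only index documents once the index exists
    return indexDocument();
  })
  .then(function(res) {
    // only search once the data has been indexed
    return search();
  })
  .then(function(res) {
    console.log(res.hits.total); // prints 1
  });
```

The crucial detail is the return in each callback: returning the next request's promise is what keeps the chain sequential instead of letting the steps race each other.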

Elasticsearch.js

The good news is that there's an API that significantly reduces the boilerplate of making requests to ElasticSearch and supports a promise-based programming model: the JavaScript client library from ElasticSearch itself, elasticsearch.js.

This is what creating an index would look like using elasticsearch.js:

var es = require('elasticsearch');
var client = new es.Client({
  host: 'localhost:9200'
});

client.indices.create({
  index: 'foo',
  …
}).then(function(data) {
  if (data && data.acknowledged) {
    …
  }
}).then(function() {
  process.exit();
});

Now this is already a lot easier on the eyes. But it still isn't perfect:

  • Lots of braces
  • Lots of keywords hiding the intent

CoffeeScript to the rescue

If you've already gotten this far:

  • You've installed elasticsearch.js with npm
  • You have node running
  • You have your package.json file pointing to index.js

Then it's really easy to move just one step further: point your package.json file to index.coffee instead, and rewrite the script like this:

es = require 'elasticsearch'

client = new es.Client
  host: 'localhost:9200'

client.indices.create
  index: 'foo'
  …

.then (data) ->
  if data?.acknowledged
    …

.then -> 
  process.exit()
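
A hedged note on running the CoffeeScript version: Node doesn't execute .coffee files directly, so you'd typically run the script through the coffee binary that ships with the CoffeeScript package. A minimal package.json sketch (the name and the wildcard version ranges here are assumptions, not prescriptions):

```json
{
  "name": "es-scripting",
  "main": "index.coffee",
  "scripts": {
    "start": "coffee index.coffee"
  },
  "dependencies": {
    "coffee-script": "*",
    "elasticsearch": "*"
  }
}
```

With something like that in place, npm start runs the script without a separate compile step.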

Below I show both the JavaScript and the CoffeeScript versions. If you're not familiar with CoffeeScript, the notation may seem a little bewildering. But I reckon that even to the untrained eye, the CoffeeScript is easier to grasp. At least I think it is. And that's why this is my preferred ElasticSearch scripting solution.

JavaScript:

var es = require('elasticsearch');
var client = new es.Client({
    host: 'localhost:9200'
});

client.indices.create({
   index: 'foo',
   …
}).then(function(data) {
  if (data && data.acknowledged) {
    …
  }
}).then(function() {
  process.exit();
});

CoffeeScript:

es = require 'elasticsearch'

client = new es.Client
  host: 'localhost:9200'

client.indices.create
  index: 'foo'
  …

.then (data) ->
  if data?.acknowledged
    …

.then ->
  process.exit()


I think this solution addresses all the problems I had with curl and Bash:

  • it's compact
  • it's pretty fluent
  • it's quite expressive
  • it supports JSON natively (and solves the problem of having braces all over)
  • it supports the control flow I need