lixo.org

When Deployments Disappear

I’m building a few micro-services in Ruby (Sinatra) and JavaScript (Node.js) to scrape and reformat a few internal data sources at ThoughtWorks. On top of that, an AngularJS application aggregates that data and presents interesting visualizations. It’s a very nice way to separate where the data comes from and what’s done to it, and Angular makes it fairly straightforward to keep the UI tidy and easy to change.

At one point, I needed two kinds of views on the ThoughtWorks people directory. One was a simple search-as-you-type box, and the other a more complete view with a ton of contact details for each of my colleagues. Eventually, there’d be a third view, with only the basic details, to be presented as a pop-up balloon when you hover over somebody’s name or picture.

It made sense to return two (or maybe three) different kinds of JSON from the services, so the search-as-you-type box could be as fast and use as little bandwidth as possible. This is what I ended up with:

{ "name": "Carlos Villela", "id": "cvillela", "aliases": ["cv"] }

Simple, and enough for that search-as-you-type box. I built a service that ferried the queries over to our LDAP servers (thanks to ruby-ldap) and built the JSON response. Performance wasn’t so great, but I could live with it for a while. Time to deploy it!
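This isn't the actual service code, but a sketch of its JSON-building half; the LDAP attribute names (`cn`, `uid`) and the `aliases` field are assumptions about the directory schema, and in the real app this would sit inside a Sinatra route handler:

```ruby
require 'json'

# Turn an LDAP-style entry into the search-as-you-type JSON payload.
# Attribute names here are assumptions; adjust to the actual schema.
def entry_to_json(entry)
  {
    'name'    => entry['cn'],
    'id'      => entry['uid'],
    'aliases' => entry.fetch('aliases', [])
  }.to_json
end

entry_to_json('cn' => 'Carlos Villela', 'uid' => 'cvillela', 'aliases' => ['cv'])
# => {"name":"Carlos Villela","id":"cvillela","aliases":["cv"]}
```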

Well, not so fast – and if you were wondering, here’s where the yak shaving starts. Deploy what? Where? Who would keep it all running smoothly? How would upgrades, security patches and updates be handled?

I could ask for a virtual machine from our operations guys, slap a Linux distribution in there and get started on some Puppet, Chef or Ansible scripts. But I didn’t want to have to maintain that stuff as well as my code. I definitely see the value in automating the setup and configuration of servers, but I was feeling lazier than usual: I wanted something that went “oh, I see you have a Sinatra application and a Gemfile in this git repository. I have a machine that has everything you need installed; let me run the app for you!”, kind of like what Heroku does when you push to an application repository for the first time.

I couldn’t use Heroku, unfortunately (most of those data sources are only accessible from the ThoughtWorks network), but Dokku did the trick quite well. It’s based on Docker, which does most of the heavy lifting, and a bit of bash glue between gitreceive and the Heroku buildpacks. It allows for exactly the same “run git push heroku master and everything else is taken care of” kind of workflow I was looking for, and it’s a breeze to install.

In about a day or so I had a CoreOS machine running Dokku and serving up two different applications: www, with the AngularJS application, styles and templates, and addressbook, with the Sinatra code for LDAP queries. Neat!

Later on, I wanted to build the more complete JSON for the detail view, which I knew I’d need to get from a different data source (LDAP only carries the basics). It would consume the LDAP service too, but then augment it with more contact information. A third application, called contactdetails, was pushed and Dokku took care of deploying it. As I was iterating over the format of the JSON responses, I noticed the performance issues of the addressbook application were getting in the way of testing.

Here’s where everything clicked: I could build a mock addressbook and deploy that without touching the original application, in whatever technology stack I wanted, and run it alongside everything else, by simply changing where the contactdetails application pointed to!

$ git remote add mock git@server:addressbook-mock
$ git checkout -b mock
$ rm -rf *
$ curl http://addressbook.server/index.json -o index.json
$ curl http://addressbook.server/cvillela.json -o cvillela.json
$ git add -A
$ git commit -am 'creating mock service (only index and cvillela supported)'
$ git push mock mock

I now had addressbook-mock to play with, with blazing fast responses, and I could tweak individual JSON responses if I felt like it. I could have as many versions of the application running as I wanted: all I needed to do was to find a suitable subdomain for them.
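Repointing contactdetails at one deployment or another doesn’t need a code change if the upstream URL comes from the environment. A minimal sketch of that idea — `ADDRESSBOOK_URL` is an assumed variable name, not something from the original setup:

```ruby
# Where the contactdetails app looks for its upstream addressbook service.
# ADDRESSBOOK_URL is an assumed variable name; the fallback matches the
# "real" deployment's hostname.
def addressbook_url
  ENV.fetch('ADDRESSBOOK_URL', 'http://addressbook.server')
end

# Repointing at the mock then becomes a config change rather than a deploy,
# e.g. with recent Dokku versions:
#   dokku config:set contactdetails ADDRESSBOOK_URL=http://addressbook-mock.server
```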

After a while, I had a handful of different deployments of the addressbook repository. I needed one for blazingly fast and stable responses to test the contactdetails app. I built another to test reliability, and another to ensure timeouts were working well. I don’t even remember what addressbook-broken-json-array was for, but it lived there for a brief period of time.

Eventually, a way of navigating breaking changes to integration points became obvious: if an application relied on an older version of the wire protocol, I could point it at the last version before the breakage (deployed as addressbook-simple-aliases or addressbook-v3, for example), leading to a fairly frictionless upgrade cadence.

In my case, the source data is in a stable format and being held in an external system. That deployment model would break down slightly if I had to deal with applications that owned and kept their data around, as database migrations would easily get in the way of running multiple versions of an application in parallel safely. In a development environment, it’d be enough for the database to sit inside the container (which would effectively isolate each deployment’s database, maybe a nice feature).

Another thing I noticed is that, once deploying multiple versions of an application became as easy as firing them up, I started seeing less value in tests that stub out over-the-network interactions, and instead made the endpoints behave differently depending on what I wanted to test. That made some of my tests run slower, but I got more confidence that the integration was working well without breaking a sweat.
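In that spirit, here’s a self-contained sketch of what such a test looks like. Since the real deployments lived on the ThoughtWorks network, the “deployed” endpoint here is a throwaway local server standing in for something like addressbook-mock; the names and payload are illustrative:

```ruby
require 'socket'
require 'net/http'
require 'json'

# A throwaway HTTP endpoint serving a canned addressbook response, stood up
# just for this test run -- the same style of check I'd point at a deployed
# mock like addressbook-mock.
body = '{"name":"Carlos Villela","id":"cvillela","aliases":["cv"]}'
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]

serving = Thread.new do
  client = server.accept
  nil while (line = client.gets) && line != "\r\n" # discard request + headers
  client.write "HTTP/1.1 200 OK\r\n" \
               "Content-Type: application/json\r\n" \
               "Content-Length: #{body.bytesize}\r\n\r\n#{body}"
  client.close
end

# The test itself: a real request over the wire, then assertions on the JSON.
person = JSON.parse(Net::HTTP.get(URI("http://127.0.0.1:#{port}/cvillela.json")))
raise 'unexpected name' unless person['name'] == 'Carlos Villela'
serving.join
server.close
```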