Robert McWhirter – 15 December 2015

How we auto scale our digital tools and applications with AWS

We caught up with our infrastructure specialist, Steve Jack, about his latest work: making the infrastructure behind our flagship application, Twine, consistent and quick to scale. He talks about how he’s used AWS CloudFormation, Packer, Docker and Jenkins to overhaul it. Here’s a summary of what he had to say:

New infrastructure and build pipeline

Over the last month or so, we’ve been setting up the new infrastructure and build pipeline for Twine, with the main goal of allowing it to automatically scale based on load.

Previously, Twine ran on a fixed number of servers, so scaling based on load was quite difficult: you had to bring another server up and make sure it was in exactly the same state as the rest. A fixed size has an obvious problem. If a lot of people from one company log in to Twine and, for whatever reason, all try to put their expenses in at the same time, we don’t want that to affect all of our other clients.

Scaling based on demand stops that from being a problem.
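To make "scaling based on demand" concrete, here is a minimal, hypothetical sketch of the kind of rule an auto-scaling group applies. It works out how many servers the current load needs and clamps that number to the group's minimum and maximum size. It assumes average CPU is the load metric and that load spreads evenly across servers. The function name, the 50% target and the size limits are illustrative, and AWS's own scaling policies add further behaviour (such as cooldowns) on top.

```python
import math


def desired_capacity(current_instances: int, current_cpu: float, target_cpu: float,
                     min_size: int, max_size: int) -> int:
    """Hypothetical sketch: proportional (target-tracking style) scaling on average CPU."""
    if current_instances <= 0 or target_cpu <= 0:
        raise ValueError("need at least one instance and a positive target")
    # Total load stays roughly constant, so the servers needed scale with load / target.
    needed = math.ceil(current_instances * current_cpu / target_cpu)
    # Never go outside the group's configured bounds.
    return max(min_size, min(max_size, needed))


if __name__ == "__main__":
    # A spike: 4 servers at 90% CPU with a 50% target needs ceil(7.2) = 8 servers.
    print(desired_capacity(4, 90.0, 50.0, min_size=2, max_size=10))  # 8
    # A quiet period: ceil(0.8) = 1 server, clamped up to the minimum of 2.
    print(desired_capacity(4, 10.0, 50.0, min_size=2, max_size=10))  # 2
```

The clamp matters in both directions: the minimum keeps some headroom for a sudden spike, and the maximum caps cost if a runaway process pushes CPU up.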

Setting up separate AWS accounts

We chose to split the different environments into separate accounts (separation of concerns), rather than having one big account where anybody who got hold of the credentials could potentially access both development and production. There are now four accounts:

Dev account (for integration and test environments).
Prod account (for production traffic).
SSH Bastion account (this account has one server in it that’s used to connect to the dev and production boxes over VPC peering connections between the Dev/Prod accounts and the SSH Bastion account).*
DNS account (this holds the main HostedZone for the twinehr.com domain name).

*The important reason for the SSH account: it’s the only place from which you can access the dev/prod boxes. This minimises the risk of unauthorised access to the boxes and allows the team to control access to the servers from a single location.
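Part of why the bastion works as a single entry point is that VPC peering is not transitive: dev and prod each peer with the bastion, but that does not connect them to each other. The sketch below models only that network reachability, using hypothetical account names; in practice security groups would further restrict which direction SSH traffic may flow.

```python
from __future__ import annotations

# Hypothetical account names; each peering link mirrors the setup described above.
PEERINGS = {
    frozenset({"bastion", "dev"}),
    frozenset({"bastion", "prod"}),
}


def has_route(source: str, target: str) -> bool:
    """VPC peering is not transitive: traffic only crosses a direct peering link."""
    return frozenset({source, target}) in PEERINGS


def reachable_from(source: str, accounts: list[str]) -> list[str]:
    """Accounts that `source` can reach directly over a peering connection."""
    return [a for a in accounts if a != source and has_route(source, a)]


if __name__ == "__main__":
    accounts = ["dev", "prod", "bastion", "dns"]
    print(reachable_from("bastion", accounts))  # ['dev', 'prod']
    print(reachable_from("dev", accounts))      # ['bastion']
    # dev and prod both peer with the bastion, but that does not link them together.
    print(has_route("dev", "prod"))             # False
```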

Setting up the infrastructure

The bulk of the work has been setting up the infrastructure for the application and all the parts around it (Elasticsearch and other databases).

The main focus of this was making sure that everything is repeatable: using templates is really important as we scale up. For this reason, we wanted to get everything set up on AWS CloudFormation. CloudFormation gives you a blueprint of your infrastructure so you can replicate it really quickly.
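As an illustration of what such a blueprint can look like, the following sketch builds a minimal CloudFormation template for an auto-scaled application tier as a Python dictionary and prints it as JSON. The resource names, instance type and group sizes are assumptions, not Twine's actual stack, and it uses a launch configuration for brevity where a newer stack would more likely use a launch template.

```python
import json


def app_stack_template(instance_type: str = "t2.medium",
                       min_size: int = 2, max_size: int = 10) -> dict:
    """Hypothetical minimal CloudFormation template for an auto-scaled app tier."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # The AMI is passed in, so one blueprint can launch any baked image.
            "AmiId": {"Type": "AWS::EC2::Image::Id"},
        },
        "Resources": {
            "AppLaunchConfig": {
                "Type": "AWS::AutoScaling::LaunchConfiguration",
                "Properties": {
                    "ImageId": {"Ref": "AmiId"},
                    "InstanceType": instance_type,
                },
            },
            "AppGroup": {
                "Type": "AWS::AutoScaling::AutoScalingGroup",
                "Properties": {
                    # Ref on a launch configuration resolves to its name.
                    "LaunchConfigurationName": {"Ref": "AppLaunchConfig"},
                    # The resource reference documents MinSize/MaxSize as strings.
                    "MinSize": str(min_size),
                    "MaxSize": str(max_size),
                    # Spread instances across every Availability Zone in the region.
                    "AvailabilityZones": {"Fn::GetAZs": ""},
                },
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(app_stack_template(), indent=2))
```

Because the AMI is a parameter, the same template can be deployed unchanged into both the Dev and Prod accounts, each time pointing at a freshly built image.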

The final part was working out how to build the application so it could run on the servers. That involved tools like Jenkins and Docker, which I’ll go into a bit more below.

Changing how developers work on the application

This involved changing the development virtual machine to use the same structure as the servers in AWS, so that the two are exactly the same.

If this isn’t consistent, you end up developing against one structure and deploying to another. Something can work perfectly in development and then break in production, which is obviously frustrating and time-consuming.

We made sure that the development box is built in exactly the same way as the box for AWS (production). This way you can be pretty confident that if it’s working on development, it will work on production.
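One way to back that confidence with a check is to compare the package lists from the two builds. This is a hypothetical sketch: it assumes each build can export a name-to-version mapping of its installed packages (for example from the OS package manager), and the package names and versions in the example are made up.

```python
from __future__ import annotations


def package_drift(dev: dict[str, str],
                  prod: dict[str, str]) -> dict[str, tuple[str | None, str | None]]:
    """Return every package whose version differs, or is missing, between two builds."""
    drift = {}
    for name in sorted(set(dev) | set(prod)):
        # .get() yields None for a package installed in only one environment.
        if dev.get(name) != prod.get(name):
            drift[name] = (dev.get(name), prod.get(name))
    return drift


if __name__ == "__main__":
    dev = {"nginx": "1.8.0", "elasticsearch": "1.7.3"}
    prod = {"nginx": "1.8.0", "elasticsearch": "1.7.1", "redis": "3.0.5"}
    print(package_drift(dev, prod))
    # {'elasticsearch': ('1.7.3', '1.7.1'), 'redis': (None, '3.0.5')}
```

An empty result means the two boxes agree on every package, which is exactly the "no unknowns" state the shared build process aims for.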

A lot of these steps are just about removing unknowns and trying to keep things similar. But it all has a big impact on how Twine performs:

Deployment time and stability have improved.
Now that each server runs a single application, Twine is much faster.
It’s now better at handling big load spikes.

And for users this means:

Less chance of downtime
The peace of mind that data is secure
Quicker and more regular updates for end users
And, importantly: the service just works

Tools Used

A few tools came in very useful in the process:

Packer

We used Packer to build the Amazon Machine Images (AMIs), which is what makes auto-scaling possible. For auto-scaling to work, you really want a server to start up with everything already available. You don’t want to start the server, connect to it remotely, install the latest version of Twine, warm the caches and so on; you just want the box to come up and be ready. So that’s what we used Packer for: to create an AMI that has everything available.
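A minimal sketch of such a Packer template, generated from Python and printed as JSON. The region, source AMI, instance type, SSH user and provisioning script paths are all hypothetical placeholders. The point is that installing the application and warming caches happen once, at image build time, rather than every time an instance boots.

```python
import json


def packer_ami_template(source_ami: str, region: str = "eu-west-1") -> dict:
    """Hypothetical Packer template that bakes the application into an AMI."""
    return {
        "builders": [
            {
                "type": "amazon-ebs",
                "region": region,
                "source_ami": source_ami,
                "instance_type": "t2.medium",
                "ssh_username": "ubuntu",
                # Packer expands {{timestamp}}, giving every image a unique name.
                "ami_name": "twine-app-{{timestamp}}",
            }
        ],
        "provisioners": [
            # Everything a new instance needs is done here, at build time, not at boot.
            {"type": "shell", "script": "scripts/install_app.sh"},
            {"type": "shell", "script": "scripts/warm_caches.sh"},
        ],
    }


if __name__ == "__main__":
    print(json.dumps(packer_ami_template("ami-xxxxxxxx"), indent=2))
```

Saving the output to a file and running `packer build` on it produces a new AMI, which the auto-scaling group can then launch as many times as it needs.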

We also used Packer to build the virtual machine for development, and this is where the two stay in sync: we use the same software to build the servers for Amazon as we use to build the development virtual machine, so both are based on the same packages.
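Packer supports several builders in a single template, and by default every provisioner runs against every builder. The hypothetical sketch below pairs an `amazon-ebs` builder with a `virtualbox-ovf` builder for the development VM and shares one provisioner list between them. It also includes a small check that no provisioner has been restricted to a single target through Packer's `only`/`except` keys. The base image path, credentials, region and script names are assumptions.

```python
import json

# Hypothetical provisioning scripts shared by every build target.
SHARED_PROVISIONERS = [
    {"type": "shell", "script": "scripts/install_packages.sh"},
    {"type": "shell", "script": "scripts/configure_app.sh"},
]


def parity_template(source_ami: str, base_ovf: str) -> dict:
    """Hypothetical template building the AWS image and the dev VM from one recipe."""
    return {
        "builders": [
            {"type": "amazon-ebs", "region": "eu-west-1", "source_ami": source_ami,
             "instance_type": "t2.medium", "ssh_username": "ubuntu",
             "ami_name": "twine-app-{{timestamp}}"},
            {"type": "virtualbox-ovf", "source_path": base_ovf,
             "ssh_username": "vagrant", "ssh_password": "vagrant",
             "shutdown_command": "sudo shutdown -P now"},
        ],
        # With no "only"/"except" keys, Packer runs each provisioner on every builder.
        "provisioners": SHARED_PROVISIONERS,
    }


def provisioners_shared(template: dict) -> bool:
    """True if no provisioner is limited to a subset of the builders."""
    return all("only" not in p and "except" not in p for p in template["provisioners"])


if __name__ == "__main__":
    template = parity_template("ami-xxxxxxxx", "base-box.ovf")
    print(provisioners_shared(template))  # True
    print(json.dumps(template, indent=2))
```

Keeping the provisioners in one shared list, rather than duplicating them per target, is what makes drift between the two builds hard to introduce by accident.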