Implementing Everything as Code

I started a new role around a year ago at a FinTech startup, which has given me the freedom to implement DevOps practices in a way that wasn’t possible in my last job. One of the first things I wanted to implement was a way of managing our infrastructure as code.

I’d like to quickly run through how I achieved this with Terraform, Git and Jenkins. I’ve looked at Terraform before: it’s a product from HashiCorp that provides a common language for configuring infrastructure resources, which makes it perfect for managing cloud deployments. It also makes managing that code base extremely sensitive, as incorrect or conflicting changes can cause huge disruption and even irretrievable data loss in production systems.

I don’t want to delve too much into using Terraform but instead focus on how I created processes through which our organisation can manage the risks associated with an Infrastructure as Code tool. My answer was to apply DevOps practices to the process with a Continuous Delivery pipeline. We use Git for source code management and Jenkins for our Continuous Integration server.

Processes and Safety Nets

The process we now use for updating our infrastructure is this: when someone wants to make a change, they open a merge request. The change is then reviewed, tested locally with terraform plan, and merged. Once merged, a Jenkins Pipeline is triggered that installs Terraform and runs a plan before pausing at a deploy gate. This allows a final review before the change is applied by the Jenkins server.

You can configure a deploy gate in your pipeline using a snippet like this:

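script {
    // Pause for up to 10 minutes; the build aborts if nobody approves in time
    timeout(time: 10, unit: 'MINUTES') {
        // Manual approval before the apply stage runs
        input(id: "Deploy Gate", message: "Deploy $ENVIRONMENT?", ok: 'Deploy')
    }
}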

If everything still looks good, you hit the big shiny deploy button, the plan output is fed into a Terraform apply, and your changes are applied.
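To show how the plan, gate and apply steps can fit together, here is a minimal sketch of a declarative Jenkinsfile. It is not our actual pipeline: the stage names, the tfplan file name and the ENVIRONMENT parameter with its default are assumptions made for illustration, and it assumes Terraform is already on the agent's PATH rather than being installed by the pipeline.

```groovy
// Hypothetical sketch only: stage names, the 'tfplan' file name and the
// ENVIRONMENT parameter are illustrative assumptions.
pipeline {
    agent any
    parameters {
        string(name: 'ENVIRONMENT', defaultValue: 'staging',
               description: 'Target environment (hypothetical)')
    }
    stages {
        stage('Plan') {
            steps {
                sh 'terraform init -input=false'
                // Save the plan to a file so the apply stage runs exactly
                // what was shown to the reviewer
                sh 'terraform plan -input=false -out=tfplan'
            }
        }
        stage('Deploy Gate') {
            steps {
                // The build aborts if nobody approves within 10 minutes
                timeout(time: 10, unit: 'MINUTES') {
                    input(message: "Deploy ${params.ENVIRONMENT}?", ok: 'Deploy')
                }
            }
        }
        stage('Apply') {
            steps {
                // Apply the saved plan rather than re-planning
                sh 'terraform apply -input=false tfplan'
            }
        }
    }
}
```

Saving the plan with -out and applying that file means the change that gets applied is the one that was reviewed at the gate, even if the underlying state moved in the meantime (in which case Terraform refuses the stale plan).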

This process gives us plenty of time to spot dangerous changes before they are applied. It largely works, but it does rely on users following the process, which thankfully the people I have worked with have done.

Did it Work?

Using this process we’ve made 351 deployment attempts to date since we started using Terraform to manage our environments in August 2018, with a 60% success rate. There are failures, but because every change is reviewed we have (mostly) avoided accidental deletion of in-use environments.

Things do sometimes go wrong, but fixing them has invariably just required another commit and another run of the deployment pipeline. As automated testing for infrastructure code gets better, I can only see this process getting more reliable.

As it is, the pipeline has been a remarkably stable solution, and I strongly recommend it if you are considering using a tool like Terraform to manage your cloud infrastructure.

I’d also like to thank @TimBerry, whose Medium post helped me get this up and running last year.

