At SUMO Heavy, we work with clients that have managed infrastructure and clients that manage their own. We’ve been lucky enough to work with companies as small as a single server and as large as dozens or hundreds of AWS accounts spread across multiple organizations. Through the years, we’ve learned a number of lessons about what works, what doesn’t, and how the methods the largest companies use to save time and money can apply to everyone.

What is infrastructure automation?

Red Hat defines infrastructure automation as the use of software to create repeatable instructions and processes to replace or reduce human interaction with IT systems. In practice, that means implementing a tool that manages your IT infrastructure for you through processes that reduce the number of clicks needed to perform a task.

An example:

Let’s focus on a simple one – you want to spin up a WordPress site in an AWS account for a client. There’s a lot to think about here in terms of infrastructure. You might need something along the lines of the following:

  • ACM certificate
  • ALB, target group, and listeners
  • CloudFront distribution
  • CloudWatch log group and stream
  • ECR repository and lifecycle policy
  • ECS service
  • ECS task definition
  • IAM roles and policies
  • Internet gateway
  • Public and private subnets
  • RDS cluster
  • Route table
  • Route53 hosted zone and records
  • S3 bucket
  • Security groups
  • VPC

Note: Internally, we use ECS pretty heavily. While Kubernetes is growing wildly in popularity and many of our clients are shifting to it, we find that ECS is a great entry point for SMBs that are new to containerization and infrastructure automation. The portability of container technology allows you to upgrade to other container orchestration systems at a later time.

Without infrastructure automation, all of this has to be set up by hand in the AWS console. Gross. Depending on the complexity of the site, that could eat a good chunk of your day the first time, and while repetition makes you faster, there is a floor on how quickly a human clicking through the console can spin up a new site.

What can I do?

This is where infrastructure automation comes into play. There are various tools that can help; our preferred tool is Terraform, but there are many others. There are two important goals to achieve when automating infrastructure (a minimal sketch follows the list):

  • Build infrastructure with code
  • Integrate infrastructure management into your CI/CD pipeline
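
To make “build infrastructure with code” concrete, here is a minimal Terraform sketch. The region and bucket name are purely illustrative; the point is that the file is the source of truth, and the tool makes your cloud match it.

    # main.tf: the provider tells Terraform which cloud to talk to.
    provider "aws" {
      region = "us-east-1" # illustrative region
    }

    # Declaring a resource; `terraform apply` creates or updates it to match.
    resource "aws_s3_bucket" "assets" {
      bucket = "example-wordpress-assets" # illustrative name
    }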

Our first recommendation is to automate the core infrastructure that only your TechOps team should change. If this infrastructure already exists, some tools will let you import its metadata. For example, Terraform uses state files, which, as the name implies, store the state of your infrastructure. At SUMO Heavy, our core infrastructure covers our VPCs, subnets, gateways, and ECS clusters: the aspects of our infrastructure that will rarely change.
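
As a rough sketch of what a slice of that core layer might look like in Terraform (the CIDR ranges and names are made up for illustration):

    # core/network.tf: networking that rarely changes.
    resource "aws_vpc" "core" {
      cidr_block = "10.0.0.0/16"
    }

    resource "aws_internet_gateway" "core" {
      vpc_id = aws_vpc.core.id
    }

    resource "aws_subnet" "public" {
      vpc_id     = aws_vpc.core.id
      cidr_block = "10.0.1.0/24"
    }

    # Outputs let application-level code look these values up later.
    output "vpc_id" {
      value = aws_vpc.core.id
    }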

Our second recommendation is to integrate your infrastructure automation into your application deployment pipelines. This will take some upfront time to plan out, but will be immediately replicable thereafter.

Note: For the sake of this particular article, we’re keeping it high level. The nuances of orchestration vs. configuration are reserved for another time.

How can I do that?

DevOps. You could ask five different leads what DevOps is and get five different answers, so for our purposes let’s keep it simple. DevOps shortens the cycle time of feature development and operational work by giving engineering and TechOps teams more autonomy. What we’ve learned from working directly with clients in their engineering environments is that there is a wealth of quality tools that succeed both inside and outside the enterprise.

This is exciting when you have hundreds of applications across dozens of engineering teams, but who says it can’t work for a small company? Time savings is time savings — right?

Story Time

Let’s start with a realistic example. You work at a company that has 25 WordPress websites that live on one server. For a variety of reasons (security, scalability, budget), you are tasked with splitting those sites out into a more robust cloud infrastructure where each site will be able to scale separately from the others.

That’s a lot of work. You’re not moving a site, you’re moving 25. And you’re not moving them to any of those popular WordPress hosts; your company needs control over that. What do you do? This is the complexity that practicing DevOps solves.

We’re going to focus on the infrastructure first. The idea here is to define your infrastructure with code (configuration files, really). You can then run your orchestration tool of choice against this code and it will create all your infrastructure for you. The state of all that infrastructure is tracked, so when you run it again, only the changes will be applied. If some infrastructure already exists, such as a VPC, you can even import it instead of recreating it.
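
In Terraform, for instance, recent versions (1.5 and later) let you declare the import right in the configuration; the VPC ID below is a made-up example. On older versions, the `terraform import` CLI command does the same thing imperatively.

    # Adopt a hand-built VPC into state instead of creating a new one.
    import {
      to = aws_vpc.main
      id = "vpc-0123456789abcdef0" # hypothetical ID of the existing VPC
    }

    resource "aws_vpc" "main" {
      cidr_block = "10.0.0.0/16" # must match the real VPC's settings
    }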

Let’s break this into two parts: your core infrastructure, which is the stuff only the TechOps team should touch, and your application infrastructure.

First we’ll need our core infrastructure. This will be the stuff that doesn’t change very often: your VPC, certificates, internet gateways, core policies, etc. Rare as changes will be, we want them to be easy when they happen, so we’ll implement the following scenario:

  1. TechOps engineers push the code that handles the core infrastructure to a Git repository
  2. The push triggers a CI/CD pipeline to run
  3. The pipeline runs your infrastructure automation to authenticate with your cloud and create the infrastructure
  4. The infrastructure automation tool stores the state of all that new infrastructure in a state file somewhere remote, such as S3 (see the sketch below)
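
Step 4 is a one-time block of configuration in Terraform. A sketch, with the bucket and table names made up; the DynamoDB table is optional, but it gives you locking so two pipeline runs can’t apply at once:

    # Keep state in S3 rather than on anyone's laptop.
    terraform {
      backend "s3" {
        bucket         = "example-terraform-state" # hypothetical bucket
        key            = "core/terraform.tfstate"
        region         = "us-east-1"
        dynamodb_table = "terraform-locks" # optional state locking
      }
    }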

Now we have a VPC running with all of the core fundamentals we need to run our applications, so the next step is to get an application deployed. This is a similar process, but instead of having code in its own repo, it goes right alongside your application, and will include things such as the following (a sketch comes after the list):

  • Container registry, services, task definitions
  • IAM roles and policies
  • Route53 records
  • Security groups
  • CloudFront distributions
  • CloudWatch log groups and streams
  • Load balancers
  • RDS databases
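
The application’s configuration can read the core state to build inside it. A sketch, assuming the hypothetical state bucket and `vpc_id` output from earlier:

    # Look up the core infrastructure's outputs from its remote state.
    data "terraform_remote_state" "core" {
      backend = "s3"
      config = {
        bucket = "example-terraform-state"
        key    = "core/terraform.tfstate"
        region = "us-east-1"
      }
    }

    # Application-specific resources land inside the core VPC.
    resource "aws_security_group" "app" {
      name   = "wordpress-site-1"
      vpc_id = data.terraform_remote_state.core.outputs.vpc_id
    }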

The goal is to have a tidy configuration that creates the application-specific infrastructure within your core infrastructure, and the process looks like this:

  1. Application engineers push application code changes to the Git repository
  2. The push triggers a CI/CD pipeline to run
  3. The pipeline first checks whether infrastructure changes are required, and applies them if so
  4. The pipeline creates a Docker image with your application bundle and pushes it to the registry
  5. The pipeline publishes the application

What used to be a server is now built automatically with no clicks in your cloud console, and uses container orchestration to deploy the application. The part that’s especially magical: as you mature this process and apply roles to your applications, engineers can create exactly the infrastructure they need without requesting it from TechOps and waiting for someone to build it all by hand. Can I have an S3 bucket? I lost my access and secret keys. Why can’t I SSH in to see my logs? Those days will be fewer and fewer as time goes on.
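
What “applying roles” can look like is a scoped policy on the application’s pipeline role, so engineers can self-serve within agreed boundaries. A hypothetical sketch; the role, policy name, and bucket prefix are all made up:

    # Hypothetical policy: the app pipeline may manage its own S3 buckets,
    # but only ones that match an agreed naming prefix.
    resource "aws_iam_role_policy" "app_s3_self_service" {
      name = "app-s3-self-service"
      role = aws_iam_role.app_pipeline.id # assumed to be defined elsewhere

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [{
          Effect   = "Allow"
          Action   = ["s3:CreateBucket", "s3:DeleteBucket", "s3:PutBucketTagging"]
          Resource = "arn:aws:s3:::wordpress-site-*"
        }]
      })
    }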

What’s next?

This is where it gets fun. With your infrastructure automated, you can now turn your attention to the repetitive tasks that remain. Let’s get into two more real-world scenarios before we wrap up.

One goal to bear in mind is that we should always be striving for more ease: the less effort it takes to deploy an application, the better. Fifteen sites in, you notice that your application infrastructure code looks nearly identical across all the WordPress sites, which means any change that affects every site has to be made in every copy. This is where modularization comes in. Some tools, such as Terraform, allow you to create modules. Here, you would create a “WordPress” module that accepts a few variables. This shrinks each application’s infrastructure code down to a few lines: import the core infrastructure, import the WordPress module, assign some variables, and you’re done. Then, if tweaks need to be made to all of the WordPress sites, you can just update the module and redeploy the sites.
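
A sketch of one site’s entire infrastructure file after modularization, assuming a hypothetical local module that accepts `domain`, `environment`, and `vpc_id` variables:

    # One WordPress site, reduced to a module call.
    module "wordpress_site_1" {
      source = "./modules/wordpress" # hypothetical shared module

      domain      = "site1.example.com"
      environment = "production"
      vpc_id      = data.terraform_remote_state.core.outputs.vpc_id
    }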

The next is service implementation automation. One day, word comes down from your CTO that you have budgetary approval to deploy New Relic across all your apps. That’s AWESOME! The thing is, every app will need the agent installed. That means scheduling the work across all the teams, getting each one to implement it, deploy it, and fix any issues. Let’s make that simpler. Since you’re focused only on PHP, you can create your own base PHP Docker image, based on the official one but with New Relic pre-installed. Then, all your application engineers need to do is switch to that image, set a few environment variables, and boom! Everyone has New Relic.
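
The Dockerfile for the base image is out of scope here, but the consuming side really is that small. A sketch of the task definition change, assuming a hypothetical internal image whose entrypoint wires these environment variables into the agent’s configuration:

    resource "aws_ecs_task_definition" "app" {
      family = "wordpress-site-1"

      container_definitions = jsonencode([{
        name = "app"
        # Hypothetical internal base image: official PHP plus the New Relic agent.
        image = "123456789012.dkr.ecr.us-east-1.amazonaws.com/php-newrelic:8.2"
        environment = [
          { name = "NEW_RELIC_APP_NAME", value = "wordpress-site-1" },
          { name = "NEW_RELIC_LICENSE_KEY", value = "REDACTED" } # pull from a secrets store in practice
        ]
      }])
    }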

Summary

One of our overarching goals at SUMO Heavy is to keep a finger on the pulse of the enterprise tech landscape and find opportunities to apply the same time- and money-saving techniques to smaller companies like ours. We strive to find new efficiencies every day, and to never settle in and become stale.

Photo by Jared Murray on Unsplash