Immutable Infrastructure: Networks

If you work with network infrastructure, you know that it tends to grow warts; that is, it drifts from its original configuration. One of our goals in building Fugue as the operating system (OS) for the cloud, and as a single source of truth and trust for your infrastructure, is to prevent this drift by maintaining your infrastructure's known-good state. After all, "a trusted system only does what its author intends."

Previously, we've focused on the "warts" grown by compute instances, but the same problem affects other infrastructure components, such as networks. Configuration drift in networks often occurs when they are deployed and maintained by hand. I have seen network configurations that take up hundreds of rows in spreadsheets and are deployed manually, line by line. When the application isn’t functioning properly, the go-to solution is usually direct manual intervention to open up ports and IP address ranges.

As a result, the original spreadsheet often no longer reflects the configuration that is actually running. Temporary fixes become permanent because they are forgotten. When a new service is added to the application, more manual intervention is required. This manual intervention not only exposes applications to security risks, for example through inadvertently opened security group rules, but can also cause significant application downtime when rules or routes are mistakenly deleted.

The challenge of configuring networks has been partially solved by DevOps tools and services like AWS CloudFormation, which replace manual implementation with infrastructure automation. With these tools, network configurations can live in a controlled domain-specific language (DSL) or template that is repeatable, auditable, and testable.

However, many of these tools fall short. They deploy the network configuration but do not enforce the intended state after the initial deployment. This becomes apparent when someone changes the network manually after it was deployed with these tools. In the worst case, the changes are left as-is. In the best case, only the rules that were originally deployed are enforced, and anything added afterward stays in place.

Fugue takes automation and control of networks a step further. Fugue continuously works to return networks to the state that was intended when they were deployed. If rules originally defined in a composition are deleted, Fugue restores them. If rules that were not defined in the composition find their way into your network, Fugue removes them.
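Conceptually, the enforcement described above is a comparison of two sets of rules: what the composition declares and what actually exists in the account. The minimal sketch below assumes rules can be modeled as hashable values; `Rule` and `plan_reconciliation` are hypothetical names for illustration, not Fugue's API or its internal implementation.

```python
from typing import NamedTuple, Set, Tuple


class Rule(NamedTuple):
    """Hypothetical model of one ingress rule: protocol, port, and source."""
    protocol: str
    port: int
    source: str  # a CIDR block or a security group name


def plan_reconciliation(desired: Set[Rule], actual: Set[Rule]) -> Tuple[Set[Rule], Set[Rule]]:
    """Return (rules to restore, rules to revoke) so that actual ends up equal to desired."""
    to_restore = desired - actual  # declared in the composition but missing from the cloud
    to_revoke = actual - desired   # present in the cloud but never declared
    return to_restore, to_revoke


# Example: the port 80 rule was deleted and an undeclared SSH rule was added.
desired = {Rule("tcp", 80, "0.0.0.0/0")}
actual = {Rule("tcp", 22, "0.0.0.0/0")}
restore, revoke = plan_reconciliation(desired, actual)
```

Treating both sides as sets makes the two cases in the paragraph symmetric: deleted rules show up in `desired - actual`, and unexpected rules show up in `actual - desired`.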

To illustrate, let’s take a look at a short but powerful example involving AWS security groups:

security-group web-elb-sg:
    description: 'Allow port 80 from the world'
    ingress-rules: { 80: all-tcp }
    egress-rules:  { 0-65535: all-tcp }

security-group web-server-sg:
    description: 'Allow port 80 from the web-elb-sg Security Group'
    ingress-rules:  { 80: allow-from-web-elb }
    egress-rules:   { 0-65535: all-tcp }

rule allow-from-web-elb:
    ip-protocol: 'tcp'
    groups:      [ web-elb-sg ]

In the Ludwig code above, we have defined two security groups: web-elb-sg, which allows port 80 from the world, and web-server-sg, which allows ingress traffic on port 80 only when it originates from web-elb-sg. When this code is compiled and run, the Fugue Conductor makes sure the security groups and rules defined in your composition are deployed in your AWS account.
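To make the effect of these two groups concrete, the sketch below models how an inbound request is evaluated against each group's ingress rules. It assumes that `all-tcp` in the composition means "open to 0.0.0.0/0", as the description string suggests; `IngressRule` and `is_allowed` are hypothetical helpers, and the model ignores AWS details such as protocol matching and port ranges.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical ingress rule: a port plus either a CIDR or a source group."""
    port: int
    cidr: Optional[str] = None          # e.g. "0.0.0.0/0" for the whole internet
    source_group: Optional[str] = None  # e.g. "web-elb-sg"


def is_allowed(rules: Iterable[IngressRule], port: int, src_ip: str, src_groups: Set[str]) -> bool:
    """Return True if any rule admits traffic on `port` from the given source.

    src_groups holds the names of the security groups attached to the sender.
    """
    for rule in rules:
        if rule.port != port:
            continue
        if rule.cidr is not None and ip_address(src_ip) in ip_network(rule.cidr):
            return True
        if rule.source_group is not None and rule.source_group in src_groups:
            return True
    return False


# Assumed translation of the Ludwig composition above.
WEB_ELB_SG = [IngressRule(port=80, cidr="0.0.0.0/0")]
WEB_SERVER_SG = [IngressRule(port=80, source_group="web-elb-sg")]
```

With this model, an internet client can reach the load balancer on port 80 but not the web servers directly, while traffic from an instance carrying web-elb-sg is admitted to the web servers.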

But the Conductor does not stop there. As with EC2 instances, the Conductor periodically (currently every 10 seconds) checks your running infrastructure against the composition as it was originally defined and compiled. If the network infrastructure does not match what the Conductor sees in the composition, the Conductor makes the requisite changes to restore fidelity with the original.
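A periodic check of this kind can be sketched as a simple polling loop. The version below is an illustrative assumption, not the Conductor's actual design: `conductor_loop` is a hypothetical function, the cloud is abstracted behind read/add/remove callbacks, and `max_checks` and an injectable `sleep` exist only so the loop can be demonstrated against an in-memory stand-in.

```python
import itertools
import time
from typing import Callable, Optional, Set


def conductor_loop(
    desired: Set[str],
    read_actual: Callable[[], Set[str]],
    add_rule: Callable[[str], None],
    remove_rule: Callable[[str], None],
    interval: float = 10.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Compare live rules to the desired set every `interval` seconds and undo drift.

    Runs forever when max_checks is None; returns the number of corrections made.
    """
    corrections = 0
    checks = itertools.count() if max_checks is None else range(max_checks)
    for _ in checks:
        actual = read_actual()
        for rule in desired - actual:  # declared but missing: restore it
            add_rule(rule)
            corrections += 1
        for rule in actual - desired:  # present but undeclared: revoke it
            remove_rule(rule)
            corrections += 1
        sleep(interval)
    return corrections


# In-memory stand-in for an AWS account, with one undeclared SSH rule added.
cloud = {"tcp/80 from 0.0.0.0/0", "tcp/22 from 0.0.0.0/0"}
desired = {"tcp/80 from 0.0.0.0/0"}
fixed = conductor_loop(desired, lambda: set(cloud), cloud.add, cloud.discard,
                       max_checks=2, sleep=lambda seconds: None)
```

The first check revokes the SSH rule; the second finds nothing to change, so `fixed` counts a single correction.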

Let’s put Fugue to the test by growing some warts in our network. To do this, I will play a well-meaning but green SysAdmin who wants to look at some server logs while the web servers are running (we'll assume he isn't aware of the log management system that was just implemented).

Logging in to the AWS Management Console, I add a security group rule that allows SSH from 0.0.0.0/0. As you can see in the video below, within seconds of being added, the rule is removed, returning the security group to its original configuration.

Adds Security Group Rule

Now, let's look at it from another angle. Say it is Friday afternoon, and our well-meaning SysAdmin wants to check the security rules in the AWS account before he leaves for the weekend. In this case, we simulate a scenario where he inadvertently deletes the security group rule that allows port 80 ingress traffic to the web servers. Unaware of what he has done, he promptly shuts his laptop and heads to the pub.

Ordinarily, this could cause a significant site outage. Here, the site suffers at most about 10 seconds of downtime: at its next check, the Conductor quickly re-creates the security group rule, just as it was defined in the Ludwig composition.
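The 10-second bound follows from the check interval. The sketch below works through the arithmetic under two stated assumptions: checks run at fixed times (0, 10, 20, ... seconds), and re-creating the rule takes no time. Real API calls add some latency, so actual downtime can be slightly longer. `outage_seconds` is a hypothetical helper.

```python
import math


def outage_seconds(deleted_at: float, interval: float = 10.0, first_check: float = 0.0) -> float:
    """Seconds between a rule's deletion and the next scheduled check after it.

    Assumes checks run at first_check, first_check + interval, ... and that
    deleted_at >= first_check; restoration latency is ignored.
    """
    elapsed = deleted_at - first_check
    next_check = first_check + (math.floor(elapsed / interval) + 1) * interval
    return next_check - deleted_at


# A rule deleted at t = 23 s is restored by the check at t = 30 s.
gap = outage_seconds(23.0)
```

Deleting the rule just after a check is the worst case: the outage approaches, but does not exceed, the full interval.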

Removes Security Group Rule

This doesn’t only apply to security groups and rules. The Conductor will also maintain the fidelity of other network configurations such as VPC routes, network ACLs, internet gateways, and subnets, to name a few.

But What if I Want to Change Something in My Composition?

Making changes to a running composition is simple. Let's say you want to add a rule to the ELB security group that allows port 443 from the internet. First, modify your Ludwig composition to reflect those changes.

security-group web-elb-sg:
    description: 'Allow port 80 and 443 from the world'
    ingress-rules: { 80: all-tcp, 443: all-tcp }
    egress-rules:  { 0-65535: all-tcp }

security-group web-server-sg:
    description: 'Allow port 80 from the web-elb-sg Security Group'
    ingress-rules:  { 80: allow-from-web-elb }
    egress-rules:   { 0-65535: all-tcp }

rule allow-from-web-elb:
    ip-protocol: 'tcp'
    groups:      [ web-elb-sg ]

When you have finished editing, go to the Fugue CLI and run the following commands:

$ fugue load foo.lw
$ fugue update foo.lw

Fugue's load command compiles the modified composition and uploads it to S3. Fugue's update command then signals the Conductor to grab the updated version from S3 and apply the changes to the existing running composition.
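One way to picture what an update changes is to diff the old and new compositions. The sketch below is an assumption for illustration only (the post does not say how the Conductor computes its changes); `plan_update` is a hypothetical function, and each composition is reduced to a mapping from security group name to a set of `(port, source)` ingress rules.

```python
from typing import Dict, Set, Tuple

# Hypothetical simplified composition: group name -> {(port, source)}
Composition = Dict[str, Set[Tuple[int, str]]]


def plan_update(old: Composition, new: Composition) -> Dict[str, Dict[str, Set[Tuple[int, str]]]]:
    """Per security group, list the ingress rules an update would add and remove."""
    plan = {}
    for group in old.keys() | new.keys():
        before = old.get(group, set())
        after = new.get(group, set())
        added, removed = after - before, before - after
        if added or removed:  # untouched groups are left out of the plan
            plan[group] = {"add": added, "remove": removed}
    return plan


OLD = {
    "web-elb-sg": {(80, "0.0.0.0/0")},
    "web-server-sg": {(80, "web-elb-sg")},
}
NEW = {
    "web-elb-sg": {(80, "0.0.0.0/0"), (443, "0.0.0.0/0")},
    "web-server-sg": {(80, "web-elb-sg")},
}
```

For the 443 change above, only web-elb-sg appears in the plan, with a single rule to add; web-server-sg is unchanged and is not touched.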

Why is This Important?

You can spend hours each week manually maintaining and modifying network configurations, and the warts will still grow. It's just not a smart use of your team's time. Fugue provides an automated, deterministic way to deploy, maintain, and update infrastructure components. Tell the Conductor what you want, and it will make it happen and keep it that way.

With Fugue, you can maintain the fidelity of your networks while reducing deployment times and ongoing maintenance burdens. Because the Ludwig DSL is intent-based, repeatable, enforceable, and auditable, you can operate your applications with confidence that your network configurations are continuously maintained in the state you intended.

With immutable network infrastructure, meeting strict internal and external security and compliance requirements for your applications is just a few lines of code away.

Take Control of Your Cloud Today

You and your team can be productive with Fugue on AWS in less than an hour, without the need for professional services.