Immutable Infrastructure Realized: Fugue Computing

We at Luminal are launching our new vision for computing: Fugue. Fugue embodies a set of core computing patterns that rely upon:

  • Automating the creation and operation of cloud infrastructure through a no-touch runtime environment. This uses an active infrastructure OS under users’ control and within their environment.
  • Short-lived compute instances that are created and destroyed by this infrastructure OS, resulting in higher fidelity systems that optimize performance and cost.
  • Simplification of compute instances to reduce vulnerability.

You may recognize in these patterns the meme of “immutable infrastructure”: the idea that computing infrastructure elements should not be changed through in situ repair or upgrade, but should instead be purposefully thrown away and replaced in order to improve system fidelity and performance. Fugue is the first practical, scalable solution for defining, deploying, and managing elements of cloud systems that is consistent with the precepts of immutable infrastructure.

As with all proposals about the future, this is a work in progress. We hope that many of you will agree with our vision and join us by using the Fugue beta.

Why Fugue?

Cloud computing is changing how we compose and operate systems. As we move applications to the cloud or build there natively, the fundamental ways we compose systems are rapidly expanding beyond the individual computer and taking on more distributed forms. The nature of cloud service offerings makes this natural and efficient.

We increasingly compose systems of elastic collections of services running on many compute instances. We now commonly employ application statelessness in order to exploit cloud system elasticity and to achieve the performance required of web scale systems. As we make these changes, we discover that system declaration, management, and security cannot be accomplished in ways that were suitable for individual systems. Existing methods are insufficient and immature.

A fugue is a musical form that repeats and evolves a theme over time. Similarly, Fugue Computing is about repeating and automating the creation and destruction of components of a system, incorporating changes only on replacement. Fugue allows us to define the granularity at which our cloud infrastructure will be mutable and to simply and automatically enforce these definitions. This patterned behavior shares remarkable features and benefits with cellular regeneration, through which biological systems preserve fidelity by renewing and replacing components.

As with this biological regeneration, Fugue continuously refreshes computing resources and thereby simply and naturally maintains system truth and trust in ways that are not otherwise possible. Fugue enables us to meet the challenges of cloud system declaration, management, and security in a new and fundamental way. Fugue makes it simple to declare mutable components as immutable, and then to automatically and recursively deploy, retire, and replace them in a user-controlled cycle that realizes the benefits of immutable infrastructure.
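
To make the deploy, retire, and replace cycle concrete, here is a minimal Python sketch. It is an illustration only, not Fugue's implementation or API: the names Instance, launch, and refresh are hypothetical, and a real system would call a cloud provider rather than build objects in memory. It shows the one rule that matters: a running component is never edited, and a change takes effect only when a replacement is launched.

```python
from dataclasses import dataclass
from itertools import count
from typing import List

_next_id = count(1)  # hypothetical stand-in for provider-assigned instance IDs


@dataclass(frozen=True)
class Instance:
    """A running component. Frozen, so it cannot be modified in place."""
    instance_id: int
    image: str  # hypothetical label for the build the instance was created from


def launch(image: str) -> Instance:
    """Create a brand-new instance from the given image."""
    return Instance(next(_next_id), image)


def refresh(fleet: List[Instance], desired_image: str) -> List[Instance]:
    """Retire every instance and return replacements built from desired_image."""
    return [launch(desired_image) for _ in fleet]


fleet = [launch("web-v1") for _ in range(3)]
old_ids = {instance.instance_id for instance in fleet}

# The upgrade to "web-v2" is incorporated only by replacing the whole fleet.
fleet = refresh(fleet, "web-v2")
new_ids = {instance.instance_id for instance in fleet}
```

After the refresh, no instance from the old fleet survives, and any attempt to assign to a field of a running Instance raises an error instead of silently changing it.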

Distributed Computing – Redux in the Cloud

When our current computing operating systems were developed, they were intended for a relatively small number of long-lived, multi-function, multi-user computers. When I started my career in programming, it cost $20K or more to run even a basic form of UNIX on a workstation. A barely adequate personal computer cost $3000 or more. Serious computing was performed on mainframes or minicomputers with a coterie of priests to attend to them. Each of these business class systems was constantly and carefully tended and maintained because it was expensive and had hundreds or thousands of users. Off-hours downtimes were acceptable and expected. Machine time and resources were expensive and scant. UNIX was developed so that more people could access these constrained resources interactively, rather than wait to have a job run on a schedule.

In that hardware-constrained world, it made great sense to compose systems at the computer level. If you were connected to a network with more than a few hosts, you were likely in a large company. Running a program across multiple computers was a subject of research, not a way to do business. The operating systems we now use came from this world. In terms of composing overall systems, these operating systems are better suited to the inelastic, single CPU world than to the elastic, distributed CPU, IaaS world in which we live today.

In many contemporary systems, compute instances, such as those offered by Amazon Web Services (AWS) Elastic Compute Cloud (EC2), serve only a single function and are composed into systems by connecting them through interfaces. From a system-level view, the compute instance is the new equivalent of the OS process.

Elastic compute systems are now distributed across a few, tens, or hundreds of physical computers, but there is no central coordinating function, or infrastructure-level operating system (OS), to act as a nexus for control and trust. We have exploded the complexity and mutability of the system, without simultaneously advancing the framework for controlling even relatively simple, non-distributed environments. Consequently, it is difficult to trust these systems. Fugue is about redressing this mismatch, and restoring truth and trust.

Computing Truth and Trust are Inseparable

Truth means always and centrally knowing the state of system infrastructure. Trust means having confidence that the system is functioning as intended, reliably and repeatedly.

Today, most approaches to building and operating distributed cloud computing systems don’t fully exploit the benefits of elastic infrastructure; they are passive and mimic single CPU-based batch processing. One reason for this is that in the cloud, developers can’t realistically run scripts to build infrastructure and expect that system truth will endure or that the resulting application tiers can be trusted. Infrastructure immediately begins to drift from its initial configuration. Network neighbors become noisy, and the accumulation of well-intentioned administrative interventions introduces unintended consequences and failure modes. The bag-of-scripts + dashboard approach is ineffective and causes more problems than it solves. Simply stated, the software infrastructure to implement anything like immutable infrastructure in current cloud environments does not exist.

To make this possible, we need the equivalent of an operating system for distributed computing instances in cloud infrastructures: an infrastructure OS. It needs to have the capability to be invoked automatically, and to operate autonomically, so that we have much better capacity to know truth. But we also need trust, which can only happen when these distributed systems are maintained over time in much the same way that an operating system maintains the distributed resources of an individual computer. Trust is established and maintained by continuously ensuring that individual components have not been repurposed by mistake or ill intention.

Knowing truth about a distributed system is difficult, in large part because we mistakenly treat truth and trust as separable. The only way to achieve either is to achieve both. Computer operating systems exist to provide an integrated but simple solution that controls and reports, establishing and maintaining both truth and trust. Distributed computing systems in the cloud need something similar, and this is the motivation for Fugue.

Short-lived Components = Truth + Trust

Often in nature, the more purposeful and complex the system, the more frequently it is made of short-lived components so that the overall system has long-term fidelity. At the level of a sophisticated biological organism, this isn’t a choice – it’s the way the universe works. Humans are made of a complex stack of short-lived components. We aren’t made of all the same cells we were yesterday. DNA and its runtime system of chemistry are the source of trust.

Human biology works through a feedback system that uses apoptosis, the programmed death of cells, as a guarantor of long-term trustworthiness. The result is a distributed system that maintains fidelity over a period of decades, while most of its components have life spans measured in days or months. Some key components such as brain cells are longer-lived, but they are well protected from outside forces by layers of short-lived components such as skin and blood cells.

Likewise, when composing and maintaining modern computing systems, we believe that relying primarily on short-lived components best allows us to always and centrally know the state of system infrastructure, and to have confidence that overall system function is as intended. We must have immutable infrastructure in which the units of mutability are well defined and understood. Only then can we hope to optimize performance and cost.

We gain the ability to choose as a matter of design those parts of a system where greater component longevity enhances performance and minimizes cost while still maintaining trustworthiness. In-service components are all immutable, but they need not all be refreshed at the same interval. Longer-lived components can be shielded behind more ephemeral layers. The cloud infrastructure OS automatically controls life spans and resource allocation.
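
As an illustration of refreshing components at different intervals, the following Python sketch gives each component its own maximum age and asks which ones are due for replacement at a given moment. The Component type, the component names, and the lifespans are all hypothetical values chosen for the example; they are not taken from Fugue.

```python
from dataclasses import dataclass
from typing import List

HOUR = 3600.0  # seconds


@dataclass(frozen=True)
class Component:
    name: str
    launched_at: float  # launch time, in seconds on an arbitrary clock
    max_age: float      # lifespan chosen by the designer, in seconds


def due_for_replacement(components: List[Component], now: float) -> List[str]:
    """Return the names of components that have reached their maximum age."""
    return [c.name for c in components if now - c.launched_at >= c.max_age]


# Ephemeral outer layers shield a longer-lived inner component.
components = [
    Component("web-frontend", launched_at=0.0, max_age=1 * HOUR),
    Component("app-worker", launched_at=0.0, max_age=6 * HOUR),
    Component("datastore-proxy", launched_at=0.0, max_age=7 * 24 * HOUR),
]

due = due_for_replacement(components, now=2 * HOUR)
```

Two hours after launch, only the shortest-lived layer has reached its maximum age. The longer-lived components remain in service, still immutable, until their own intervals expire.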

In computing systems, data often must be long-lived, but the majority of application functions (e.g. transaction, analysis, presentation) need not be. The last few years have seen the rise of data persistence technologies that allow for short-lived compute instances, and we expect this trend to continue. It has become a standard practice to engineer location and degree of state and persistence rather than to reflexively require it throughout a system. These approaches unlock the potential for Fugue Computing.

Trust has Many Enemies

A trusted system does only what its author intends. It remains difficult to build trusted systems in cloud infrastructures for four reasons: exogenous environmental changes, configuration drift, design and implementation flaws (bugs), and malign actors. All are sources of mutability that Fugue helps users to alleviate or eliminate:

Environment Changes

Cloud computing environments change frequently, in large part because they are shared. Your compute instances have neighbors, and those aren’t always well behaved. Your neighbors consume network resources in close proximity to yours. Your maintenance actions change resource relationships in unpredictable ways. Fugue’s infrastructure OS observes and reacts to problematic changes in the system environment by deploying new immutable components without any human intervention, just as an operating system kernel does for an individual compute instance. As an additional benefit of these automated behaviors, Fugue makes it possible to manage in near-real time one of the most important new degrees of freedom introduced by cloud infrastructure service providers: variable resource cost.

Configuration Drift

No matter how well computing systems are designed and implemented, they suffer configuration drift over time because of ad-hoc changes and entropy effects. Nature’s approach to configuration drift is again best: replace components of the system aggressively, whether or not they are known to be corrupted. This is the only way to enforce immutability. A single, modern operating system has tremendous complexity and interdependence between processes. It is inevitable that once a system begins running, it starts to drift from the state in which it launched.

“The longer a system lives, the more afraid of it I become. ... An old system inevitably grows warts. They start as one-time hacks during outages. ... ‘We’ll put it back into Chef later.’ ... The system becomes a house of cards. You fear any change and you fear replacing it since you don’t know everything about how it works.” (Chad Fowler)

As Chad points out, you can’t really know whether long-lived systems are still operating correctly, because their components are complex and mutable, as well as exploitable.
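
The difference between detecting drift and simply replacing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of how Fugue works: the configuration dictionaries, the fingerprint helper, and the rebuild function are hypothetical. The point is that comparison only finds drift in the attributes you thought to record, whereas rebuilding from the declaration restores the known state whether or not drift was noticed.

```python
import hashlib
import json


def fingerprint(config: dict) -> str:
    """Hash a configuration in canonical form so two states can be compared."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def rebuild(declared: dict) -> dict:
    """Produce a fresh copy of the declared state, ignoring whatever is running."""
    return json.loads(json.dumps(declared))


# The state the system was declared to have (hypothetical values).
declared = {"image": "web-v2", "open_ports": [443], "packages": ["nginx"]}

# The state after a one-time hack during an outage opened an extra port.
running = {"image": "web-v2", "open_ports": [443, 8080], "packages": ["nginx"]}

drifted = fingerprint(running) != fingerprint(declared)

# Replacement does not depend on the check above: it is built from the declaration.
replacement = rebuild(declared)
restored = fingerprint(replacement) == fingerprint(declared)
```

Here the extra port is caught because ports happen to be part of the recorded configuration. A change outside the recorded attributes would pass the comparison unnoticed, which is the argument for replacing on a schedule rather than only when drift is detected.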

Design and implementation flaws - bugs

Even the best designed and implemented software will contain unknown faults, and some of these bugs are intermittent or only appear after extended operations. System truth and trust must be maintained notwithstanding the inevitable bugs. As a recent bash vulnerability has shown, critical bugs can hide for decades. In Fugue Computing, command processors such as bash can be removed from deployed systems, since they are no longer needed for operation and maintenance. Removing command processors significantly reduces the likelihood of introducing intermittent bugs. Frequently destroying and replacing compute instances naturally mitigates bugs that would appear only over long periods of run time.

Malign Actors

Black hats thrive in persistent systems. Open ports and protocols and executables are sources of vulnerability that are waiting… waiting… waiting… to be exploited. The longer a compute instance is running, the longer the black hat has to discover and act. Fugue Computing dramatically reduces the vulnerability surface by removing command processors and by making compute instances ephemeral; without ephemeral instances, true immutability is impossible.

There’s A Lot of Work To Do.
Let’s Do It Together.

We have a lot of work left to do, even though the fundamental components necessary to implement truly immutable infrastructure are already present in the Fugue beta release. We will publicly announce the availability of (and a special offer for!) the Fugue beta release on November 13th at the AWS re:Invent Startup Launch event. We cannot succeed without igniting the imaginations of the community of cloud developers and users. Please join our beta program and help to make immutable infrastructure a reality.

Best Regards,
The Luminal Team

Go Fast. See Everything.
Get Cloud Right.