As we migrate applications to the cloud or build there natively, cloud computing itself is changing how we compose and operate our systems. We increasingly compose systems of elastic collections of services running on many compute instances. We now commonly employ application statelessness in order to exploit cloud system elasticity and to achieve the performance required of web-scale systems. As we make these changes, we discover that systems management, operations, policy enforcement, and security in the cloud cannot be accomplished easily with tools and methods adapted from traditional data center environments.
Our reality is that the elastic compute systems of any given enterprise are now distributed across tens, hundreds, thousands or more nodes running an ever-growing array of cloud services, but there is no central coordinating function to act as a nexus for control and trust. We have exploded the complexity and mutability of systems in the cloud, without simultaneously advancing the framework for controlling even relatively simple, non-distributed environments. Consequently, it’s become difficult to trust our systems.
In the midst of this unwieldy reality is an even more compelling reality—that the cloud is not, in fact, merely a collection of infrastructure. It’s the world’s first global computer. And, just as we abstracted the hardware of individual computers decades ago, we can abstract the distributed hardware of the cloud and radically simplify operations complexity. We can do this to great advantage, so long as we maintain the ability to dive into the guts of the lower-level system directly when needed.
Computing Truth and Trust Live in an Infrastructure-level Operating System
Computing “truth” means always and centrally knowing the state of systems infrastructure. “Trust” means having confidence that systems are functioning as intended, reliably and repeatedly.
Today, most approaches to building and operating distributed cloud computing systems don’t fully exploit the benefits of elastic infrastructure; they are passive and mimic single CPU-based batch processing. One reason for this is that, in the cloud, developers can’t realistically run scripts to build infrastructure and expect that system truth will endure or that resulting application tiers can be trusted. Infrastructure immediately begins to drift from its initial configuration. The accumulation of well-intentioned administrative interventions introduces unintended consequences and failure modes. The bag-of-scripts + dashboard approach is ineffective and causes more problems than it solves.
We need the equivalent of an operating system for distributed computing instances and components in cloud infrastructures. It needs to have the capability to be invoked automatically and to operate autonomically, so that we have much better capacity to know truth. But we also need trust, which can only happen when these distributed systems are maintained over time in much the same way that an operating system maintains the distributed resources of an individual computer. Trust is established and maintained through continuously ensuring that individual components have not been repurposed by mistake or ill intention.
Knowing truth about a distributed system is difficult, in large part because we mistakenly treat truth and trust as separable. The only way to achieve either is to achieve both. Computer operating systems exist to provide an integrated but simple solution that controls and reports, establishing and consistently maintaining known state. Distributed computing systems in the cloud need something similar, and this is the motivation for Fugue.
Cloud Services Are Hardware Under APIs
In the mental model of the cloud as a distributed, general purpose computer, each cloud service is akin to a hardware interface in traditional computing. Just as those were abstracted and managed by an operating system and programming language, so cloud services can be abstracted and managed.
For example, networking in the cloud is usually handled by a collection of software-defined networking (SDN) services, such as virtual networks, inbound and outbound port and protocol rules, and load balancers. This is a familiar set of virtualized services, but it’s ripe for being abstracted into much simpler use patterns, as we are no longer constrained by the limitations of hardware. When composing an application of services, we should just be able to say in our programming language that one service “talks-to” another and have the cloud operating system create and enforce the appropriate connection, rather than having to configure several appliances to line up correctly to allow the connection. On the other hand, it’s important to have the ability to get beneath the abstraction layer when you want more control over the details.
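To make the “talks-to” idea concrete, here is a minimal sketch in Python. It is not Ludwig and not Fugue’s API: the Service and Rule types and the talks_to function are hypothetical names invented for illustration. It only shows how a single declaration could expand into the paired outbound and inbound rules that would otherwise be configured by hand.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Service:
    """Hypothetical service description: a name and the port it listens on."""
    name: str
    port: int


@dataclass(frozen=True)
class Rule:
    """Hypothetical low-level network rule that the abstraction expands into."""
    direction: str  # "egress" or "ingress"
    owner: str      # service the rule is attached to
    peer: str       # service on the other end of the connection
    port: int


def talks_to(source: Service, target: Service) -> List[Rule]:
    """Expand one high-level declaration into the two rules it implies."""
    return [
        # The caller must be allowed to send traffic to the target's port.
        Rule("egress", source.name, target.name, target.port),
        # The target must be allowed to receive that traffic from the caller.
        Rule("ingress", target.name, source.name, target.port),
    ]


web = Service("web", 443)
db = Service("db", 5432)

rules = talks_to(web, db)
for rule in rules:
    print(rule)
```

The caller states intent once, and the low-level rules remain available as plain data for anyone who needs to inspect or adjust the details.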
Just as we used to spend much of our time configuring hardware and writing software directly against it, we now do so with cloud’s hardware equivalent. Instead, we should be writing simple, enforceable programs that leave the low-level details aside unless they are important in a particular use case.
Browsing through our website, particularly the Product page, you’ll notice that the three main components of Fugue are the Ludwig language, the CLI, and the Conductor. Your Ludwig program, in the form of a Fugue composition, is run by the Conductor, which is roughly similar to an operating system kernel that runs inside your Amazon Web Services (AWS) account. The Conductor handles all the AWS API interactions. It provisions, instantiates, maintains, and destroys the resources needed for your program.
Fugue as a Kernel-based OS
So, you can think of Fugue’s Conductor as being much like a single machine’s operating system kernel which provides resources and manages those resources for an application. But, the Conductor is doing this at the cloud infrastructure level, creating Fugue processes (analogous to Unix processes), managing them, and destroying them. There’s a clear line between user space, in which Fugue processes run, and kernel space, where the control over those processes is held.
Just as with a traditional kernel operating system, kernel space is considered highly dangerous and hands-off for most direct interactions. Thus, you can limit access privileges for users of the system to a minimal set needed to read information from the cloud account and to send messages to the Conductor. This is a best practice when using Fugue, so that the system is safe and remains in a known-good state. Depending on the configuration of the Conductor, Fugue is designed to correct manual modifications of the system as it notices them. So, if you find yourself, say, in the AWS console, modifying an infrastructure component that is running in Fugue, but it keeps changing back to the declaration of the process you made in a Fugue composition, you’re seeing enforcement—a key Fugue pattern—in action.
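The enforcement pattern can be sketched as a comparison between declared and observed state. The Python below is an illustrative assumption, not the Conductor’s implementation: the diff and enforce functions and the dictionary-based resource model are hypothetical, and a real system would read and write state through the cloud provider’s API rather than an in-memory dictionary.

```python
from typing import Dict


def diff(declared: Dict[str, object], observed: Dict[str, object]) -> Dict[str, object]:
    """Return the declared settings whose observed value has drifted."""
    return {
        key: value
        for key, value in declared.items()
        if observed.get(key) != value
    }


def enforce(declared: Dict[str, object], observed: Dict[str, object]) -> Dict[str, object]:
    """Write the declared values back over any drift and report what changed.

    This sketch only restores declared keys; it does not remove extra keys
    that exist in the observed state but not in the declaration.
    """
    corrections = diff(declared, observed)
    observed.update(corrections)
    return corrections


# Declared state, as it might be derived from a composition (hypothetical values).
declared = {"instance_type": "t2.micro", "port": 443}

# Observed state after someone edits the resource by hand.
observed = dict(declared)
observed["port"] = 8080

corrections = enforce(declared, observed)
print(corrections)
```

Running a check like this on a schedule is what makes a manual change appear to “change back”: each pass finds the drifted value and restores the declared one.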
Fugue as a Language-based OS
Fugue tackles the complexity of cloud services proliferation and constant states of change in cloud by reducing the vast majority of cloud concepts to Ludwig language types that are handled by planners.
Planners handle the semantics of interacting with the cloud service provider, and the Ludwig library and compiler allow higher-order functions to abstract away the complexity. In any given Fugue composition, many Ludwig types will be used, and each of these types maps to a particular planner for interpretation and operation. This allows us to run a composition through the planner pipeline until every symbol is resolved into an API call or datum. Other than shell commands and integrations, all aspects of Fugue are language-based. This means that every aspect of the system is programmable by you and that we can reduce a truly complex environment to simple declarations. Over time, there will be many more planners available, along with Ludwig libraries to use them.
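The mapping from types to planners can be pictured as a dispatch table. The following Python sketch is an assumption made for illustration only: the type names, planner functions, and API call names are hypothetical and do not reflect Fugue’s actual planners or any real cloud API. It shows one way a composition could be walked until every element is resolved into a call.

```python
from typing import Callable, Dict, List, Tuple

# A composition element: (type name, properties). Hypothetical representation.
Resource = Tuple[str, dict]
# A resolved call: (action name, arguments). Action names are made up.
ApiCall = Tuple[str, dict]


def plan_network(props: dict) -> List[ApiCall]:
    """Hypothetical planner for a 'Network' type."""
    return [("create_network", {"cidr": props["cidr"]})]


def plan_instance(props: dict) -> List[ApiCall]:
    """Hypothetical planner for an 'Instance' type."""
    return [("run_instance", {"image": props["image"]})]


# Each type maps to exactly one planner.
PLANNERS: Dict[str, Callable[[dict], List[ApiCall]]] = {
    "Network": plan_network,
    "Instance": plan_instance,
}


def resolve(composition: List[Resource]) -> List[ApiCall]:
    """Run every element through its planner and collect the resulting calls."""
    calls: List[ApiCall] = []
    for type_name, props in composition:
        planner = PLANNERS.get(type_name)
        if planner is None:
            # An unresolved symbol is an error, not something to skip silently.
            raise KeyError(f"no planner registered for type {type_name!r}")
        calls.extend(planner(props))
    return calls


composition = [
    ("Network", {"cidr": "10.0.0.0/16"}),
    ("Instance", {"image": "example-image"}),
]

calls = resolve(composition)
for call in calls:
    print(call)
```

Adding support for a new service in this model means adding a new type and registering a planner for it, which mirrors the idea that new planners arrive alongside new library types.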
A true language-based operating system is made of the language it presents to the user and so provides complete access to the operating system itself through the language. This is not true of Fugue in that the planners are written in other languages. We’ve tried to draw the line between that which is expressed in Ludwig and that which is hard-coded into the planners in the right place for maximum ease of use and also user accessibility. Planners have runtime responsibilities, but they also have language interpretation responsibilities.
Any new feature of Fugue, integrations with other products, and new cloud services are first represented as Ludwig language constructs that are user-friendly and complete. These are generally new types in Ludwig, and it’s through them that Fugue determines which planners are needed at runtime. This focus on language clarifies a very murky and potentially complex problem space and allows us to impose some degree of safety and control over the cloud.
There’s More Work To Do. Let’s Do It Together.
Cloud computing can be efficient without an operating system over it, but that efficiency is very hard to achieve and generally must be reinvented by each customer. Because cloud is so complex and growing in complexity all the time, it’s critical to have a single interface by which systems can be defined, updated, and operated over time, by which your cloud lifecycle can be programmed and automated, and by which your cloud ops can be transparent.
Fugue is a higher-fidelity, easy-to-use, and powerful operating system for the cloud that delivers on these promises. This is next-generation infrastructure automation that integrates well with your existing workflows.
We’ve explained a bit of our thinking behind Fugue and hope that it’s made you curious. Register for our Webinar and schedule a Demo to learn how Fugue can help your DevOps productivity and help your enterprise reduce complexity and cost.
Parts of this post have been excerpted from The Fugue Book, which will be available in the coming months through the Pragmatic Bookshelf. A big thanks to Pragmatic. Stay tuned!