Ten years ago, the cloud computing industry was approaching a crossroads. Everyone, from students to developers to multinational corporations, seemed to be asking for more. More performance. More storage. Faster networking. Tighter security. Cloud computing, though already indispensable, was not innovating fast enough to keep up with growing user demands, bogged down by the inefficiencies, complexity, and management overhead of traditional virtualization systems.
In the beginning, Amazon EC2, a web service that provides secure, resizable compute capacity in the cloud, was built on a purely software-based architecture, powered by a custom version of the open-source Xen hypervisor. This technology, which isolates and dedicates resources to virtual machines, had worked brilliantly for years, but its limitations were becoming apparent. Customers consistently told AWS that they wanted more performance and lower costs. One problem was that the system’s hypervisor gobbled up too many computing resources, including CPU and memory. Performance variation in EC2’s software stack, driven in part by increasing customer demand, seemed like an omen of greater inconsistencies to come. And even with already robust security, there were opportunities for AWS to make security and data confidentiality more intrinsic to EC2.
Moving to a new, hardware-based approach to virtualization was a logical, although daunting, decision.
AWS had largely been using the same technology stack to power EC2 since it launched in 2006. But as customer demands increased, AWS found it increasingly difficult to wring significant improvements out of the traditional virtualization platform it had been using to serve its customers. The growing need for radical innovation left AWS with a dilemma: either keep iterating on its core product, doubling down on effort and accepting marginal improvements, or completely change course. In short, the underlying virtualization technology that cloud computing was founded upon had become a major limiter of significant progress.
For most companies, a radical pivot would have been unthinkable. EC2 was a massive, robust environment and a multi-billion dollar business. It would take years to design and implement a new architecture, just as it had taken years to create this one. The idea of rebuilding something that wasn’t exactly broken seemed outlandish, but because it was the key to unlocking future innovation, AWS would do it.
It was time to reimagine virtualization. A novel approach that offloaded virtualization functions to custom-built hardware, AWS decided, was EC2’s future. This is how they did it.
One step at a time
Although the concept of offloading virtualization functions (moving them from software to hardware) had orbited the industry for years, AWS’s approach to it was unprecedented.
AWS had to reinvent the underlying EC2 technology from the ground up and transition more than a million customers to it near seamlessly, without impacting customer applications or data. The challenges were numerous.
First, the AWS team boldly accepted the inevitability of an iterative process—there was an understanding across the organization that it would take years, multiple teams, inevitable missteps, and consistent funding to get the change right. The system needed to be streamlined, from both a performance and a security perspective, which meant major EC2 features would eventually need to be offloaded to dedicated hardware.
That wasn’t going to happen overnight. AWS committed to a long-term vision; to avoid destabilizing the whole operation and offering, the offloading process would have to be an incremental one. Werner Vogels, Amazon’s CTO, has written about AWS’s decision to move to hardware-based virtualization as a “one-way door”: This was a choice, he explained, that would be “almost impossible to reverse.” In theory, walking such a definitive path requires a deliberate and methodical approach.
In practice, this translated to a rolling launch of offloaded virtualization capabilities over the next several years.
The first step came in 2013 with the C3 instance type. Branded as “enhanced networking,” it featured an offload card that moved EC2’s network processing onto hardware and marked the first installment of AWS’s newly minted Nitro System. A year later, the C4 instance type followed with another offload card, this one engineered to move Elastic Block Store (EBS) processing to hardware. By 2017, AWS had offloaded the remaining components, including the control plane and the remaining I/O, and had built a Nitro Security Chip to provide a hardware root of trust and enhanced security. Finally, C5 instances ushered in EC2’s new custom-built hypervisor, the Nitro Hypervisor, and with it the modern iteration of the AWS Nitro System.
With each step, the performance of the offloaded virtualization functions improved, because dedicated hardware could now accelerate the input/output for those functions. At the same time, eliminating the 30 percent management overhead imposed by the legacy hypervisor allowed more of each server’s performance to be dedicated to customers’ EC2 instances.
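That shift is even visible from the outside: the public EC2 API reports which hypervisor backs each instance type. The short sketch below, written with the AWS SDK for Python (boto3) and assuming credentials and a default region are already configured, simply queries that field for a few example families; the specific instance type names are illustrative, not an exhaustive map of the Nitro rollout.

```python
# Minimal sketch: use the public DescribeInstanceTypes API to see which
# hypervisor ("xen" for the legacy stack, "nitro" for the Nitro System)
# backs a handful of example instance families.
# Assumes boto3 is installed and AWS credentials/region are configured.
import boto3

ec2 = boto3.client("ec2")

# C3 and C4 predate the full Nitro Hypervisor; C5 was the first to run on it.
example_types = ["c3.large", "c4.large", "c5.large"]

response = ec2.describe_instance_types(InstanceTypes=example_types)

for info in response["InstanceTypes"]:
    name = info["InstanceType"]
    hypervisor = info.get("Hypervisor", "unknown")
    print(f"{name}: {hypervisor}")
```

Run in a region that still offers all three families, this would typically print “xen” for the C3 and C4 types and “nitro” for C5.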
Most importantly, AWS was able to rebuild its underlying virtualization platform while keeping its business growing and its customers happy. Over 90 percent of the AWS roadmap comes from understanding customers’ needs, and through extensive conversations and planning sessions with customers, AWS was able to map Nitro’s function and core purpose precisely to what customers were asking for: more performance, lower costs, increased security and confidentiality, and more innovation.
The Nitro System has been a remarkable success, delivering unparalleled price performance and security for customers through a modular design that allows AWS to launch new instances far faster and more efficiently than ever before. In just the last two years, AWS has launched more EC2 instance types than in the previous 13 years combined, and it now offers more than 400 instance types, enabling customers to tailor their infrastructure to their applications’ needs.
Even so, Nitro’s greatest success may be bigger still: its benefits have paved the way for further innovations such as bare metal instances, EC2 Mac instances, and hybrid and edge compute offerings like AWS Outposts and AWS Wavelength, which bring AWS services closer to customers in data centers, metro areas, and at the 5G edge. All of this allows customers to bring workloads to the cloud that they couldn’t, or wouldn’t, have brought before.
Better customer experiences through silicon innovation
For AWS, the future is brimming with possibilities. The AWS Nitro System is the building block for further innovation across the platform and will continue to enable additional EC2 features and new AWS products and services. Development of the Nitro System was part of a broader strategy to innovate at the silicon level in order to deliver a better customer experience: increased performance, lower cost, improved security and confidentiality, and more innovation. In the past few years, AWS has also launched EC2 instances powered by its general purpose, AWS-designed Graviton processors, as well as chips targeted at machine learning, such as AWS Inferentia.
And just as with the Nitro System, AWS will take the time required to build the products and services that customers need and ask for. An incremental, collaborative approach, one that puts the customer at the forefront, will dictate the innovation that follows. AWS aims to deliver an experience for customers that goes well beyond what they could achieve on their own. And it’s innovations like the AWS Nitro System that make that possible.
This story was produced by WIRED Brand Lab for Amazon Web Services.


