Search engines may make people’s lives easier, but they can present challenges for service providers.
Not only do customers expect instant results, but they also want the search engine to finish their thought, predicting what words may come next. Serving these queries at scale is complex: it takes extra computing power to handle a large and wide-ranging volume of requests, which in turn requires a heavy volume of input and output (I/O) operations that must complete quickly.
To serve that I/O, these workloads rely on third-party solid-state drives (SSDs). SSDs have been around for three decades, but in recent years, some workloads have been evolving faster than the technology could handle.
“Systems would break down because our customers would be running these heavy loads, and the [commercial] SSD just couldn’t keep up,” says Srini Srinivasan, the CTO of Aerospike, a database management company.
To better serve these customers and provide them with the latency, reliability, and security their applications required, AWS realized it needed to build its own SSDs. After delivering several generations of EC2 instances, including building custom components such as its own hypervisor, processor, networking cards, and ML chips, AWS saw SSDs as its next opportunity for improvement—one that would require forging a new path in the industry.
Responding to customer feedback, AWS engineers began creating a solution that didn’t yet exist. The resulting AWS Nitro SSD would enable I/O-intensive workloads—including relational databases, NoSQL databases, data warehouses, search engines, and analytics engines—to run faster and with more predictable performance.
“We’re constantly looking for opportunities to help our customers,” says Amit Shah, Principal Product Manager for AWS. “And one of the big areas was latency and the consistency of latency.”
The Road to Reducing Latency
AWS has a rich history of innovating on behalf of customers, delivering better performance and cost benefits.
One example of this customer-driven innovation is the AWS Nitro System, which has enabled AWS to increase its pace of innovation and release instances better suited to the increasingly complex requirements of customer workloads. When it came to the challenge of creating a better SSD, AWS kept its customers front and center, knowing that their I/O-intensive applications required faster, more reliable, and more secure storage.
An SSD is built from fast, dense flash memory called NAND. Each of the cells within that memory can only be written, erased, and then rewritten a limited number of times. A critical component of an SSD is the firmware that is responsible for managing and writing to the device while evening out the number of write cycles over time. This is designed to help extend the life of the SSD.
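The idea of evening out write cycles, commonly called wear leveling, can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the AWS Nitro SSD firmware or any vendor's algorithm: the class name `ToyWearLeveler`, its methods, and the `simulate` helper are all hypothetical, and real firmware also tracks data placement, bad blocks, and much more.

```python
import heapq


class ToyWearLeveler:
    """Toy wear leveling: always write to the free block with the fewest erases.

    Hypothetical illustration only; not how any real SSD firmware is built.
    """

    def __init__(self, num_blocks):
        # Min-heap of (erase_count, block_id) for blocks that are free to write.
        self.free = [(0, block) for block in range(num_blocks)]
        heapq.heapify(self.free)
        self.erase_counts = [0] * num_blocks

    def allocate(self):
        """Take the least-worn free block and return its id."""
        _, block = heapq.heappop(self.free)
        return block

    def release(self, block):
        """Erase a block (one more wear cycle) and return it to the free pool."""
        self.erase_counts[block] += 1
        heapq.heappush(self.free, (self.erase_counts[block], block))


def simulate(num_blocks=4, writes=10):
    """Write and erase repeatedly, then report per-block erase counts."""
    leveler = ToyWearLeveler(num_blocks)
    for _ in range(writes):
        block = leveler.allocate()
        leveler.release(block)
    return leveler.erase_counts


counts = simulate()
print(counts)
```

Because the least-worn block is always chosen next, no block in this sketch ever gets more than one erase cycle ahead of another, which is the property that helps extend the life of the device.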
But a typical SSD can be unpredictable, slowing down when numerous write operations arrive at once and creating latency spikes. AWS engineers used their 16 years of experience understanding cloud workloads, in combination with their cloud database expertise, to build a sophisticated, power-fail-safe, custom flash translation layer (FTL) that is part of the firmware. AWS then tightly integrated that FTL with the existing Nitro System to deliver even better performance, focusing on maximizing the transactions per second that applications actually achieve under sustained load.
Splunk, a leading data platform provider whose product is designed to investigate, monitor, analyze, and act on data at scale, is particularly sensitive to delays in accessing storage. As a customer that needs fast access to SSDs, Splunk decided to adopt the AWS Nitro SSD-based EC2 Im4gn/Is4gen instances.
“When evaluating the new Im4gn/Is4gen instances powered by AWS [Nitro SSDs], we observed an up to 50 percent decrease in search runtime compared to I3/I3en instances, which we currently use,” said Brad Murphy, Splunk Vice President of Cloud Operations and Infrastructure.
In creating the Nitro SSD, AWS decided to focus on reducing I/O latency as well as latency variability (also known as tail latency). The result is an SSD that can reduce latency by up to 60 percent and latency variability by up to 75 percent—providing customers increased reliability and improving their overall performance.
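To see why tail latency is tracked separately from average latency, consider the following Python sketch. It uses the nearest-rank percentile method and made-up numbers: the sample values, the microsecond unit, and the function names are assumptions chosen for illustration, not measurements from any AWS instance.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def latency_summary(samples_us):
    """Summarize I/O latencies (in microseconds) by mean, median, and 99th percentile."""
    return {
        "mean": sum(samples_us) / len(samples_us),
        "p50": percentile(samples_us, 50),
        "p99": percentile(samples_us, 99),
    }


# Hypothetical workload: 98 fast operations and 2 that stalled.
samples = [100] * 98 + [5000] * 2
summary = latency_summary(samples)
print(summary)
```

In this invented data set the median stays at 100 microseconds while the 99th percentile jumps to 5,000, so an application that issues many I/O operations per request would regularly hit the slow cases even though the typical operation is fast.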
What’s more, Nitro SSD offers unmatched price performance. Instances that feature Nitro SSDs offer up to 30 percent better price performance compared to the EC2 instances with commercial SSDs.
“We want to provide the lowest cost and the best price performance for these workloads for our customers,” says Srinivasan. “That’s exactly what Nitro SSDs enable.”
Innovating to Increase SSD Reliability
Customers expect AWS to deliver reliable storage with consistent performance. SSDs, while reliable, still need to be updated over time.
So AWS designed the FTL to be easily upgraded, enabling AWS to diagnose and resolve issues without impacting customers. Moreover, because AWS builds its own systems, operational telemetry, diagnostics, and live updates could be integrated into the design. This helps AWS deliver increased reliability and consistent performance.
“SSDs all provide generally the same API, and they all do a good job with an average case, but our experience over the years is that each one has unpredictable and idiosyncratic behaviors,” says Peter DeSantis, SVP for AWS Utility Computing and Apps. “For example, garbage collection can kick in at an unexpected time and cause I/O requests to stall. And these unexpected behaviors make it really difficult when you’re trying to provide consistent performance.”
For Aerospike, adopting AWS Nitro SSDs resulted in a significant improvement for its business.
“Previously, we could put one or two terabytes of storage on a storage node,” says Srinivasan. “Now we get 30 terabytes per node. For read/write workloads, we get about a 30 percent increase in performance. And for read-only, we get about a 70 percent increase in performance. Amazon was a pioneer in putting SSDs in their instances before other cloud vendors—and they’re still ahead.”
Customized for Your Needs and Better Efficiency
Improved reliability and latency enable AWS Nitro SSD customers to use their resources more efficiently, consuming less energy to perform the same tasks. In addition, Nitro SSDs are smaller than commercial SSDs, taking up less space and requiring less energy to both operate and cool.
AWS plans to continue innovating on Nitro SSDs, and future iterations will further improve I/O performance, reduce latency, and add more enterprise-class storage features. In other words, AWS technology will allow even larger workloads to operate faster and more efficiently, helping latency become a cloud computing problem of the past.
This story was produced by WIRED Brand Lab for Amazon Web Services.

