Since its launch in 2018, Autodesk’s AI-avatar service chatbot, AVA, has been busy.
Short for Autodesk Virtual Assistant, AVA fields more than 100,000 customer inquiries a month—from basic troubleshooting to detailed technical support on the more than 100 products the company offers—and has singlehandedly cut Autodesk’s response time for key customer requests from hours to, frequently, just five minutes. In other words, AVA is pretty good at its job.
Until recently, however, AVA was running into a problem of its own. Powered by machine learning algorithms, AVA reacts and responds via expansive natural language models that supply it with answers to the near-infinite combination of inputs customers feed it. But as those customer inputs grew in volume and variability, Autodesk had to adopt more complex models to service AVA’s requests—and these new, more sophisticated models were threatening to outpace Autodesk’s budget to run and support them.
“We were faced with a direct challenge,” explains Alex O’Connor, a Senior Manager in Data Science and Machine Learning at Autodesk. “Higher data volumes required a shift from the models that we had traditionally been using to the newer, cutting-edge models, and we were looking at a large cost increase just to keep the models running.”
Democratizing machine learning
In this dilemma, Autodesk was far from alone.
Over the last ten years, machine learning has become so ubiquitous that its mechanisms—and accuracy—are easy to take for granted. It’s the brain behind chatbots; the secret to an inbox free of spam; the voice of Alexa.
An industry leader in bringing new AI/machine learning technologies to its customers, AWS aims to make machine learning accessible to every developer and data scientist. It offers a broad and deep portfolio of machine learning services and infrastructure that enables developers at all levels of expertise to adopt machine learning at scale. By running their machine learning workloads on AWS, customers get on-demand access to innovative cloud services as well as high-performance, low-cost, and easy-to-use infrastructure for every stage of the machine learning lifecycle.
AI/machine learning is a fast-changing field. New applications for increasingly powerful technology are constantly on the rise in industries ranging from retail to health care. The problem is, the cost of operating such sophisticated models—especially in highly complex fields like natural language processing—has become increasingly prohibitive for many small and midsize businesses. “Smart” doesn’t come cheap.
Back in 2017—around the same time that AVA and tools like it were becoming more popular—a team at AWS was monitoring this trend closely. AWS had introduced GPU-based compute services in the cloud, delivering cloud scale and agility that let customers build their machine learning applications. But at the rate model complexity was growing, the playing field for scalable machine learning solutions was fast becoming exclusive to massive organizations. The market was desperate for an effective, more affordable solution.
So AWS decided to build it.
The objective was a lofty one. After all, AWS was setting out to create a product that would democratize machine learning: to make it significantly less expensive, without compromising on ease of use or performance. Inference (the process by which a trained model produces predictions from new inputs) was the natural place to start; training a model is typically a bounded, one-time cost, whereas inference is an ongoing expense that can snowball, rapidly consuming budgets and resources.
Luckily, AWS had already begun investing in custom silicon as a versatile way to unlock more performance and deliver greater security. It also had lessons to apply from the development of its groundbreaking Graviton processor, such as including software experts early in the development process to ensure ease of use and compatibility with leading machine learning frameworks and tools.
Just over a year after the AWS team had begun development on their solution, Inferentia was born: a custom silicon chip that powers Amazon EC2 Inf1 instances, paired with and optimized by AWS’s Neuron software development kit. Since its initial launch, AWS has continued to enhance Neuron, adding support for more models, operators, and open-source frameworks and tools. This has helped Inferentia deliver strong cost-performance value, higher throughput, lower latency, and major improvements in ease of use.
What was once a lofty objective—powerful machine learning compute at a fraction of the price—had suddenly become a lot more tangible.
Taking the leap
After Autodesk migrated its models to Inferentia, the team was impressed: not only was the system simple to integrate—it often takes only a line or two of code—but practically overnight, AVA’s cost-performance improved five-fold. “You can either think of that as being able to serve five times as many people for the same cost,” says O’Connor, “or you can think about it as, for the same fixed budget, we can launch five new models.” O’Connor also notes that the product seemed to have been designed with developers in mind. Binghui Ouyang, a senior data scientist at Autodesk, has described the process of compiling a model using Neuron as “largely automatic.” With its streamlined process, Inferentia can create a traced model with just a few lines of code. “This is a great advantage for testing and engineering new models quickly,” Ouyang writes.
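For readers curious what that "line or two of code" looks like, the sketch below shows the general shape of compiling a PyTorch model with the Neuron SDK's tracing API. This is a hedged illustration, not Autodesk's actual code: the BERT-based intent classifier, the sample utterance, and the file name are all assumptions for demonstration, and running the compiled model requires an Amazon EC2 Inf1 instance with the `torch-neuron` package installed.

```python
import torch
import torch_neuron  # AWS Neuron SDK plugin; registers the torch.neuron namespace
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical intent-classification model; Autodesk's real model is not public.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
model.eval()

# Tracing requires concrete example inputs with a fixed sequence length.
example = tokenizer(
    "How do I reset my license?",  # assumed sample customer utterance
    padding="max_length", max_length=128, return_tensors="pt",
)
example_inputs = (example["input_ids"], example["attention_mask"])

# The key step: one call compiles the model for Inferentia.
model_neuron = torch.neuron.trace(model, example_inputs)

# The compiled artifact is a TorchScript module; save it for deployment,
# then reload it on an Inf1 instance with torch.jit.load().
model_neuron.save("intent_model_neuron.pt")
```

Because the output is an ordinary TorchScript module, it slots into an existing PyTorch serving stack with minimal changes, which is consistent with Ouyang's description of the process as "largely automatic."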
Between the lower latency and the lower cost of deployment, O’Connor and his colleagues also feel they’ve been given a “budget of time.” Inferentia’s cost savings not only freed up considerable resources but also gave the company confidence that it could afford to keep innovating.
AWS wants to help shape that future. Later this year, AWS plans to build on Inferentia with the launch of AWS Trainium, a custom training chip built from the ground up and optimized for deep learning. Trainium will deliver low-cost, high-performance infrastructure for training deep learning models. It marks another huge stride toward making machine learning broadly accessible—and that’s where the possibilities become endless. Autodesk included.
Using Inferentia, Autodesk achieved 4.9 times the throughput of its GPU-based instances for AVA’s Intent Model, as well as cost reductions of up to 45 percent on its various natural language processing applications.1 The AVA Intent models were built using PyTorch, a popular open-source machine learning framework that accelerates moving models from prototype to efficient production deployment using tools that are tightly integrated with AWS services.
“I’m much more comfortable now saying, ‘Yes, we can put that model out there,’ and ‘Yes, we can manage another model’,” says O’Connor. “Anything we want to build on from a deep learning perspective, we can do from [Inferentia].”
This story was produced by WIRED Brand Lab for Amazon Web Services.