Atlas, the humanoid robot famous for its parkour and dance routines, has recently begun demonstrating something altogether more subtle but also a lot more significant: It has learned to both walk and grab things using a single artificial intelligence model.
What is more, the robot’s single learning model is showing some tantalizing “emergent” skills, like the ability to instinctively recover when it drops an item, without having been trained to do so.
Boston Dynamics, the company that makes Atlas, together with the Toyota Research Institute (TRI), developed a generalist model that learns to control both arms and legs from a range of example actions. This is different from the norm: robots equipped with the ability to learn would usually rely on one model to walk and jump and another to grasp items.
“The feet are just like additional hands, in some sense, to the model,” says Russ Tedrake, a roboticist at the Toyota Research Institute and the Massachusetts Institute of Technology, who led the current work. “And it works, which is just awesome.” The co-lead on the research was Scott Kuindersma, VP of robotics research at Boston Dynamics.
The single model used to control Atlas is fed images from the robot’s visual sensors, proprioception data from bodily sensors (which give it a continuous sense of its position and movement), and language prompts related to different actions. The model is shown examples of Atlas performing a range of tasks using a mix of teleoperation, simulation, and demonstration videos. The resulting large behavior model (LBM) controls the humanoid robot in a more natural-seeming way. When picking items out of a bin, for example, the robot will reposition its legs to rebalance, much as a person would, when reaching down low. The LBM also exhibits some basic emergent behavior. When the robot drops an item, for instance, it demonstrates a new “recovery” skill by bending down to pick it up.
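In rough outline, the idea is that one model maps all of the robot’s inputs to commands for every joint at once, instead of routing walking and grasping to separate controllers. The sketch below is purely illustrative: every name, shape, and number is hypothetical, and this is not Boston Dynamics or TRI code — a real LBM would be a large neural network trained on the demonstration data described above.

```python
# Hypothetical sketch of the flow described above: a single "large behavior
# model" fuses camera images, proprioception, and a language prompt into one
# stream of whole-body joint commands. All names and values are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    camera_features: List[float]   # e.g. an embedding of the robot's camera images
    proprioception: List[float]    # joint angles, velocities, balance state
    prompt: str                    # language instruction, e.g. "pick items from the bin"

def behavior_policy(obs: Observation) -> List[float]:
    """Toy stand-in for the learned model: fuse all modalities into one
    vector and emit a command for *every* joint -- arms and legs alike --
    rather than using one model to walk and another to grasp."""
    # Fake text encoding; a real system would use a learned language encoder.
    prompt_features = [float(ord(c) % 7) / 7.0 for c in obs.prompt[:8]]
    fused = obs.camera_features + obs.proprioception + prompt_features
    num_joints = 28  # arms, legs, and torso controlled together (made-up count)
    # A real LBM is a trained network; here, a fixed projection stands in.
    return [sum(fused) / (len(fused) * (j + 1)) for j in range(num_joints)]

obs = Observation(
    camera_features=[0.2, 0.5],
    proprioception=[0.1, -0.3, 0.0],
    prompt="pick the part out of the bin",
)
action = behavior_policy(obs)
print(len(action))  # one command per joint, legs and arms from the same model
```

The point of the single output vector is the one Tedrake makes: to the model, the feet are just more hands, so rebalancing the legs while reaching into a bin falls out of the same policy that moves the arms.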
This is a lot more exciting than it might seem. Just as large language models (LLMs) fed by huge amounts of text data sometimes exhibit unexpected abilities, like the ability to code, roboticists hope that a similar strategy will produce robots that exhibit a lot of surprising new skills when trying to get things done.
Tedrake says that Atlas and other robots are starting to show signs of more generalized learning. His lab is also experimenting with different kinds of robot arms that are trained to perform various tasks, including slicing vegetables and sweeping up spilled coffee beans.
While there is a lot of work to do, Tedrake says all of the evidence so far suggests that the approaches used to train LLMs also work for robots. “I think it's changing everything,” he says.
Gauging progress in robotics has become more challenging of late, of course, with video clips showing commercial humanoids performing complex chores, like loading refrigerators or taking out the trash, with seeming ease. YouTube clips can be deceptive, though, and humanoid robots tend to be either teleoperated, carefully programmed in advance, or trained to do a single task in very controlled conditions.
The new Atlas work is a big sign that robotics is starting to experience the kind of advances that, in generative AI, eventually led to the general language models behind ChatGPT. Eventually, such progress could give us robots that are able to operate in a wide range of messy environments with ease and to rapidly learn new skills—from welding pipes to making espressos—without extensive retraining.
“It's definitely a step forward,” says Ken Goldberg, a roboticist at UC Berkeley who receives some funding from TRI but was not involved with the Atlas work. “The coordination of legs and arms is a big deal.”
Goldberg says, however, that the idea of emergent robot behavior should be treated carefully. Just as the surprising abilities of large language models can sometimes be traced to examples included in their training data, he says that robots may demonstrate skills that seem more novel than they really are. He adds that it is helpful to know details about how often a robot succeeds and in what ways it fails during experiments. TRI has previously been transparent with the work it’s done on LBMs and may well release more data on the new model.
Whether simply scaling up the data used to train robot models will unlock ever-more emergent behavior remains an open question. At a debate held in May at the International Conference on Robotics and Automation in Atlanta, Goldberg and others cautioned that engineering methods will also play an important role going forward.
Tedrake, for one, is convinced that robotics is nearing an inflection point—one that will enable more real-world use of humanoids and other robots. “I think we need to put these robots out into the world and start doing real work,” he says.
What do you think of Atlas’ new skills? And do you think that we are headed for a ChatGPT-style breakthrough in robotics? Let me know your thoughts on ailab@wired.com.
This is an edition of Will Knight’s AI Lab newsletter. Read previous newsletters here.