MIT researchers are developing a new approach for training general-purpose robots

November 4, 2024, by Pranco

What just happened? Researchers at the Massachusetts Institute of Technology (MIT) have developed a new approach to training general-purpose robots, inspired by the success of large language models such as GPT-4. Called Heterogeneous Pretrained Transformers (HPT), this approach allows robots to learn and adapt to a wide range of tasks – something that has been difficult until now.

The research could lead to a future where robots are not just specialized tools, but flexible assistants that can quickly learn new skills and adapt to changing conditions, becoming true general-purpose robot assistants.

Traditionally, robot training has been a time-consuming and expensive process, requiring engineers to collect specific data for each robot and task in controlled environments. As a result, robots would have difficulty adapting to new situations or unexpected obstacles.

The MIT team's new technique combines large amounts of heterogeneous data from different sources into one system that can teach robots a wide range of tasks.

At the heart of the HPT architecture is a transformer, a type of neural network that processes input from various sensors, including vision and proprioception data, and creates a shared “language” that the AI model can understand and learn from.
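As a rough illustration of the shared "language" idea, the sketch below projects two differently shaped inputs into tokens of one common width and mixes them with a single self-attention step. It is not the researchers' implementation: the names (`random_matrix`, `vision_stem`, `proprio_stem`), the dimensions, and the random, untrained weights are all assumptions made for the example.

```python
import math
import random

random.seed(0)
D_MODEL = 8  # width of the shared token space (illustrative value)

def random_matrix(rows, cols):
    """Random, untrained weights; a real model would learn these."""
    scale = 1.0 / math.sqrt(rows)
    return [[random.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    k_t = [list(col) for col in zip(*k)]
    scale = math.sqrt(len(q[0]))
    weights = [softmax([s / scale for s in row]) for row in matmul(q, k_t)]
    return matmul(weights, v), weights

# Toy sensor inputs with different shapes (hypothetical sizes).
vision_feats = random_matrix(4, 16)  # 4 image patches, 16 features each
proprio = random_matrix(1, 7)        # one 7-value joint-state reading

# One hypothetical "stem" per modality maps its input to D_MODEL-wide tokens.
vision_stem = random_matrix(16, D_MODEL)
proprio_stem = random_matrix(7, D_MODEL)

# After the stems, both modalities live in the same token sequence.
tokens = matmul(vision_feats, vision_stem) + matmul(proprio, proprio_stem)

w_q, w_k, w_v = (random_matrix(D_MODEL, D_MODEL) for _ in range(3))
mixed, weights = self_attention(tokens, w_q, w_k, w_v)

print(len(tokens), len(tokens[0]))  # number of tokens, token width
```

The point of the design is that only the small per-modality stems depend on the sensor layout; everything after them sees a uniform sequence of tokens, whichever robot or sensor produced the data.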

“In the field of robotics, people often argue that we don’t have enough training data. But in my opinion, another big problem is that the data comes from so many different domains, modalities and robot hardware,” says Lirui Wang, the lead author of the study and a graduate student in electrical engineering and computer science (EECS) at MIT. “Our work shows how you can train a robot if you have them all together.”

Wang’s co-authors include fellow EECS graduate student Jialiang Zhao, Meta researcher Xinlei Chen and senior author Kaiming He, an associate professor in EECS and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). The research will be presented at the Conference on Neural Information Processing Systems.

One of the main advantages of the HPT approach is its ability to use a huge dataset for pretraining. The researchers combined 52 datasets containing more than 200,000 robot trajectories in four categories, including human demonstration videos and simulations.

This pretraining allows the system to transfer knowledge effectively when learning new tasks, requiring only a small amount of task-specific data for refinement.

The HPT method outperformed traditional training-from-scratch approaches by more than 20 percent on both simulated and real-world tasks. It still performed better even on tasks that differed significantly from the pretraining data.

“This paper provides a novel approach to training a single policy for multiple robot embodiments,” said David Held, associate professor at Carnegie Mellon University’s Robotics Institute, who was not involved in the study. “This enables training across different data sets, allowing robot learning methods to significantly scale the size of the data sets they can train on. It also ensures that the model can quickly adapt to new robot embodiments, which is important as new robot designs are constantly being produced.”

The MIT researchers want to improve the HPT system by investigating how data diversity can improve its performance. They also plan to expand the system’s capabilities to handle unlabeled data, similar to how large language models like GPT-4 work.

Wang and his colleagues have set an ambitious goal for the future of this technology. “Our dream is to have a universal robot brain that you can download and use for your robot without any training,” Wang explains. “Although we are still in the early stages, we will continue to push hard and hope that scaling up leads to a breakthrough in robotic policies, as was the case with large language models.”

The Amazon Greater Boston Tech Initiative and the Toyota Research Institute partially funded this research.