UC Berkeley Researchers Use a Dreamer World Model to Train a Variety of Real-World Robots to Learn from Experience

Robots need to learn from experience to solve complex tasks in real-world environments. Deep reinforcement learning has been the most common approach to robot learning, but it requires extensive trial and error. This requirement limits its deployment in the physical world and makes robot training heavily reliant on simulators. The downside of simulators is that they fail to capture important aspects of the natural world, and their inaccuracies carry over into the training process. Recently, the Dreamer algorithm outperformed pure reinforcement learning in video games at learning from brief interactions by planning inside a learned world model. Learning a world model that can forecast the outcomes of candidate actions makes planning in imagination possible, which minimizes the amount of trial and error required in the real world.
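To make this idea concrete, here is a minimal, self-contained sketch of planning in imagination: a toy "learned" world model (here just a made-up linear system, with the matrices A and B standing in for learned dynamics) scores candidate action sequences so an agent can pick a plan without real-world trial and error. This is only an illustration of the concept; Dreamer's actual world model is a far richer learned neural network.

```python
# Toy sketch of planning in imagination. The linear dynamics below are a
# made-up stand-in for a learned world model, purely for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "learned" dynamics: next_state = A @ state + B @ action.
A = np.array([[0.9, 0.1], [0.0, 0.95]])
B = np.array([[0.0], [0.5]])
goal = np.array([1.0, 0.0])

def imagined_return(state, actions):
    """Roll an action sequence forward inside the model and score it."""
    total = 0.0
    for a in actions:
        state = A @ state + B @ np.atleast_1d(a)  # forecast, no real robot needed
        total -= np.linalg.norm(state - goal)     # reward: negative distance to goal
    return total

# Compare random candidate plans entirely in imagination, then pick the best.
state0 = np.zeros(2)
candidates = rng.uniform(-1.0, 1.0, size=(64, 10))  # 64 plans of 10 actions each
scores = [imagined_return(state0, plan) for plan in candidates]
best_plan = candidates[int(np.argmax(scores))]
print("First action of the best imagined plan:", best_plan[0])
```

Only the single chosen action would then be executed on the real system, which is why imagination drastically cuts down real-world interaction.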

However, whether Dreamer can enable faster learning on physical robots had been unknown. The researchers applied Dreamer directly to real robots instead of simulators. The algorithm trained a quadruped robot to stand up and walk from scratch within an hour, and the robot adapted to external pushes within 10 minutes. In addition, Dreamer enabled robotic arms to pick and place multiple objects from camera images given only sparse rewards. On a wheeled robot, Dreamer learned to navigate to a goal position purely from camera images, resolving ambiguity about the robot's orientation. The researchers found Dreamer capable of online learning in the real world, establishing a solid baseline for future work.

One of the fundamental issues in robotics research is how to teach robots to tackle challenging problems in the real world. Deep reinforcement learning (RL), a popular method for teaching robots, enables them to learn from their mistakes and gradually improve their behavior. However, current algorithms are unsuitable for many real-world tasks because they require too much interaction with the environment to acquire effective behaviors. Modern world models have recently demonstrated considerable potential for data-efficient learning in simulated worlds and video games. By learning a world model from past experience, robots can predict the outcomes of potential actions, which reduces the amount of real-world trial and error needed to develop effective behaviors.

Although accurate world models can be challenging to learn, they have attractive qualities for robot learning. Because world models anticipate future outcomes, they enable planning and behavior learning with little real-world interaction. Additionally, world models condense general information about environmental dynamics that, once learned, can be reused for various downstream tasks. World models also learn representations that fuse multiple sensor modalities into latent states, eliminating the need for manual state estimation. Last but not least, world models generalize well from available offline data, which could accelerate real-world learning even further.
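As a toy illustration of that fusion of sensor modalities, the sketch below projects a camera image and a proprioceptive vector into a single latent state. The random projection matrices W_img and W_prop are stand-ins for learned encoder weights and are not part of Dreamer's actual architecture.

```python
# Toy multimodal encoder: fuse an image and proprioception into one latent
# state. The random matrices are illustrative stand-ins for learned weights.
import numpy as np

rng = np.random.default_rng(1)

LATENT = 32
W_img = rng.normal(0.0, 0.01, size=(LATENT, 64 * 64))  # weights for a flattened image
W_prop = rng.normal(0.0, 0.1, size=(LATENT, 12))       # weights for joint readings

def encode(image, proprio):
    """Fuse two sensor modalities into a single compact latent state."""
    z = W_img @ image.ravel() + W_prop @ proprio
    return np.tanh(z)  # the dynamics model and policy operate on this vector

latent = encode(rng.normal(size=(64, 64)), rng.normal(size=12))
print(latent.shape)  # (32,) -- one state vector, no hand-written state estimator
```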

Despite these benefits, learning accurate world models for the real world remains a significant open problem. The study uses recent developments in the Dreamer world model to train a range of robots in the most basic and fundamental problem setting: online reinforcement learning in the real world, without simulators or demonstrations. The figure below illustrates how Dreamer builds a world model from a replay buffer of past experience, learns behaviors from rollouts imagined in the world model's latent space, and continually interacts with the environment to explore and refine its behaviors.
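The loop the figure describes can be sketched schematically as follows. Every function body below is a trivial stand-in so the skeleton runs; in Dreamer, each of these steps is a gradient-based update, and the names here are illustrative assumptions, not the DayDreamer API.

```python
# Schematic online learning loop: interact, store experience, update the
# world model, and train the policy on imagined rollouts. All bodies are
# trivial stand-ins; the real updates are neural-network gradient steps.
import random
from collections import deque

replay = deque(maxlen=10_000)   # replay buffer of past real-world experience

def env_step(action):           # stand-in for one step on the physical robot
    return {"obs": random.random(), "action": action, "reward": random.random()}

def policy(obs):                # stand-in for the learned actor
    return random.uniform(-1.0, 1.0)

def update_world_model(batch):  # stand-in: fit dynamics and reward on real sequences
    pass

def train_in_imagination():     # stand-in: actor-critic on imagined latent rollouts
    pass

obs = 0.0
for step in range(1_000):
    transition = env_step(policy(obs))  # interact with the real environment
    replay.append(transition)           # store the experience
    obs = transition["obs"]
    if len(replay) >= 64:
        batch = random.sample(list(replay), 64)
        update_world_model(batch)       # learn the model from replayed experience
        train_in_imagination()          # refine behavior without touching the robot
```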

Source: https://arxiv.org/pdf/2206.14176.pdf

The researchers aimed to push the boundaries of robot learning in the real world while providing a solid foundation for future research into the advantages of world models for robot learning. The following is a summary of the paper's significant contributions:

  • Demonstrating successful learning directly in the real world, without simulators and without introducing new algorithms. The tasks span a variety of challenges, including different action spaces, sensory modalities, and reward structures.
  • Training a quadruped robot to roll off its back, stand up, and walk in under one hour, after which the robot adapts to external pushes within 10 minutes.
  • Training robotic arms to pick and place objects from sparse rewards by localizing objects from pixels and fusing images with proprioceptive inputs.
  • Releasing the open-source software architecture used in all the experiments, which accommodates different action spaces and sensory modalities and provides a versatile framework for future research into world models for robot learning in the real world (see the sketch after this list).
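As a rough idea of what such a framework might look like, the sketch below wraps robots with different action spaces and sensor suites behind one interface. The spec format and class names are assumptions made for illustration, not taken from the released DayDreamer codebase.

```python
# Hypothetical uniform robot interface: each robot declares its own
# observation spec and action dimensionality, so the same learning code
# can drive very different hardware. Illustrative only.
import numpy as np

class RobotEnv:
    """Uniform interface: each robot declares its spaces via a spec."""

    def __init__(self, obs_spec, act_dim):
        self.obs_spec = obs_spec  # e.g. {"image": (64, 64, 3), "proprio": (12,)}
        self.act_dim = act_dim

    def observe(self):
        # Stand-in sensors: one array per declared modality.
        return {name: np.zeros(shape) for name, shape in self.obs_spec.items()}

    def act(self, action):
        assert action.shape == (self.act_dim,)
        # A real implementation would command the motors here.

# The same learning loop can then iterate over heterogeneous robots:
quadruped = RobotEnv({"proprio": (30,)}, act_dim=12)
arm = RobotEnv({"image": (64, 64, 3), "proprio": (12,)}, act_dim=7)
for env in (quadruped, arm):
    obs = env.observe()
    env.act(np.zeros(env.act_dim))
```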

By demonstrating that current world models enable sample-efficient robot learning on a variety of tasks, from scratch in the real world and without simulators, this work unquestionably contributes to the future of physical robot learning.

This article is written as a summary by Marktechpost staff based on the research paper 'DayDreamer: World Models for Physical Robot Learning'. All credit for this research goes to the UC Berkeley researchers. Check out the paper and project.
