CMU Researchers Developed a Simple Distance Learning AI Method to Transfer Visual Priors to Robotics Tasks: Improving Policy Learning by 20% Over Baselines

On Aug 18, 2023

A significant barrier to progress in robot learning is the lack of large-scale data sets. Robotics data sets tend to be (a) hard to scale, (b) collected in sterile, unrealistic settings (such as a robotics lab), and (c) too homogeneous (such as toy objects with fixed backgrounds and lighting). Vision data sets, by contrast, cover a wide variety of tasks, objects, and environments. Recent methods have therefore investigated transferring priors learned from massive vision data sets to robotics applications.

Previous work that makes use of vision data sets relies on pre-trained representations that encode image observations as state vectors. This visual representation is then simply fed into a controller trained on data collected from robots. Because the latent space of pre-trained networks already incorporates semantic, task-level information, the team suggests that these representations can do more than just represent states.

New work by a research team from Carnegie Mellon University (CMU) shows that neural image representations can be more than state representations: they can be used to infer robot actions through a simple metric defined within the embedding space. The researchers use this insight to learn a distance function and a dynamics function from a small amount of cheap human data. Together, these modules specify a robotic planner, which has been tested on four typical manipulation tasks.
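One way to picture how a distance function and a dynamics function can specify a planner is a greedy loop: predict the next embedding for each candidate action, score it with the distance function, and take the best one. The sketch below is a toy illustration under stated assumptions, not the authors' implementation. The names `toy_dynamics`, `toy_distance`, `plan_step`, and `rollout` are hypothetical, the embedding is a plain 2-D vector, and both modules are hand-written stand-ins for what would be learned networks.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def toy_dynamics(state: Vector, action: Vector) -> Vector:
    """Hypothetical one-step dynamics: the action is a displacement in embedding space."""
    return [s + a for s, a in zip(state, action)]


def toy_distance(state: Vector, goal: Vector) -> float:
    """Hypothetical distance module: Euclidean distance to a goal embedding."""
    return math.sqrt(sum((s - g) ** 2 for s, g in zip(state, goal)))


def plan_step(
    state: Vector,
    goal: Vector,
    candidate_actions: Sequence[Vector],
    dynamics: Callable[[Vector, Vector], Vector],
    distance: Callable[[Vector, Vector], float],
) -> Vector:
    """Return the candidate action whose predicted next state is closest to the goal."""
    return min(candidate_actions, key=lambda a: distance(dynamics(state, a), goal))


def rollout(
    state: Vector,
    goal: Vector,
    candidate_actions: Sequence[Vector],
    dynamics: Callable[[Vector, Vector], Vector],
    distance: Callable[[Vector, Vector], float],
    max_steps: int = 20,
    tol: float = 1e-6,
) -> List[Vector]:
    """Apply the greedy planner repeatedly and return the visited states."""
    trajectory = [list(state)]
    for _ in range(max_steps):
        if distance(state, goal) <= tol:
            break  # goal reached according to the distance module
        action = plan_step(state, goal, candidate_actions, dynamics, distance)
        state = dynamics(state, action)
        trajectory.append(state)
    return trajectory


# Hypothetical discrete action set: unit moves along each axis.
CANDIDATES: List[Vector] = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]

if __name__ == "__main__":
    for visited in rollout([0.0, 0.0], [2.0, 1.0], CANDIDATES, toy_dynamics, toy_distance):
        print(visited)
```

With these toy stand-ins, a rollout from `[0.0, 0.0]` to the goal `[2.0, 1.0]` reaches the goal in three steps. In a real system the state would be an image embedding from a pre-trained encoder and both functions would be learned from data.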

This is accomplished by splitting a pre-trained representation into two distinct modules: (a) a one-step dynamics module, which predicts the robot’s next state based on its current state/action, and (b) a “functional distance module,” which determines how close the robot is to attaining its goal in the current state. Using a contrastive learning objective, the distance function is learned with only a small amount of data from human demonstrations.
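The article says only that the distance function is trained with "a contrastive learning objective" and does not give its form. As a rough illustration, the sketch below uses an InfoNCE-style loss, which is one common contrastive objective and is an assumption here rather than the paper's confirmed loss. The function name `contrastive_loss` and the `temperature` parameter are hypothetical. The idea is that a state the demonstrator actually moved to (the positive) should receive a smaller predicted distance to the goal than alternative states (the negatives).

```python
import math
from typing import Sequence


def contrastive_loss(
    pos_distance: float,
    neg_distances: Sequence[float],
    temperature: float = 1.0,
) -> float:
    """InfoNCE-style loss (assumed form) using negative distances as logits.

    The loss is low when the positive has a much smaller predicted
    distance to the goal than every negative.
    """
    logits = [-pos_distance / temperature]
    logits += [-d / temperature for d in neg_distances]
    # Numerically stable log-sum-exp over all logits.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
    # Negative log-probability assigned to the positive sample.
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    # Positive already ranked closest to the goal: small loss.
    print(contrastive_loss(0.1, [2.0, 3.0]))
    # Positive ranked farther than one negative: larger loss.
    print(contrastive_loss(2.5, [2.0, 3.0]))
```

Minimizing a loss of this kind pushes the distance module to rank demonstrated next states ahead of the alternatives, which is the property a planner needs when it compares candidate actions.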

Despite its apparent simplicity, the proposed system has been shown to outperform both traditional imitation learning and offline RL approaches to robot learning. Compared to a standard behavior cloning (BC) baseline, the technique performs significantly better when dealing with multi-modal action distributions. The ablation study shows that better representations lead to better control performance and that dynamical grounding is necessary for the system to be effective in the real world.
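A toy example may help show why multi-modal action distributions are hard for a regression-style BC policy. Suppose demonstrators pass an obstacle on either the left or the right; a single prediction trained to minimize squared error lands on the average of the two modes, which is the blocked path straight ahead. Scoring candidate actions individually avoids the averaging. Everything below is hypothetical and chosen for illustration: the demonstration values, the `predicted_distance` function, and the candidate set are not taken from the paper.

```python
from statistics import mean

# Hypothetical lateral actions from demonstrations: half go left, half go right.
demo_actions = [-1.0, -1.0, 1.0, 1.0]

# The single action that minimizes mean squared error is the mean of the data.
bc_action = mean(demo_actions)


def predicted_distance(action: float) -> float:
    """Hypothetical distance-to-goal after taking `action`.

    Passing on either side makes progress; going straight (0.0) is blocked.
    """
    return 1.0 - abs(action)


# Score each candidate separately and keep the best, instead of averaging.
candidates = [-1.0, 0.0, 1.0]
planned_action = min(candidates, key=predicted_distance)

if __name__ == "__main__":
    print("BC (mean) action:", bc_action)
    print("Planned action:", planned_action)
```

Here the averaged BC action is `0.0`, while the scored selection commits to one of the two valid modes.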

Because the pre-trained representation itself does the heavy lifting (thanks to its structure) and the approach entirely avoids the difficulty of multi-modal, sequential action prediction, the findings show that this method outperforms policy learning (through Behavior Cloning). Additionally, the learned distance function is stable and straightforward to train, making it highly scalable and generalizable.

The team hopes that their work will spark new research in robotics and representation learning. Future research should refine visual representations for robotics further by better capturing the fine-grained interactions between the gripper/hand and the objects being handled. This could improve performance on tasks like knob turning, where the pre-trained R3M encoder has trouble detecting subtle changes in grip position around the knob. The team also hopes that future studies will use their approach to learn entirely without action labels. Finally, despite the domain gap, it would be valuable if the data gathered with their inexpensive stick could be used with a stronger, more dependable (commercial) gripper.


Dhanshree Shenwai is a computer science engineer with experience at FinTech companies across the financial, cards and payments, and banking domains, and a keen interest in applications of AI. She is enthusiastic about exploring new technologies and advancements that make everyone’s life easier.
