REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation

Zheyuan Hu, Aaron Rovinsky, Jianlan Luo, Vikash Kumar, Abhishek Gupta, Sergey Levine

TL;DR

REBOOT tackles the sample-efficiency and reset/reward bottlenecks of real-world dexterous manipulation by reusing data from prior tasks and objects to bootstrap new skills. It integrates buffer initialization with a sample-efficient off-policy RL method, a vision-based reward learner, and imitation-based reset policies to enable autonomous real-world training on a four-fingered hand. The approach yields about a 2x improvement in learning speed across multiple objects and tasks, and ablation studies confirm the benefit of cross-task data and buffer sizing. The work demonstrates practical real-world dexterous manipulation without simulation or external instrumentation, advancing toward open-world autonomous learning.

Abstract

Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control many degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
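
The replay buffer bootstrapping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Transition` and `ReplayBuffer` names, the FIFO eviction rule, and uniform sampling with replacement are all assumptions made for illustration. The sketch only shows the core step of seeding a buffer with transitions from earlier tasks or objects before any data from the new task is collected.

```python
import random
from collections import deque
from typing import Iterable, List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One environment step (hypothetical layout, not from the paper)."""
    obs: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_obs: Tuple[float, ...]
    done: bool


class ReplayBuffer:
    """FIFO replay buffer that can be seeded with data from prior tasks."""

    def __init__(self, capacity: int, prior_data: Iterable[Transition] = ()) -> None:
        # A deque with maxlen drops the oldest entries once capacity is reached.
        self._storage = deque(maxlen=capacity)
        # Bootstrapping step: start from prior transitions instead of an empty buffer.
        self._storage.extend(prior_data)

    def add(self, transition: Transition) -> None:
        """Append a transition collected on the new task."""
        self._storage.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        """Draw a batch uniformly with replacement (an assumed sampling scheme)."""
        return rng.choices(list(self._storage), k=batch_size)

    def __len__(self) -> int:
        return len(self._storage)


# Hypothetical usage: 100 placeholder transitions stand in for prior-task data.
prior = [Transition((0.0,), (0.1,), 0.0, (0.1,), False) for _ in range(100)]
buffer = ReplayBuffer(capacity=1000, prior_data=prior)

# New-task experience is appended to the same buffer during online training.
buffer.add(Transition((1.0,), (0.2,), 1.0, (1.1,), True))

# An off-policy learner can draw mixed batches from the first update onward.
batch = buffer.sample(batch_size=32, rng=random.Random(0))
```

Because the buffer is non-empty before the first online step, an off-policy learner has data to train on immediately. How prior and new data should be weighted, and how large the prior buffer should be, are design choices this sketch leaves open.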

Paper Structure (23 sections, 2 equations, 13 figures, 1 table, 1 algorithm)

Figures (13)

  • Figure 1: REBOOT achieves a 2x sample-efficiency boost when autonomously learning a variety of contact-rich, real-world dexterous manipulation skills on three different objects. It does so by bootstrapping from prior data across different objects and tasks, using sample-efficient RL and imitation-learning-based reset policies.
  • Figure 2: REBOOT System Overview: Our method learns various dexterous manipulation skills in the real world using raw image observations. This is enabled by using sample-efficient RL and bootstrapping with data from other tasks and even other objects, with autonomous resets.
  • Figure 3: Depiction of our hardware platform and tasks. (a) Custom-built 16-DoF robotic hand; (c) teleoperation using a 3-D mouse. The hand interacts in-hand with the following objects: (b) blue football, (d) 3-pronged valve, (e) T-shaped pipe.
  • Figure 4: Successful rollouts of in-hand object manipulation policies for the three objects: purple 3-pronged object (Pose B), black T-shaped pipe, and blue football. The boxes on the right (outlined in green) are representative user-provided success state examples for each task. Note that the autonomous pickup policy picks up the object in a variety of different poses across episodes, requiring the in-hand manipulation skill to reorient it into the target pose from many starting configurations.
  • Figure 5: Learning curves showing performance as a function of training time when reorienting the 3-prong object into different poses. Both our method and training from scratch eventually reach a success rate of 80%, but our method gets there about twice as fast.
  • ...and 8 more figures