Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost

Henry Zhu, Abhishek Gupta, Aravind Rajeswaran, Sergey Levine, Vikash Kumar

TL;DR

This work shows that model-free deep reinforcement learning can learn complex, contact-rich manipulation with low-cost, multi-fingered hands directly in the real world, without precise models or simulation. It validates three dexterous tasks on two hardware platforms and demonstrates that most tasks can be learned in about 4–7 hours, and that a small number of human demonstrations, incorporated through Demonstration Augmented Policy Gradient (DAPG), reduces this to 2–3 hours. The study analyzes how action spaces and reward formulations affect learning, compares real-world training to simulated transfer, and confirms the approach’s robustness across hardware and object material variations. The findings highlight the practicality of real-world, demonstration-accelerated RL for building versatile, dexterous manipulation capabilities in open-world settings, with future work aiming to incorporate vision and multi-task learning to broaden the skill repertoire.

Abstract

Dexterous multi-fingered robotic hands can perform a wide range of manipulation skills, making them an appealing component for general-purpose robotic manipulators. However, such hands pose a major challenge for autonomous control, due to the high dimensionality of their configuration space and complex intermittent contact interactions. In this work, we propose deep reinforcement learning (deep RL) as a scalable solution for learning complex, contact-rich behaviors with multi-fingered hands. Deep RL provides an end-to-end approach to directly map sensor readings to actions, without the need for task-specific models or policy classes. We show that contact-rich manipulation behavior with multi-fingered hands can be learned by directly training with model-free deep RL algorithms in the real world, with minimal additional assumptions and without the aid of simulation. We learn a variety of complex behaviors on two different low-cost hardware platforms. We show that each task can be learned entirely from scratch, and further study how the learning process can be accelerated by using a small number of human demonstrations to bootstrap learning. Our experiments demonstrate that complex multi-fingered manipulation skills can be learned in the real world in about 4-7 hours for most tasks, and that demonstrations can decrease this to 2-3 hours, indicating that direct deep RL training in the real world is a viable and practical alternative to simulation and model-based control. Project page: https://sites.google.com/view/deeprl-handmanipulation

Paper Structure

This paper contains 22 sections, 8 equations, 21 figures.

Figures (21)

  • Figure 1: We demonstrate that deep RL can learn a wide range of dexterous manipulation skills with multi-fingered hands, such as opening a door with a flexible handle, rotating a cross-shaped valve, rotating the same valve with a deformable foam handle (which presents an additional physical challenge), and flipping a box.
  • Figure 2: Left: 3-finger Dynamixel claw. Right: 4-finger anthropomorphic Allegro hand.
  • Figure 3: Illustration of valve rotation
  • Figure 4: Illustration of box flipping
  • Figure 5: Opening a door with a flexible handle
  • ...and 16 more figures