Dexterous Manipulation with Deep Reinforcement Learning: Efficient, General, and Low-Cost
Henry Zhu, Abhishek Gupta, Aravind Rajeswaran, Sergey Levine, Vikash Kumar
TL;DR
This work shows that model-free deep reinforcement learning can learn complex, contact-rich manipulation with low-cost, multi-fingered hands directly in the real world, without precise models or simulation. It validates three dexterous tasks on two hardware platforms and demonstrates that most tasks can be learned from scratch in about 4–7 hours, with a small number of human demonstrations reducing this to 2–3 hours via the Demonstration Augmented Policy Gradient (DAPG) algorithm. The study analyzes how action spaces and reward formulations affect learning, compares real-world training to simulated transfer, and confirms the approach’s robustness across hardware and object material variations. The findings highlight the practicality of real-world, demonstration-accelerated RL for building versatile, dexterous manipulation capabilities in open-world settings; future work aims to incorporate vision and multi-task learning to broaden the skill repertoire.
Abstract
Dexterous multi-fingered robotic hands can perform a wide range of manipulation skills, making them an appealing component for general-purpose robotic manipulators. However, such hands pose a major challenge for autonomous control because of the high dimensionality of their configuration space and their complex, intermittent contact interactions. In this work, we propose deep reinforcement learning (deep RL) as a scalable solution for learning complex, contact-rich behaviors with multi-fingered hands. Deep RL provides an end-to-end approach that maps sensor readings directly to actions, without the need for task-specific models or policy classes. We show that contact-rich manipulation behavior with multi-fingered hands can be learned by training model-free deep RL algorithms directly in the real world, with minimal additional assumptions and without the aid of simulation. We learn a variety of complex behaviors on two different low-cost hardware platforms. We show that each task can be learned entirely from scratch, and we study how learning can be further accelerated by using a small number of human demonstrations to bootstrap it. Our experiments demonstrate that complex multi-fingered manipulation skills can be learned in the real world in about 4–7 hours for most tasks, and that demonstrations can decrease this to 2–3 hours, indicating that direct deep RL training in the real world is a viable and practical alternative to simulation and model-based control. https://sites.google.com/view/deeprl-handmanipulation
