DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality

Ankur Handa; Arthur Allshire; Viktor Makoviychuk; Aleksei Petrenko; Ritvik Singh; Jingzhou Liu; Denys Makoviichuk; Karl Van Wyk; Alexander Zhurkevich; Balakumar Sundaralingam; Yashraj Narang; Jean-Francois Lafleche; Dieter Fox; Gavriel State

TL;DR

This work tackles sim-to-real transfer for dexterous in-hand manipulation by training a vision-based policy for a low-cost Allegro Hand in GPU-accelerated Isaac Gym simulation. It trains an LSTM policy with PPO, pairs it with a robust pose estimator, and uses Vectorised Automatic Domain Randomisation (VADR) together with physics and non-physics randomisations to bridge the sim-to-real gap. In the real world, ADR-trained policies outperform prior vision-based baselines and achieve up to 112 consecutive successes. Because it relies on affordable hardware and open pipelines, the work emphasises reproducibility and accessibility, with the aim of accelerating progress in sim-to-real robotics.

Abstract

Recent work has demonstrated the ability of deep reinforcement learning (RL) algorithms to learn complex robotic behaviours in simulation, including in the domain of multi-fingered manipulation. However, such models can be challenging to transfer to the real world due to the gap between simulation and reality. In this paper, we present our techniques to train a) a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand and b) a robust pose estimator suitable for providing reliable real-time information on the state of the object being manipulated. Our policies are trained to adapt to a wide range of conditions in simulation. Consequently, our vision-based policies significantly outperform the best vision policies in the literature on the same reorientation task and are competitive with policies that are given privileged state information via motion capture systems. Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups, and in our case, with the Allegro Hand and Isaac Gym GPU-based simulation. Furthermore, it opens up possibilities for researchers to achieve such results with commonly-available, affordable robot hands and cameras. Videos of the resulting policy and supplementary information, including experiments and demos, can be found at https://dextreme.org/

Paper Structure (28 sections, 2 equations, 10 figures, 14 tables, 2 algorithms)

Introduction
Method
Task
Hardware
Policy Learning with RL
Reward Formulation
Simulation
Domain Randomisation
Physics Randomisations
Non-physics Randomisations
Measuring ADR Performance in Training and in the Real World
Pose Estimation
Results
Training in Simulation
Real-World Policy Performance

Figures (10)

Figure 1: The DeXtreme system using an Allegro Hand in action in the real world.
Figure 2: The hardware setup used in this work. Unlike the setup in openai-sh, it is not housed in a cage, and our system is robust enough to perform the task in an open laboratory environment. The background in the image is alpha-blended for visibility.
Figure 3: High level overview of the training and inference systems.
Figure 4: ADR adjustment of the parameter range bounds $p^{\texttt{i\_lo}}$ and $p^{\texttt{i\_hi}}$, based on the policy's mean performance at the boundaries, $Q^{\texttt{i\_lo}}$ and $Q^{\texttt{i\_hi}}$, relative to the thresholds $t_l$ and $t_h$. If the mean performance $Q^{\texttt{i\_lo}}$ at the lower boundary is below $t_l$, the range is tightened (a); if it is above $t_h$, the range is expanded (c). Similarly, if the mean performance $Q^{\texttt{i\_hi}}$ at the upper boundary is above $t_h$, the range is expanded (b); if it is below $t_l$, the range is tightened (d).
Figure 5: The functioning of the Random Network Adversary
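The per-parameter range update described in the Figure 4 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `ADRParam` and `update_range`, the fixed step size `delta`, and the optional hard limits are hypothetical, and how the mean boundary performances are gathered (for example, averaged over recent evaluation episodes run at each boundary) is left out.

```python
from dataclasses import dataclass


@dataclass
class ADRParam:
    """One randomised parameter with its current sampling range [lo, hi].

    Hypothetical: the fixed step size `delta` and the hard limits are
    illustrative assumptions, not values taken from the paper.
    """
    lo: float
    hi: float
    delta: float
    hard_lo: float = float("-inf")
    hard_hi: float = float("inf")


def update_range(p: ADRParam, q_lo: float, q_hi: float,
                 t_l: float, t_h: float) -> ADRParam:
    """Return a new range given mean performances at the two boundaries.

    q_lo / q_hi: mean policy performance measured with the parameter
    pinned at p.lo / p.hi. t_l < t_h are the tighten/expand thresholds.
    """
    lo, hi = p.lo, p.hi

    # Lower boundary: poor performance tightens (a), good performance expands (c).
    if q_lo < t_l:
        lo = min(lo + p.delta, hi)          # move lo inward, never past hi
    elif q_lo > t_h:
        lo = max(lo - p.delta, p.hard_lo)   # move lo outward, respecting the hard limit

    # Upper boundary: good performance expands (b), poor performance tightens (d).
    if q_hi > t_h:
        hi = min(hi + p.delta, p.hard_hi)   # move hi outward, respecting the hard limit
    elif q_hi < t_l:
        hi = max(hi - p.delta, lo)          # move hi inward, never below lo

    return ADRParam(lo, hi, p.delta, p.hard_lo, p.hard_hi)


if __name__ == "__main__":
    p = ADRParam(lo=1.0, hi=2.0, delta=0.25)
    # Weak at the lower boundary, strong at the upper one: tighten lo, expand hi.
    print(update_range(p, q_lo=0.1, q_hi=0.9, t_l=0.2, t_h=0.8))
```

Performance that falls between $t_l$ and $t_h$ leaves a boundary unchanged. In a training loop, `update_range` would be called for each randomised parameter whenever enough boundary evaluations have accumulated, so ranges widen only where the policy already copes with the harder conditions.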