Emergent Dexterity via Diverse Resets and Large-Scale Reinforcement Learning

Patrick Yin; Tyler Westenbroek; Zhengyu Zhang; Joshua Tran; Ignacio Dagnino; Eeshani Shilamkar; Numfor Mbiziwo-Tiapo; Simran Bagaria; Xinlei Liu; Galen Mullins; Andrey Kolobov; Abhishek Gupta

Abstract

Reinforcement learning in massively parallel physics simulations has driven major progress in sim-to-real robot learning. However, current approaches remain brittle and task-specific, relying on extensive per-task engineering to design rewards, curricula, and demonstrations. Even with this engineering, they often fail on long-horizon, contact-rich manipulation tasks and do not meaningfully scale with compute, as performance quickly saturates when training revisits the same narrow regions of state space. We introduce OmniReset, a simple and scalable framework that enables on-policy reinforcement learning to robustly solve a broad class of dexterous manipulation tasks using a single reward function, fixed algorithm hyperparameters, no curricula, and no human demonstrations. Our key insight is that long-horizon exploration can be dramatically simplified by using simulator resets to systematically expose the RL algorithm to the diverse set of robot-object interactions that underlie dexterous manipulation. OmniReset programmatically generates such resets with minimal human input, converting additional compute directly into broader behavioral coverage and continued performance gains. We show that OmniReset scales gracefully to long-horizon dexterous manipulation tasks beyond the capabilities of existing approaches and learns robust policies over significantly wider ranges of initial conditions than baselines. Finally, we distill OmniReset into visuomotor policies that display robust retrying behavior and substantially higher success rates than baselines when transferred zero-shot to the real world. Project webpage: https://omnireset.github.io

Paper Structure (13 sections, 1 equation, 10 figures)

Introduction
Related Work
Generating Diverse Resets for Learning Dexterous Manipulation
Problem Setting
Automatically Generating RL Problems with Diverse Resets
Algorithmic Decisions for RL Training
Simulation Experiments
Task Descriptions
Simulation Expert Baseline Comparisons
Ablating Key Design Decisions
Distillation and Real-World Transfer
Conclusion
Reproducibility Statement

Figures (10)

Figure 1: OmniReset automatically generates diverse reset states, enabling complex, multi-phase manipulation behaviors to emerge from large-scale reinforcement learning. Using the same generic reset procedure for each task, OmniReset learns robust, task-specific policies. The first row shows the robot pushing and flipping a drawer before wiggling it in. The second row shows the robot picking up a table leg, adjusting its grip using the table, and twisting the leg into the hole. The final two rows show peg insertion in simulation and a distilled RGB policy attempting the task on a real robot, which recovers from a failed insertion to successfully complete it, demonstrating robust emergent retrying behavior.
Figure 2: Sim-to-Real Pipeline with OmniReset. (1) After generating partial assemblies and grasps in the simulator, (2) we collect reset states: reaching, near object, grasped, and near goal. (3) We then train a state-based RL policy with episodes initialized from these reset states, which is used for (4) student-teacher distillation into an RGB policy. (5) After finetuning this RGB policy on a mix of simulation data and a small set of real demonstrations, (6) we deploy the RGB-based policy in the real world.
Figure 3: Additional tasks. Visualizations of the new manipulation tasks solved with OmniReset. From left to right: Four-Leg Table Assembly, Cube Stacking, Cupcake on Plate, and Block Reorientation on Wall.
Figure 4: Success Rates on Different Stages of Task. We plot the success rates for the Leg Twisting task when starting from states in the Near-Goal region and in the Reaching region of the state space. When evaluating these success rates, we sample resets from the demonstrations used by the baseline algorithms to ensure the resulting policies start from in-distribution states. We see that the baselines can achieve moderate success rates when starting close to the goal (Near Goal), but struggle to make meaningful progress on the full long-horizon task (captured by the reaching resets).
Figure 5: Ablation on grasp sampling range. For this ablation on the screwing task, we find that training RL on narrower grasp sampling ranges leads to worse sample efficiency and lower converged success rate.
...and 5 more figures
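
The Figure 2 caption names four kinds of reset states (reaching, near object, grasped, near goal) from which RL episodes are initialized. As a rough illustration only, the sketch below assigns one of these categories to each parallel environment at reset time. The function names, the uniform default weights, and the idea of sampling by category weight are assumptions made for this example; the text above does not state how OmniReset mixes the categories.

```python
import random
from collections import Counter

# Reset categories named in the Figure 2 caption.
RESET_CATEGORIES = ("reaching", "near_object", "grasped", "near_goal")


def sample_reset_categories(num_envs, weights=None, seed=None):
    """Pick one reset category per parallel environment.

    Hypothetical helper: uniform weights are an assumption for
    illustration, not the mixture used in the paper.
    """
    if weights is None:
        weights = [1.0] * len(RESET_CATEGORIES)
    if len(weights) != len(RESET_CATEGORIES):
        raise ValueError("need exactly one weight per reset category")
    rng = random.Random(seed)
    # random.choices samples with replacement, proportional to weights.
    return rng.choices(RESET_CATEGORIES, weights=weights, k=num_envs)


def category_counts(assignments):
    """Count how many environments start in each category."""
    return Counter(assignments)


if __name__ == "__main__":
    assignments = sample_reset_categories(num_envs=4096, seed=0)
    print(category_counts(assignments))
```

In a real pipeline each category would index into a buffer of stored simulator states; here only the category labels are sampled, which keeps the example independent of any particular simulator.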
