Twisting Lids Off with Two Hands

Toru Lin; Zhao-Heng Yin; Haozhi Qi; Pieter Abbeel; Jitendra Malik

TL;DR

The paper tackles the challenge of dexterous bimanual manipulation by training a two-handed lid-twisting policy in simulation and transferring it zero-shot to real robots. It introduces a fast brake-based friction model, a minimal sparse perception pipeline, and a keypoint-based contact reward, combined with domain randomization and asymmetric PPO, to achieve robust sim-to-real transfer. Real-world experiments with dual Allegro Hands demonstrate generalization across seen and unseen bottle objects, resilience to perturbations, and partial lid-removal on novel shapes. Overall, the work advances practical two-handed manipulation by enabling generalizable, high-precision contact-rich policies without relying on real-world expert demonstrations.

Abstract

Manipulating objects with two multi-fingered hands has been a long-standing challenge in robotics, due to the contact-rich nature of many manipulation tasks and the complexity inherent in coordinating a high-dimensional bimanual system. In this work, we share novel insights into physical modeling, real-time perception, and reward design that enable policies trained in simulation using deep reinforcement learning (RL) to be effectively and efficiently transferred to the real world. Specifically, we consider the problem of twisting lids of various bottle-like objects with two hands, demonstrating policies with generalization capabilities across a diverse set of unseen objects as well as dynamic and dexterous behaviors. To the best of our knowledge, this is the first sim-to-real RL system that enables such capabilities on bimanual multi-fingered hands.

Paper Structure (23 sections, 9 figures, 5 tables)

Introduction
Background
Learning to Twist Lids with Two Hands
Object Simulation
Task Initialization
Policy Learning
Real-World Perception
Simulation Experiments
Setup
Results
Real-world Experiments
Experiment Setup
Twisting Lids in the Real World
Robustness against Perturbation
Exploration of Twisting Lids Off
...and 8 more sections

Figures (9)

Figure 1: We train two anthropomorphic robot hands to twist (off) lids of various articulated objects. The control policy is first trained in simulation with deep reinforcement learning, then zero-shot transferred to a real-world setup. We show that a single policy trained to manipulate simplistic, simulated bottle-like objects can generalize to real-world objects that have drastically different physical properties (e.g. shape, size, color, material, mass). The length, diameter (or diagonal length), and mass of each object are annotated at the bottom of individual subfigures. More results can be found in our video (https://youtu.be/TaRva6UjrCo) and on our project website (https://toruowo.github.io/bimanual-twist).
Figure 2: Our bottle model and the bottles used in simulation and in the real world. A: Simulated bottle URDF. B: Training bottle objects in simulation. C: Custom-made bottles (in-distribution except for the rightmost square bottle). D: Household object bottles (out-of-distribution).
Figure 3: Left: Real-time perception system. Top: overview. Bottom: we segment and track object parts from the RGB frames (left), take mask centers as object part centers (middle), and estimate 3D object keypoints using noisy depth information from the camera (right). Right: Illustration of reward design. Our task-specific reward contains finger contact reward (yellow arrows), twisting reward (white arrow), and pose reward (blue arrow). In particular, our keypoint-based finger contact reward is crucial for learning the desired behavior.
Figure 4: Training curves in different settings. Top: Single-object training results (evaluated on single-object setup). Bottom: Multi-object training results (evaluated on multi-object setup). Left half: Comparisons of different reward setups. Right half: Ablations on the use of vision. The results are averaged over 5 seeds. The shaded area shows the standard deviation. The AD score is averaged over the total number of execution steps.
Figure 5: Left: Behavior of different reward functions. Top: Our full reward function achieves a stable grasp, as well as a smooth, natural, and human-like twisting motion. Middle: A naive gait constraint reward without any contact hints leads to erratic finger motion and unnatural grasps. Bottom: A reduced contact reward yields somewhat natural behavior, but the grasp is loose compared to the full contact reward case. Right: Perturbing a learned policy with random external force. Our policy is resilient to these external forces and able to recover.
...and 4 more figures
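
The Figure 3 caption describes taking each segmentation mask's center as the object part center and lifting it to a 3D keypoint with noisy depth. A minimal sketch of that step follows; it is not the authors' implementation. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and uses the median depth over the mask as a simple way to damp depth noise. The function names and the median choice are illustrative assumptions, not details from the paper.

```python
import numpy as np


def mask_center(mask):
    """Return the (u, v) pixel centroid of a boolean mask, or None if it is empty."""
    vs, us = np.nonzero(mask)  # row indices (v) and column indices (u)
    if us.size == 0:
        return None
    return float(us.mean()), float(vs.mean())


def keypoint_3d(mask, depth, fx, fy, cx, cy):
    """Estimate a 3D keypoint (camera frame) for one segmented object part.

    Assumptions (not from the paper): pinhole intrinsics fx, fy, cx, cy, and
    the median of valid depth readings inside the mask as the part's depth.
    """
    center = mask_center(mask)
    if center is None:
        return None
    # Ignore missing or invalid depth readings (zeros, NaNs, infs).
    valid = mask & np.isfinite(depth) & (depth > 0)
    if not valid.any():
        return None
    z = float(np.median(depth[valid]))
    u, v = center
    # Standard pinhole back-projection of pixel (u, v) at depth z.
    return np.array([(u - cx) * z / fx, (v - cy) * z / fy, z])
```

Calling `keypoint_3d` once for the lid mask and once for the body mask would give two keypoints per frame, which is the kind of sparse object representation the TL;DR refers to.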
