Lessons from Learning to Spin "Pens"

Jun Wang; Ying Yuan; Haichuan Che; Haozhi Qi; Yi Ma; Jitendra Malik; Xiaolong Wang

TL;DR

The paper tackles dexterous in-hand pen spinning, a dynamic, contact-rich manipulation problem with a sizable sim-to-real gap. It introduces a three-stage approach: train an oracle policy with privileged information in simulation to generate high-fidelity trajectories, pre-train a proprioceptive sensorimotor policy on those trajectories in simulation, and fine-tune it on a small set of real-world trajectories collected by replaying oracle trajectories open-loop. With fewer than 50 real trajectories, the resulting policy continuously spins multiple pen-like objects and outperforms oracle replay and the ablations, while simple distillation struggles. Key insights are the importance of a rich initial-state design and of privileged information during oracle training, and the necessity of simulation-based pre-training to bridge the reality gap for such tasks. The work also acknowledges persistent sim-to-real challenges and the potential value of incorporating vision and touch.

Abstract

In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to real-world dynamics. With fewer than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.
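
The data flow described in the abstract (pre-train on plentiful simulated data, then fine-tune on a small real-world set) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the linear policy, the synthetic "sim" and "real" mappings, the dimensions, and the helper names `fit_linear_policy` and `mse` are hypothetical stand-ins, not the authors' implementation. The oracle policy and the open-loop replay step are represented only by synthetic datasets.

```python
import numpy as np


def fit_linear_policy(obs, act, init_w=None, lr=0.1, steps=200):
    """Behavior cloning: fit act ~ obs @ w by gradient descent on the MSE."""
    w = np.zeros((obs.shape[1], act.shape[1])) if init_w is None else init_w.copy()
    for _ in range(steps):
        grad = 2.0 * obs.T @ (obs @ w - act) / len(obs)
        w -= lr * grad
    return w


def mse(w, obs, act):
    """Mean squared action-prediction error of the linear policy w."""
    return float(np.mean((obs @ w - act) ** 2))


rng = np.random.default_rng(0)
OBS_DIM, ACT_DIM = 4, 2  # hypothetical toy sizes

# Stage 1 (stand-in): an "oracle" labels simulated observations with actions.
w_sim = rng.normal(size=(OBS_DIM, ACT_DIM))
sim_obs = rng.normal(size=(2000, OBS_DIM))
sim_act = sim_obs @ w_sim

# Stage 2: pre-train the student policy on the large simulated dataset.
w_pretrained = fit_linear_policy(sim_obs, sim_act)

# Stage 3 (stand-in): a small "real" dataset whose observation-to-action
# mapping is shifted relative to simulation. Here 40 samples play the role
# of the small real-world dataset; in the paper the unit is trajectories.
w_real = w_sim + 0.2 * rng.normal(size=w_sim.shape)
real_obs = rng.normal(size=(40, OBS_DIM))
real_act = real_obs @ w_real
w_finetuned = fit_linear_policy(real_obs, real_act, init_w=w_pretrained, steps=50)

# Compare both policies on held-out "real" data.
test_obs = rng.normal(size=(500, OBS_DIM))
test_act = test_obs @ w_real
err_pretrained = mse(w_pretrained, test_obs, test_act)
err_finetuned = mse(w_finetuned, test_obs, test_act)
print(f"pre-trained only: {err_pretrained:.4f}, fine-tuned: {err_finetuned:.4f}")
```

In this toy setting the fine-tuned weights start from the pre-trained ones, so a few gradient steps on the small dataset are enough to adapt to the shifted mapping. The sketch says nothing about the real task's dynamics or about how successful replay trajectories are selected.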

Paper Structure (17 sections, 7 figures, 9 tables)

Introduction
Related Work
Learning to Spin Pens
Oracle Policy Training
Sensorimotor Policy Pre-training
Fine-tuning Sensorimotor Policy with Oracle Replay
Experiments
Experiment Setup
Oracle Policy Training
Sensorimotor Policy Training
Qualitative Experiments
Conclusion and Lessons
Implementation Details
Training Hyper-parameters
Domain Randomization Parameters
...and 2 more sections

Figures (7)

Figure 1: Top row: Continuous rotation of a pen-like object in hand. Bottom rows: Our policy can generalize to a diverse set of pen-like objects with different physical properties, using only proprioception as feedback. More videos are available on our project website.
Figure 2: An overview of our approach. We first train an oracle policy in simulation using reinforcement learning. This policy provides a high-quality dataset of trajectories and actions. We use this dataset both to train a student policy and, replayed as an open-loop controller in the real world, to collect successful real-world trajectories. Finally, we fine-tune the student policy using this real-world dataset.
Figure 3: Visualization of the canonical grasps. Inspired by how humans spin pens, we design six canonical initial poses used to reset the episode. These poses are keyframes where the index, thumb, and middle fingers break and re-establish contact.
Figure 4: Learning curves for our policy and different baselines. Left: Using a well-designed initial distribution is critical. Our method samples the initial states from six proposed canonical states with noise, while Single Canonical Pose only samples near one canonical grasp. The latter trains unstably, and finger gaiting does not emerge (see also Figure 5(c)). Right: The necessity of using visuotactile information and privileged information during oracle policy training. We train each policy with 3 seeds.
Figure 5: Importance of $r_z$ and initial state design. (a) Our policy spins the pen in a smooth and stable manner, with the pen mostly horizontal. (b) Policies trained without the $r_z$ reward term tend to tilt the pen more during rotation. This behavior is unstable and cannot be used as an open-loop controller in the real world. (c) Initializing with a single canonical state limits exploration, and the policy does not learn finger gaiting.
...and 2 more figures
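
The initial-state design contrasted in the Figure 4 caption (sampling from six canonical states with noise versus sampling near a single canonical grasp) might be sketched as follows. This is a minimal illustration under stated assumptions: the function name `sample_initial_states`, the 16-dimensional pose vector, the placeholder pose values, and the noise scale are all hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_initial_states(canonical_poses, num_envs, noise_std, rng):
    """Pick one canonical pose per environment and perturb it with Gaussian noise."""
    pose_idx = rng.integers(0, len(canonical_poses), size=num_envs)
    noise = rng.normal(0.0, noise_std, size=(num_envs, canonical_poses.shape[1]))
    return canonical_poses[pose_idx] + noise, pose_idx


rng = np.random.default_rng(0)

# Six canonical poses as in the caption; the pose dimension (16) and the
# values themselves are placeholders chosen only for this example.
NUM_CANONICAL, POSE_DIM = 6, 16
canonical_poses = rng.uniform(-0.5, 0.5, size=(NUM_CANONICAL, POSE_DIM))

# Multi-pose initialization: every reset starts near one of the six poses.
states, pose_idx = sample_initial_states(canonical_poses, 1024, 0.05, rng)

# "Single Canonical Pose" baseline: every reset starts near the first pose only.
single_states, single_idx = sample_initial_states(canonical_poses[:1], 1024, 0.05, rng)

print(states.shape, np.unique(pose_idx), np.unique(single_idx))
```

Under these assumptions, the multi-pose sampler spreads resets across all canonical poses, whereas the single-pose baseline concentrates them around one grasp.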
