Lessons from Learning to Spin "Pens"
Jun Wang, Ying Yuan, Haichuan Che, Haozhi Qi, Yi Ma, Jitendra Malik, Xiaolong Wang
TL;DR
The paper tackles dexterous in-hand pen spinning, a dynamic, contact-rich manipulation problem with a sizable sim-to-real gap. It introduces a three-stage approach: (1) train an oracle policy with privileged information in simulation and use it to generate high-fidelity trajectories, (2) pre-train a proprioceptive sensorimotor policy on those trajectories in simulation, and (3) fine-tune that policy on a small set of real-world trajectories collected by open-loop replay of the oracle's trajectories. With fewer than 50 real trajectories, the resulting policy continuously spins multiple pen-like objects, outperforming open-loop oracle replay and the ablations, whereas simple distillation struggles. Key insights are the importance of a rich initial state design and of privileged information, and the necessity of simulation-based pre-training to bridge the reality gap for such tasks. The work also acknowledges persistent sim-to-real challenges and suggests that incorporating vision and touch could bring further improvement.
Abstract
In-hand manipulation of pen-like objects is an important skill in our daily lives, as many tools such as hammers and screwdrivers are similarly shaped. However, current learning-based methods struggle with this task due to a lack of high-quality demonstrations and the significant gap between simulation and the real world. In this work, we push the boundaries of learning-based in-hand manipulation systems by demonstrating the capability to spin pen-like objects. We first use reinforcement learning to train an oracle policy with privileged information and generate a high-fidelity trajectory dataset in simulation. This serves two purposes: 1) pre-training a sensorimotor policy in simulation; 2) conducting open-loop trajectory replay in the real world. We then fine-tune the sensorimotor policy using these real-world trajectories to adapt it to the real-world dynamics. With fewer than 50 trajectories, our policy learns to rotate more than ten pen-like objects with different physical properties for multiple revolutions. We present a comprehensive analysis of our design choices and share the lessons learned during development.
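
The data flow described in the abstract can be illustrated with a deliberately tiny toy example. This is only a sketch under strong assumptions, not the authors' implementation: the "oracle" is a fixed linear rule rather than an RL policy, the "sensorimotor policy" is a single scalar gain fitted by least squares, the sim-to-real mismatch is a made-up rescaling of the observations, and the success check for a replay is a coin flip. All names (`SIM_GAIN`, `REAL_SCALE`, `oracle_trajectories`, `pretrain`, `replay_open_loop`, `finetune`) are hypothetical.

```python
import random

SIM_GAIN = 2.0    # toy oracle rule (assumption): action = SIM_GAIN * observation
REAL_SCALE = 0.8  # toy sim-to-real mismatch (assumption): real obs = 0.8 * sim obs


def oracle_trajectories(rng, n_traj, horizon):
    """Stage 1 stand-in: toy oracle rollouts, each a list of (obs, action) pairs."""
    trajs = []
    for _ in range(n_traj):
        traj = []
        for _ in range(horizon):
            obs = rng.uniform(-1.0, 1.0)       # stand-in proprioceptive reading
            traj.append((obs, SIM_GAIN * obs))  # action chosen by the toy oracle
        trajs.append(traj)
    return trajs


def pretrain(trajs):
    """Stage 2: behavior cloning in simulation (least squares for action = w * obs)."""
    pairs = [p for traj in trajs for p in traj]
    return sum(o * a for o, a in pairs) / sum(o * o for o, _ in pairs)


def replay_open_loop(rng, traj):
    """Replay recorded actions without feedback on the toy 'real' system.

    Returns the (real obs, action) pairs and a success flag. The coin flip
    is a placeholder for whatever success criterion a real setup would use.
    """
    real_pairs = [(REAL_SCALE * o, a) for o, a in traj]
    return real_pairs, rng.random() < 0.5


def finetune(w, real_trajs, lr=0.1, epochs=200):
    """Stage 3: gradient descent on squared error, starting from the pre-trained w."""
    pairs = [p for traj in real_trajs for p in traj]
    if not pairs:
        return w  # nothing to adapt to
    for _ in range(epochs):
        grad = sum(2.0 * (w * o - a) * o for o, a in pairs) / len(pairs)
        w -= lr * grad
    return w


rng = random.Random(0)
sim_trajs = oracle_trajectories(rng, n_traj=40, horizon=25)
w_sim = pretrain(sim_trajs)

# Keep only the replays flagged as successful for fine-tuning.
replays = [replay_open_loop(rng, traj) for traj in sim_trajs]
real_trajs = [pairs for pairs, ok in replays if ok]
w_real = finetune(w_sim, real_trajs)

print(f"pre-trained gain: {w_sim:.3f}, fine-tuned gain: {w_real:.3f}")
```

In this toy setting the pre-trained gain matches the simulated oracle, while fine-tuning on the replayed trajectories moves it toward the value that fits the rescaled "real" observations. The only point is the order of operations: oracle data first, pre-training second, then adaptation on a small filtered set of real trajectories.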
