ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection
Arpit Bahety, Priyanka Mandikal, Ben Abbatematteo, Roberto Martín-Martín
TL;DR
ScrewMimic tackles bimanual manipulation by projecting human demonstrations into a screw-axis space that jointly constrains both hands. It introduces screw actions $\sigma=(g_l,g_r,S,\tau_l)$ and leverages a three-module pipeline—perception of demonstrations, point-cloud-based screw-action prediction, and self-supervised CEM-based fine-tuning—to learn from a single video. Empirical results across six real tasks show high success rates and improved generalization, with ablations confirming the benefits of the screw representation and autonomous reward signals. This approach offers a practical, scalable path for robots to acquire complex coordinated manipulation skills from human videos with minimal human supervision.
Abstract
Bimanual manipulation is a longstanding challenge in robotics due to the large number of degrees of freedom and the strict spatial and temporal synchronization required to generate meaningful behavior. Humans learn bimanual manipulation skills by watching other humans and by refining their abilities through play. In this work, we aim to enable robots to learn bimanual manipulation behaviors from human video demonstrations and fine-tune them through interaction. Inspired by seminal work in psychology and biomechanics, we propose modeling the interaction between two hands as a serial kinematic linkage, in particular as a screw motion, which we use to define a new action space for bimanual manipulation: screw actions. We introduce ScrewMimic, a framework that leverages this novel action representation to facilitate learning from human demonstration and self-supervised policy fine-tuning. Our experiments demonstrate that ScrewMimic is able to learn several complex bimanual behaviors from a single human video demonstration, and that it outperforms baselines that interpret demonstrations and fine-tune directly in the original space of motion of both arms. For more information and video results, see https://robin-lab.cs.utexas.edu/ScrewMimic/
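To make the idea of a screw motion concrete, the following minimal sketch computes the rigid transform produced by moving along a screw axis, using the standard textbook parameterization (unit direction, a point on the axis, and a pitch). It is an illustration under stated assumptions, not the authors' implementation: the function names `screw_transform` and `screw_trajectory`, the argument layout, and the choice to express the axis in the frame of the other hand are all hypothetical.

```python
import numpy as np

def screw_transform(s, q, h, theta):
    """Homogeneous 4x4 transform for a rotation of `theta` radians about the
    axis with direction `s` through point `q`, plus a translation of
    `h * theta` along the axis (h is the pitch). Hypothetical helper."""
    s = np.asarray(s, dtype=float)
    s = s / np.linalg.norm(s)          # make the axis direction a unit vector
    q = np.asarray(q, dtype=float)
    # Skew-symmetric matrix of s, used in Rodrigues' rotation formula.
    K = np.array([[0.0, -s[2], s[1]],
                  [s[2], 0.0, -s[0]],
                  [-s[1], s[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    # Rotating about an axis through q (not the origin) adds (I - R) q;
    # the pitch adds a slide of h * theta along the axis.
    T[:3, 3] = (np.eye(3) - R) @ q + h * theta * s
    return T

def screw_trajectory(T_start, s, q, h, theta_total, steps=20):
    """Poses of the moving hand as it follows the screw axis, assuming the
    axis is expressed in the frame of the other hand (an assumption)."""
    thetas = np.linspace(0.0, theta_total, steps)
    return [screw_transform(s, q, h, t) @ T_start for t in thetas]

# Example: a quarter turn about a vertical axis through (1, 0, 0) with a
# small pitch, as in a hypothetical cap-unscrewing motion.
poses = screw_trajectory(np.eye(4), s=[0, 0, 1], q=[1, 0, 0],
                         h=0.01, theta_total=np.pi / 2)
```

With zero pitch the motion reduces to a pure rotation about the axis, which would describe hinge-like interactions. A pure translation has no finite pitch and would need separate handling, which this sketch omits.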
