Visual Dexterity: In-Hand Reorientation of Novel and Complex Object Shapes
Tao Chen, Megha Tippur, Siyang Wu, Vikash Kumar, Edward Adelson, Pulkit Agrawal
TL;DR
The paper introduces a real-time, vision-based controller for in-hand object reorientation that handles novel and complex shapes using a single depth camera. A teacher policy is trained in simulation with privileged information and transferred to the real world through a two-stage student learning framework, which speeds up vision-based policy learning by pretraining on synthetic data and then finetuning on rendered depth. The approach achieves dynamic reorientation over the full SO(3) space, including the challenging in-air case, on open-source hardware costing under $5k, with soft fingertips that improve generalization and robustness. Although the controller reorients many objects quickly, the study also reports remaining gaps in precision and drop rate on out-of-distribution shapes, and points to better stopping accuracy and contact sensing as next steps. Overall, the work demonstrates a practical, low-cost path toward democratizing dexterous manipulation research, with robust sim-to-real transfer and real-time, perception-driven control.
Abstract
In-hand object reorientation is necessary for performing many dexterous manipulation tasks, such as tool use in less structured environments that remain beyond the reach of current robots. Prior works built reorientation systems assuming one or more of the following: reorienting only specific objects with simple shapes, a limited range of reorientation, slow or quasistatic manipulation, simulation-only results, the need for specialized and costly sensor suites, and other constraints that make the system infeasible for real-world deployment. We present a general object reorientation controller that does not make these assumptions. It uses readings from a single commodity depth camera to dynamically reorient complex and new object shapes by any rotation in real time, with the median reorientation time being close to seven seconds. The controller is trained using reinforcement learning in simulation and evaluated in the real world on new object shapes not used for training, including the most challenging scenario of reorienting objects held in the air by a downward-facing hand that must counteract gravity during reorientation. Our hardware platform uses only open-source components that cost less than five thousand dollars. Although we demonstrate the ability to overcome assumptions in prior work, there is ample scope for improving absolute performance. For instance, the challenging duck-shaped object not used for training was dropped in 56 percent of the trials. When it was not dropped, our controller reoriented the object within 0.4 radians (23 degrees) 75 percent of the time. Videos are available at: https://taochenshh.github.io/projects/visual-dexterity.
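To make the 0.4 radian figure concrete, the sketch below shows one common way to measure orientation error between a current and a goal orientation: the geodesic angle on SO(3) computed from unit quaternions. This is an illustration under stated assumptions, not necessarily the exact metric used in the paper. The function names `quat_angle` and `within_threshold` and the (w, x, y, z) quaternion ordering are hypothetical choices made for this example.

```python
import math

def quat_angle(q_a, q_b):
    """Geodesic angle (radians) between two rotations given as quaternions.

    Assumes (w, x, y, z) ordering; inputs are normalized here, so they
    need not be exactly unit length.
    """
    norm_a = math.sqrt(sum(c * c for c in q_a))
    norm_b = math.sqrt(sum(c * c for c in q_b))
    dot = sum(a * b for a, b in zip(q_a, q_b)) / (norm_a * norm_b)
    # q and -q encode the same rotation, so take the absolute value.
    # Clamp to 1.0 to guard against rounding error before acos.
    dot = min(1.0, abs(dot))
    return 2.0 * math.acos(dot)

def within_threshold(q_current, q_goal, threshold_rad=0.4):
    """True if the orientation error is at most threshold_rad.

    The 0.4 rad default mirrors the figure quoted in the abstract;
    treating it as a success threshold is an assumption of this sketch.
    """
    return quat_angle(q_current, q_goal) <= threshold_rad

def rot_z(theta):
    """Quaternion (w, x, y, z) for a rotation of theta radians about z."""
    return (math.cos(theta / 2.0), 0.0, 0.0, math.sin(theta / 2.0))

identity = (1.0, 0.0, 0.0, 0.0)
small_error = quat_angle(identity, rot_z(0.3))          # about 0.3 rad
large_error = quat_angle(identity, rot_z(math.pi / 2))  # about 1.57 rad
```

Here `within_threshold(identity, rot_z(0.3))` holds while `within_threshold(identity, rot_z(math.pi / 2))` does not. Taking the absolute value of the dot product matters because a quaternion and its negation describe the same physical orientation.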
