DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model
Xueyi Liu, He Wang, Li Yi
TL;DR
DexNDM tackles the sim-to-real gap in dexterous in-hand rotation by decoupling system dynamics into per-joint components, enabling data-efficient learning and broad object generalization. It combines a joint-wise neural dynamics model with an autonomous data-collection scheme and a residual policy that bridges the remaining real-world discrepancies, trained atop a generalist policy obtained via behavior cloning from category-specific experts. The approach yields strong sim-to-real transfer, enabling rotation of high-aspect-ratio, small, and complex objects across multiple wrist orientations, and supports teleoperation for complex tasks. The result is a single policy that handles a broad range of objects while requiring minimal human intervention during data collection.
Abstract
Achieving generalized in-hand object rotation remains a significant challenge in robotics, largely due to the difficulty of transferring policies from simulation to the real world. The complex, contact-rich dynamics of dexterous manipulation create a "reality gap" that has limited prior work to constrained scenarios involving simple geometries, limited object sizes and aspect ratios, constrained wrist poses, or customized hands. We address this sim-to-real challenge with a novel framework that enables a single policy, trained in simulation, to generalize to a wide variety of objects and conditions in the real world. The core of our method is a joint-wise dynamics model that learns to bridge the reality gap by effectively fitting a limited amount of real-world data and then adapting the sim policy's actions accordingly. The model is highly data-efficient and generalizes across different whole-hand interaction distributions by factorizing dynamics across joints, compressing system-wide influences into low-dimensional variables, and learning each joint's evolution from its own dynamic profile, thereby implicitly capturing these net effects. We pair this with a fully autonomous data collection strategy that gathers diverse real-world interaction data with minimal human intervention. Our complete pipeline demonstrates unprecedented generality: a single policy successfully rotates challenging objects with complex shapes (e.g., animals), high aspect ratios (up to 5.33), and small sizes, all while handling diverse wrist orientations and rotation axes. Comprehensive real-world evaluations and a teleoperation application for complex tasks validate the effectiveness and robustness of our approach. Website: https://meowuu7.github.io/DexNDM/
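
As a rough illustration of the joint-wise factorization described in the abstract, the sketch below fits one tiny model per joint using only that joint's own position and command history. It is a toy under stated assumptions, not the DexNDM implementation: the linear form q[t+1] ≈ a*q[t] + b*u[t], every function name, and the synthetic data generator are hypothetical stand-ins, whereas the model in the paper is a neural network whose architecture is not specified here.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-joint parameters (a, b) of the toy model q[t+1] ≈ a*q[t] + b*u[t].
JointParams = Tuple[float, float]


def fit_joint(q: Sequence[float], u: Sequence[float]) -> JointParams:
    """Least-squares fit of the toy model for ONE joint.

    q[t] is the joint position before command u[t], so len(q) == len(u) + 1.
    """
    if len(q) != len(u) + 1:
        raise ValueError("expected len(q) == len(u) + 1")
    x, y = q[:-1], q[1:]
    # Entries of the 2x2 normal equations.
    sqq = sum(xi * xi for xi in x)
    squ = sum(xi * ui for xi, ui in zip(x, u))
    suu = sum(ui * ui for ui in u)
    sqy = sum(xi * yi for xi, yi in zip(x, y))
    suy = sum(ui * yi for ui, yi in zip(u, y))
    det = sqq * suu - squ * squ
    if abs(det) < 1e-12:
        raise ValueError("degenerate data: positions and commands are collinear")
    a = (sqy * suu - suy * squ) / det
    b = (suy * sqq - sqy * squ) / det
    return a, b


def fit_joint_wise(
    qs: Sequence[Sequence[float]], us: Sequence[Sequence[float]]
) -> List[JointParams]:
    """Fit every joint independently from its own trajectory."""
    return [fit_joint(q, u) for q, u in zip(qs, us)]


def predict_next(
    params: Sequence[JointParams], q_now: Sequence[float], u_now: Sequence[float]
) -> List[float]:
    """One-step prediction for all joints, each using only its own state."""
    return [a * q + b * u for (a, b), q, u in zip(params, q_now, u_now)]


def rollout(a: float, b: float, q0: float, u: Sequence[float]) -> List[float]:
    """Synthetic data generator for the demo (stands in for real-robot logs)."""
    q = [q0]
    for ui in u:
        q.append(a * q[-1] + b * ui)
    return q


if __name__ == "__main__":
    u0 = [0.5, -0.2, 0.3, 0.1, -0.4]
    u1 = [-0.1, 0.4, 0.2, -0.3, 0.6]
    qs = [rollout(0.9, 0.05, 0.1, u0), rollout(0.7, 0.20, -0.2, u1)]
    params = fit_joint_wise(qs, [u0, u1])
    print(params)
    print(predict_next(params, [q[-1] for q in qs], [0.0, 0.0]))
```

Because each fit sees only one joint's trajectory, the number of parameters per model stays fixed regardless of how many joints the hand has. In this toy, cross-joint and object contact effects are absorbed into the coefficients only implicitly, which loosely mirrors the abstract's description of capturing net effects through each joint's own dynamic profile.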
