DexNDM: Closing the Reality Gap for Dexterous In-Hand Rotation via Joint-Wise Neural Dynamics Model

Xueyi Liu, He Wang, Li Yi

TL;DR

DexNDM tackles the sim-to-real gap in dexterous in-hand rotation by decoupling system dynamics into per-joint components, enabling data-efficient learning and broad object generalization. It combines a joint-wise neural dynamics model with an autonomous data-collection scheme and a residual policy that bridges remaining real-world discrepancies, trained atop a generalist policy obtained via behavior cloning from category-specific experts. The approach yields strong sim-to-real transfer, enabling rotation of high‑aspect-ratio, small, and complex objects across multiple wrist orientations, and supports teleoperation for complex tasks. This work advances practical dexterous manipulation by delivering a single policy capable of broad object handling with minimal human intervention in data collection, offering substantial impact for real-world robotic manipulation and embodied intelligence.

Abstract

Achieving generalized in-hand object rotation remains a significant challenge in robotics, largely due to the difficulty of transferring policies from simulation to the real world. The complex, contact-rich dynamics of dexterous manipulation create a "reality gap" that has limited prior work to constrained scenarios involving simple geometries, limited object sizes and aspect ratios, constrained wrist poses, or customized hands. We address this sim-to-real challenge with a novel framework that enables a single policy, trained in simulation, to generalize to a wide variety of objects and conditions in the real world. The core of our method is a joint-wise dynamics model that learns to bridge the reality gap by fitting a limited amount of real-world data and then adapting the sim policy's actions accordingly. The model is highly data-efficient and generalizes across different whole-hand interaction distributions by factorizing dynamics across joints, compressing system-wide influences into low-dimensional variables, and learning each joint's evolution from its own dynamic profile, which implicitly captures these net effects. We pair this with a fully autonomous data collection strategy that gathers diverse real-world interaction data with minimal human intervention. Our complete pipeline demonstrates unprecedented generality: a single policy successfully rotates challenging objects with complex shapes (e.g., animals), high aspect ratios (up to 5.33), and small sizes, all while handling diverse wrist orientations and rotation axes. Comprehensive real-world evaluations and a teleoperation application for complex tasks validate the effectiveness and robustness of our approach. Website: https://meowuu7.github.io/DexNDM/
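
To make the idea of factorizing dynamics across joints concrete, here is a minimal sketch, not the authors' implementation. It assumes each joint's next position can be predicted from that joint's own short history of positions and commanded targets, and it swaps the paper's neural model for a per-joint linear least-squares fit. The function names, the toy second-order dynamics, and the synthetic data are all hypothetical, and the low-dimensional variables that compress system-wide influences are omitted.

```python
import numpy as np

def per_joint_features(q_hist, a_hist):
    """Build one feature row per joint from that joint's own history.

    q_hist, a_hist: arrays of shape (T, J), i.e. T past steps for J joints.
    Returns shape (J, 2*T + 1); the last column is a constant bias feature.
    """
    J = q_hist.shape[1]
    return np.concatenate([q_hist.T, a_hist.T, np.ones((J, 1))], axis=1)

def fit_joint_wise(q, a, T):
    """Fit an independent linear next-position predictor for every joint.

    q, a: arrays of shape (N, J) holding positions and commanded targets.
    Returns weights of shape (J, 2*T + 1).
    """
    N, J = q.shape
    weights = np.zeros((J, 2 * T + 1))
    for j in range(J):
        # Each training row uses only joint j's own T-step history.
        X = np.array([
            np.concatenate([q[t - T:t, j], a[t - T:t, j], [1.0]])
            for t in range(T, N)
        ])
        y = q[T:N, j]
        weights[j], *_ = np.linalg.lstsq(X, y, rcond=None)
    return weights

def predict_next(weights, q_hist, a_hist):
    """Predict every joint's next position from its own history."""
    feats = per_joint_features(q_hist, a_hist)
    return np.sum(weights * feats, axis=1)

# Synthetic stand-in for real transitions (hypothetical toy dynamics):
# each joint follows its own stable second-order response to its command.
rng = np.random.default_rng(0)
N, J, T = 300, 4, 2
gains = np.array([0.1, 0.2, 0.3, 0.4])
a = rng.uniform(-1.0, 1.0, size=(N, J))
q = np.zeros((N, J))
for t in range(2, N):
    q[t] = 1.2 * q[t - 1] - 0.4 * q[t - 2] + gains * a[t - 1]

weights = fit_joint_wise(q[:250], a[:250], T)
pred = predict_next(weights, q[250:252], a[250:252])  # estimate of q[252]
error = np.max(np.abs(pred - q[252]))
```

Because every joint is fitted from its own low-dimensional history, the number of inputs per predictor stays fixed at `2*T + 1` regardless of how many joints the hand has, which is one way to read the data-efficiency argument in the abstract.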

Paper Structure

This paper contains 25 sections, 5 theorems, 33 equations, 43 figures, 11 tables.

Key Result

Theorem 3.1

Let $(X,Y)\in\mathbb{R}^n\times\mathbb{R}$ and $g(X,Y)=(g_X(X),Y)$ with $g_X:\mathbb{R}^n\to\mathbb{R}^m$, $m<n$. Let $\mathcal{P},\mathcal{Q}$ be distributions on $(X,Y)$ satisfying covariate shift, i.e., $\mathcal{P}(Y\mid X)=\mathcal{Q}(Y\mid X)$. Let $L$ be a loss bounded by $B$, and define $R_{

Figures (43)

  • Figure 1: We introduce DexNDM, a sim-to-real approach that enables unprecedented in-hand rotation in the real world. We master a wide object distribution, including (A) challenging geometries and (B) complex shapes, across (C) rich wrist orientations. (D) A teleoperation application. Videos at https://meowuu7.github.io/DexNDM/.
  • Figure 2: Learning from Real-World Data for Control. (A) Learn a whole-body dynamics model from real-world data for policy tuning or model-based control. (B) Learn a residual action model to finetune a base policy. (C) Learn joint-wise dynamics and a residual policy to adapt the base policy.
  • Figure 3: Method Overview. (A) RL-train object category-specific rotation specialists. (B) Distill them into a single generalist via BC. (C-E) Neural sim-to-real: autonomously collect real-world transitions with random loads (C), learn a joint-wise neural dynamics model (D), and train a residual to bridge the reality gap (E). Deploy the base generalist (B) augmented with the residual (E).
  • Figure 4: State-Action History Distribution.
  • Figure 5: Objects for Real Experiment.
  • ...and 38 more figures

Theorems & Definitions (10)

  • Claim 3.1
  • Theorem 3.1: Generalization Gap Contraction
  • Theorem A.1: Data Processing Inequality for KL (strict form)
  • Proof A.1
  • Theorem A.2: Generalization Gap Contraction
  • Proof A.2
  • Proposition
  • Proof A.3
  • Theorem A.3
  • Proof A.4
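
As background for the title of Theorem A.1, the standard (non-strict) data processing inequality for KL divergence is sketched here. This is the textbook statement, not the paper's strict form, whose additional conditions are not reproduced in this listing. The pushforward notation $g_{\#}$ is an assumption of this sketch; $g$ is any measurable map and $\mathcal{P},\mathcal{Q}$ are distributions on the same space.

$$
D_{\mathrm{KL}}\left(g_{\#}\mathcal{P}\,\middle\|\,g_{\#}\mathcal{Q}\right) \;\le\; D_{\mathrm{KL}}\left(\mathcal{P}\,\middle\|\,\mathcal{Q}\right)
$$

In words, passing both distributions through the same map, such as a compression $g_X:\mathbb{R}^n\to\mathbb{R}^m$ with $m<n$, cannot increase their divergence.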