MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Internal Models

Tong Wu; Shoujie Li; Chuqiao Lyu; Kit-Wa Sou; Wang-Sing Chan; Wenbo Ding

TL;DR

MoDex tackles high-dimensional dexterous control by learning neural internal models of hand dynamics (a forward model and an inverse model) and pairing them with a bidirectional planning loop based on the Cross-Entropy Method (CEM). The framework supports data-efficient planning, a factorized dynamics variant for in-hand manipulation, and few-shot gesture generation through cost functions generated by a large language model (LLM). It is validated on multiple dexterous hands in simulation and in real-world deployment, where experiments show improved data efficiency when learning new tasks and transfer between tasks.
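
The bidirectional loop described above can be illustrated with a minimal sketch. It assumes one plausible reading that this summary does not spell out: an inverse model proposes an initial action that seeds the CEM sampling distribution, and the forward model scores sampled actions against a goal state. Both models here are hypothetical linear stand-ins (`forward_model`, `inverse_model`) rather than MoDex's neural networks, and all sizes and hyperparameters are illustrative.

```python
import numpy as np


def cem_plan(forward, cost, mean, std, n_samples=256, n_elites=16, n_iters=10, rng=None):
    """Refine one action vector with the Cross-Entropy Method (CEM).

    forward: maps a batch of actions (N, d) to predicted next states (N, d).
    cost:    maps predicted states (N, d) to per-sample costs (N,).
    mean:    initial Gaussian mean, e.g. an inverse-model proposal.
    """
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float).copy()
    std = np.asarray(std, dtype=float).copy()
    best_a = mean.copy()
    best_c = float(cost(forward(mean[None]))[0])
    for _ in range(n_iters):
        # Sample candidate actions around the current Gaussian.
        actions = mean + std * rng.standard_normal((n_samples, mean.size))
        costs = cost(forward(actions))
        elite_idx = np.argsort(costs)[:n_elites]
        if costs[elite_idx[0]] < best_c:
            best_c = float(costs[elite_idx[0]])
            best_a = actions[elite_idx[0]].copy()
        # Refit the sampling distribution to the lowest-cost samples.
        elites = actions[elite_idx]
        mean = elites.mean(axis=0)
        std = elites.std(axis=0) + 1e-6  # keep std from collapsing to zero
    return best_a, best_c


# Hypothetical stand-ins for the learned models (not MoDex's networks).
rng = np.random.default_rng(0)
d = 16                                             # action/state dimension
B = np.eye(d) + 0.1 * rng.standard_normal((d, d))  # coupled "hand" dynamics
s0 = np.zeros(d)                                   # current state
s_goal = rng.uniform(-1.0, 1.0, d)                 # target state


def forward_model(actions):
    """Forward model: (N, d) actions -> (N, d) predicted next states."""
    return s0 + actions @ B.T


def inverse_model(s, s_next):
    """Crude inverse model that ignores joint coupling, so it is only approximate."""
    return s_next - s


def goal_cost(states):
    return np.sum((states - s_goal) ** 2, axis=1)


a_init = inverse_model(s0, s_goal)  # inverse pass: propose an action
a_best, c_best = cem_plan(forward_model, goal_cost, a_init, 0.1 * np.ones(d), rng=rng)
```

Because `best_c` starts from the inverse-model proposal and is only replaced by lower-cost samples, the returned action is never worse than that proposal under the forward model. Seeding the mean with a reasonable guess lets a narrow `std` suffice, which is one way an inverse process could reduce planning samples and iterations; the sketch does not reproduce the paper's measurements.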

Abstract

Controlling hands in high-dimensional action space has been a longstanding challenge, yet humans naturally perform dexterous tasks with ease. In this paper, we draw inspiration from the concept of internal models exhibited in human behavior and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that includes a pair of neural networks (NNs) capturing the dynamical characteristics of hands and a bidirectional planning approach, which together demonstrate both training and planning efficiency. To show the versatility of MoDex, we further integrate it with an external model to manipulate in-hand objects and with a large language model (LLM) to generate various gestures, in both simulation and the real world. Extensive experiments on different dexterous hands evaluate data efficiency when learning a new task and transferability between tasks.
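
The factorized dynamics idea (a pretrained hand model plus a separate external model for the object) can be sketched as below. This assumes linear stand-ins for both networks, fitted with ordinary least squares, and assumes the external model predicts the next object state from the predicted hand state and the current object state; the exact inputs and architecture used by MoDex are not given in this summary. `hand_model`, `external_model`, and `factorized_step` are hypothetical names.

```python
import numpy as np

rng = np.random.default_rng(0)
d_a, d_h, d_o = 8, 8, 3  # action, hand-state, object-state sizes (illustrative)

# Pretrained hand model (linear stand-in for the learned forward network); kept frozen.
A_hand = np.eye(d_h, d_a) + 0.05 * rng.standard_normal((d_h, d_a))


def hand_model(h, a):
    return h + a @ A_hand.T


# Hidden object response, used here only to synthesize training data.
W_true = 0.1 * rng.standard_normal((d_h + d_o, d_o))


def true_object_step(h_next, o):
    return o + np.concatenate([h_next, o], axis=-1) @ W_true


# Small task dataset: only the external (object) model is fitted.
N = 40
h = rng.standard_normal((N, d_h))
o = rng.standard_normal((N, d_o))
a = rng.standard_normal((N, d_a))
h_next = hand_model(h, a)
o_next = true_object_step(h_next, o)

X = np.concatenate([h_next, o], axis=1)                 # (N, d_h + d_o)
W_ext, *_ = np.linalg.lstsq(X, o_next - o, rcond=None)  # least-squares fit of the object residual


def external_model(h_next, o):
    return o + np.concatenate([h_next, o], axis=-1) @ W_ext


def factorized_step(h, o, a):
    """Compose frozen hand dynamics with the task-specific object model."""
    h_next = hand_model(h, a)
    return h_next, external_model(h_next, o)
```

Only `W_ext`, with (d_h + d_o) × d_o = 33 parameters here, is fitted for the new task, while the hand model is reused unchanged. This illustrates why factorization can make in-hand manipulation more data-efficient, though this toy setup says nothing about the paper's actual results.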

Paper Structure (35 sections, 9 equations, 16 figures, 6 tables, 1 algorithm)

Introduction
Related Work
Internal Model
Dexterous Manipulation
Method
Learning Neural Internal Models
Forward Model for Dexterous Hand Dynamics
Inverse Model for Action Generation
Bidirectional Planning Strategy
Applications of Pretrained Internal Model
Factorized Dynamics Learning
Few-shot Gesture Generation
Experiments
Fingertip Reach Experiment
Ablation Studies

Figures (16)

Figure 1: MoDex. Our framework learns neural internal models to represent various dexterous hands. With an internal model, MoDex enables precise control in a high-dimensional action space, generates diverse gestures by integrating the hand model with an LLM, and performs data-efficient in-hand manipulation by learning a factorized dynamics model.
Figure 2: Method overview. We first explore the action space and collect dynamics data to train the internal models. Using these models, we employ CEM-based bidirectional planning to optimize actions. Applications: we decompose the system dynamics into a hand model and an external model, improving learning efficiency in in-hand manipulation tasks. Additionally, we use an LLM to generate a cost function from textual input, which guides action optimization for gesture generation.
Figure 3: Ablation studies. (a) Ablation on the forward model. We find that, to maintain the same prediction error, the required training data grows exponentially with the action dimension. (b) Ablation on the planning method. Results show that the inverse process of MoDex substantially reduces the number of planning samples and iterations.
Figure 4: Application cases. (a) In-hand manipulation with MyoHand. (b–d) Quantitative evaluation on three in-hand manipulation tasks shows that MoDex achieves the highest success rates with the least data. (e) Gesture generation results: given two example gestures, MoDex generates four novel gestures using the LLM.
Figure 5: Real-world deployment. (a) We deploy MoDex to perform in-hand rotation of three objects (cube, orange, and cylinder) around the $z$-axis. (b) MoDex generates diverse hand gestures in the real world. Green indicates successful executions, while red denotes failures. We observe that failures arise when the target gestures fall outside the distribution of the exploration data.
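
For the gesture-generation application, the LLM's role is to turn a text description into a cost function that the planner then minimizes. The sketch below assumes the LLM replies with Python source for a function named `gesture_cost`; the reply string, the 5 × 3 joint layout, the 1.4 rad curl target, and the random-shooting optimizer (used in place of the CEM planner for brevity) are all hypothetical.

```python
import numpy as np

# Hypothetical joint layout: 5 fingers x 3 flexion joints, radians, 0 = straight.
# Stand-in for an LLM reply to the prompt "thumbs up" (illustrative, not MoDex output).
LLM_REPLY = """
def gesture_cost(q):
    # q has shape (5, 3): rows are thumb, index, middle, ring, little.
    thumb_straight = np.sum(q[0] ** 2)
    others_curled = np.sum((q[1:] - 1.4) ** 2)
    return float(thumb_straight + others_curled)
"""


def compile_cost(source):
    """Turn generated source text into a callable; sandbox untrusted code in practice."""
    namespace = {"np": np}
    exec(source, namespace)
    return namespace["gesture_cost"]


gesture_cost = compile_cost(LLM_REPLY)

# Random-shooting search over joint angles (a simple stand-in for the CEM planner).
rng = np.random.default_rng(0)
candidates = rng.uniform(0.0, 1.6, size=(2000, 5, 3))
costs = np.array([gesture_cost(q) for q in candidates])
best_pose = candidates[np.argmin(costs)]
```

Executing generated code with `exec` is convenient for a sketch, but untrusted output should be validated or sandboxed before it runs. Figure 5 notes that real-world failures arise when target gestures fall outside the exploration data; a cost like this one cannot detect that on its own, since it scores poses without reference to where the internal model was trained.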
