MoDex: Planning High-Dimensional Dexterous Control via Learning Neural Internal Models

Tong Wu, Shoujie Li, Chuqiao Lyu, Kit-Wa Sou, Wang-Sing Chan, Wenbo Ding

TL;DR

MoDex tackles high-dimensional dexterous control by learning neural internal models of hand dynamics (forward and inverse) and pairing them with a bidirectional planning loop based on the cross-entropy method (CEM). The framework supports data-efficient planning, a factorized dynamics variant for in-hand manipulation, and few-shot gesture generation via LLM-generated cost functions, and it is validated on multiple dexterous hands in simulation and in real-world deployment. The key contributions are the neural internal-model formulation, bidirectional planning, factorized dynamics, and LLM-assisted gesture generation, which together improve data efficiency and transfer across tasks. The work advances scalable control of high-DoF dexterous hands, with practical impact on both manipulation and gesture synthesis in simulated and real environments.
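The bidirectional planning loop summarized above can be illustrated with a small NumPy sketch. It assumes a learned forward model that predicts the next hand state from the current state and an action, and a learned inverse model that proposes an action for a desired state; both are replaced here by hypothetical linear stand-ins (`forward_model`, `inverse_model`). Reading "bidirectional" as "the inverse model warm-starts CEM and the forward model scores the samples" is our assumption, as are the names `cem_plan` and `cost` and every hyperparameter; this is not the authors' implementation.

```python
import numpy as np

# Hypothetical stand-ins for the learned internal models. In MoDex these would be
# neural networks trained on exploration data; here a fixed random linear map is used.
RNG = np.random.default_rng(0)
ACTION_DIM = 16  # assumed action dimension (e.g. a 16-joint hand)
A = np.eye(ACTION_DIM) + 0.1 * RNG.standard_normal((ACTION_DIM, ACTION_DIM))


def forward_model(state, actions):
    """Batch forward prediction, assumed form s' = s + A a for each row a of `actions`."""
    return state + actions @ A.T


def inverse_model(state, target):
    """Imperfect inverse prediction: exact inverse of A plus noise, mimicking model error."""
    exact = np.linalg.solve(A, target - state)
    return exact + 0.2 * RNG.standard_normal(exact.shape)


def cost(state, actions, target):
    """Distance between predicted next states and the target state, one value per action."""
    return np.linalg.norm(forward_model(state, actions) - target, axis=1)


def cem_plan(state, target, init_mean, n_samples=64, n_elite=8, n_iters=10,
             init_std=0.5, seed=1):
    """Cross-entropy method scored by the forward model; returns the best action seen."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(init_mean, dtype=float).copy()
    std = np.full_like(mean, init_std)
    best_action, best_cost = mean.copy(), np.inf
    for _ in range(n_iters):
        samples = mean + std * rng.standard_normal((n_samples, mean.size))
        samples[0] = mean  # always re-evaluate the current mean
        costs = cost(state, samples, target)
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best_action, best_cost = samples[order[0]].copy(), costs[order[0]]
        elites = samples[order[:n_elite]]
        # Refit the sampling distribution to the elite set.
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return best_action


state = np.zeros(ACTION_DIM)
target = RNG.uniform(-1.0, 1.0, ACTION_DIM)

# Bidirectional (as read here): the inverse model proposes, the forward model refines.
guess = inverse_model(state, target)
warm = cem_plan(state, target, init_mean=guess)
cold = cem_plan(state, target, init_mean=np.zeros(ACTION_DIM))
for name, a in [("inverse guess", guess), ("warm-start CEM", warm), ("cold-start CEM", cold)]:
    print(f"{name}: error {cost(state, a[None], target)[0]:.3f}")
```

Because the current mean is always re-scored and the best sample seen is returned, the plan is never worse under the forward model than its initial guess. Many CEM variants return the final mean instead; either choice is a design decision rather than something the summary specifies.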

Abstract

Controlling hands in a high-dimensional action space has been a longstanding challenge, yet humans perform dexterous tasks with ease. In this paper, we draw inspiration from the concept of the internal model exhibited in human behavior and reconsider dexterous hands as learnable systems. Specifically, we introduce MoDex, a framework that includes a pair of neural networks (NNs) capturing the dynamical characteristics of hands and a bidirectional planning approach, and that is efficient in both training and planning. To show the versatility of MoDex, we further integrate it with an external model to manipulate in-hand objects and with a large language model (LLM) to generate various gestures, both in simulation and in the real world. Extensive experiments on different dexterous hands evaluate the data efficiency of learning a new task and the transferability between different tasks.
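The factorized dynamics behind the in-hand manipulation application (a hand model plus an external model, per the Figure 2 caption) can be sketched as two composed predictors. The interface below, in which the external model sees the object state and the hand's predicted transition but never the raw action, is an assumption made for illustration; `FactorizedDynamics`, `toy_hand_model`, `toy_object_model`, and their dynamics are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

import numpy as np

Array = np.ndarray


@dataclass
class FactorizedDynamics:
    """Hypothetical factorized model: hand transition first, then object transition."""
    # Hand model: (hand_state, action) -> next_hand_state; assumed task-agnostic.
    hand_model: Callable[[Array, Array], Array]
    # External model: (obj_state, hand_state, next_hand_state) -> next_obj_state.
    object_model: Callable[[Array, Array, Array], Array]

    def step(self, hand: Array, obj: Array, action: Array) -> Tuple[Array, Array]:
        next_hand = self.hand_model(hand, action)
        next_obj = self.object_model(obj, hand, next_hand)  # object never sees raw actions
        return next_hand, next_obj

    def rollout(self, hand: Array, obj: Array,
                actions: List[Array]) -> List[Tuple[Array, Array]]:
        trajectory = []
        for action in actions:
            hand, obj = self.step(hand, obj, action)
            trajectory.append((hand, obj))
        return trajectory


# Toy stand-ins for the two learned networks (illustrative dynamics, not from the paper).
def toy_hand_model(hand, action):
    return hand + 0.5 * (action - hand)  # joints move halfway toward the commanded target


def toy_object_model(obj, hand, next_hand):
    return obj + 0.8 * (next_hand[:1] - hand[:1])  # yaw follows the first joint's motion


model = FactorizedDynamics(toy_hand_model, toy_object_model)
traj = model.rollout(np.zeros(2), np.zeros(1), [np.array([1.0, 0.0])] * 3)
for t, (hand, obj) in enumerate(traj, start=1):
    print(f"t={t}: hand={hand.round(3)}, object yaw={obj[0]:.3f}")
```

With these toy dynamics, three identical actions drive the first joint to 0.875 and the object yaw to 0.7 (0.8 times the joint's net motion). Because the external model consumes only hand transitions, a hand model trained once could plausibly be kept fixed while only the external model is fit for a new object; that is our interpretation of the claimed learning-efficiency gain, not a detail confirmed by the summary.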

Paper Structure

This paper contains 35 sections, 9 equations, 16 figures, 6 tables, and 1 algorithm.

Figures (16)

  • Figure 1: MoDex. Our framework learns neural internal models to represent various dexterous hands. With an internal model, MoDex enables precise control in a high-dimensional action space, generation of diverse gestures by integrating the hand model with an LLM, and data-efficient in-hand manipulation via a learned factorized dynamics model.
  • Figure 2: Method overview. We first explore the action space and collect dynamics data to train the internal model. Using these models, we employ CEM-based bidirectional planning to optimize actions. Applications: we decompose the system dynamics into a hand model and an external model, improving learning efficiency in in-hand manipulation tasks. Additionally, we leverage an LLM to generate a cost function from textual input, guiding action optimization for gesture generation.
  • Figure 3: Ablation studies. (a) Ablation on the forward model: to maintain the same prediction error, the amount of training data grows exponentially with the action dimension. (b) Ablation on the planning method: the inverse process of MoDex significantly reduces the number of planning samples and iterations.
  • Figure 4: Application cases. (a) In-hand manipulation with MyoHand. (b–d) Quantitative evaluation on three in-hand manipulation tasks shows that MoDex achieves the highest success rates with the least data. (e) Gesture generation results: given two example gestures, MoDex uses the LLM to generate four novel gestures.
  • Figure 5: Real-world deployment. (a) We deploy MoDex to perform in-hand rotation of three objects (cube, orange, and cylinder) around the $z$-axis. (b) MoDex generates diverse hand gestures in the real world. Green indicates successful executions; red denotes failures. We observe that failures arise when the target gestures fall outside the distribution of the exploration data.
  • ...and 11 more figures
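The LLM-assisted gesture generation described in the Figure 2 caption (a cost function generated from text, then used to guide action optimization) can be sketched as follows. The cost source string stands in for what an LLM might return for a "thumbs up" prompt; the five-finger flexion state, the coupling matrix in `predict_flexion`, the `exec`-based loading in `load_cost`, and the stripped-down CEM in `optimize` are all hypothetical choices, not the paper's pipeline.

```python
import numpy as np

FINGERS = ("thumb", "index", "middle", "ring", "little")  # hypothetical 5-D flexion state

# Text an LLM might return for the prompt "thumbs up" (hypothetical, not from the paper).
LLM_COST_SOURCE = '''
def gesture_cost(flex):
    # flex maps finger name -> flexion in [0, 1] (0 = straight, 1 = fully curled)
    cost = (flex["thumb"] - 0.0) ** 2
    for finger in ("index", "middle", "ring", "little"):
        cost += (flex[finger] - 1.0) ** 2
    return cost
'''


def load_cost(source):
    """Compile generated code with builtins stripped (NOT a real sandbox)."""
    namespace = {}
    exec(source, {"__builtins__": {}}, namespace)
    return namespace["gesture_cost"]


# Toy forward model: commanded actions -> flexion, with assumed ring/little coupling.
COUPLING = np.eye(len(FINGERS))
COUPLING[3, 4] = COUPLING[4, 3] = 0.3


def predict_flexion(action):
    return np.clip(COUPLING @ action, 0.0, 1.0)


def optimize(cost_fn, n_iters=30, n_samples=128, n_elite=12, seed=0):
    """Stripped-down CEM over actions in [0, 1]; returns the best action and its cost."""
    rng = np.random.default_rng(seed)
    mean, std = np.full(len(FINGERS), 0.5), np.full(len(FINGERS), 0.3)
    best, best_cost = mean.copy(), np.inf
    for _ in range(n_iters):
        samples = np.clip(mean + std * rng.standard_normal((n_samples, mean.size)), 0.0, 1.0)
        # Score each candidate by the generated cost on the predicted hand state.
        costs = np.array([cost_fn(dict(zip(FINGERS, predict_flexion(a)))) for a in samples])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:
            best, best_cost = samples[order[0]].copy(), float(costs[order[0]])
        elites = samples[order[:n_elite]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return best, best_cost


cost_fn = load_cost(LLM_COST_SOURCE)
action, final_cost = optimize(cost_fn)
flex = predict_flexion(action)
print("cost:", round(final_cost, 4))
print({f: round(float(v), 2) for f, v in zip(FINGERS, flex)})
```

Executing model-generated code with an empty `__builtins__` only limits accidental misuse; it is not a security boundary, so a real system would need proper sandboxing or a restricted cost description format. Keeping the cost as a plain function of the predicted hand state lets the same optimizer serve any gesture the LLM describes.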