Modality-Driven Design for Multi-Step Dexterous Manipulation: Insights from Neuroscience
Naoki Wake, Atsushi Kanehira, Daichi Saito, Jun Takamatsu, Kazuhiro Sasabuchi, Hideki Koike, Katsushi Ikeuchi
TL;DR
This work tackles multi-step dexterous manipulation by proposing a neuroscience-inspired, modality-driven decomposition into reaching, grasping and lifting, and in-hand rotation. Each sub-task is addressed with a modality-appropriate method: vision-based planning or classical control for reaching, a hybrid Vision-Language-Action model guided by learning-from-observation for grasping, and RL with force feedback for in-hand rotation. Real-robot experiments show the benefits of augmenting real demonstrations with simulated data, and demonstrate end-to-end feasibility with partial success in the final rotation steps. The approach provides practical guidelines, including a vision-based teleoperation system and sim-to-real data augmentation, contributing a modular and biologically informed framework for dexterous manipulation. The results highlight the importance of modality-aware task decomposition and domain-randomized simulation in achieving robust performance on anthropomorphic robotic hands.
Abstract
Multi-step dexterous manipulation is a fundamental skill in household scenarios, yet it remains underexplored in robotics. This paper proposes a modular approach in which each step of the manipulation process is handled by a dedicated policy that takes the most effective sensory modality as input, rather than relying on a single end-to-end model. To demonstrate this, a dexterous robotic hand performs a manipulation task that involves picking up and rotating a box. Guided by insights from neuroscience, the task is decomposed into three sub-skills according to the dominant sensory modalities employed in the human brain: 1) reaching, 2) grasping and lifting, and 3) in-hand rotation. From a practical perspective, each sub-skill is addressed with a distinct method: a classical controller, a Vision-Language-Action model, and a reinforcement learning policy with force feedback, respectively. We tested the pipeline on a real robot to demonstrate the feasibility of our approach. The key contribution of this study is a neuroscience-inspired, modality-driven methodology for multi-step dexterous manipulation.
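As a rough illustration of the modular structure described in the abstract, the following Python sketch chains three sub-skills and passes each policy only the modalities assigned to it. It is a minimal sketch and not the authors' implementation: the names `SubSkill`, `filter_modalities`, `run_pipeline`, and `make_stub`, the observation keys (`target_pose`, `image`, `instruction`, `fingertip_force`), and the stub policies are all hypothetical placeholders for the classical controller, the Vision-Language-Action model, and the force-feedback RL policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical observation format: one entry per sensory modality.
Observation = Dict[str, object]


@dataclass(frozen=True)
class SubSkill:
    name: str
    modalities: Tuple[str, ...]            # inputs this sub-skill may see
    policy: Callable[[Observation], str]   # returns a placeholder action label


def filter_modalities(obs: Observation, modalities: Tuple[str, ...]) -> Observation:
    """Keep only the modalities assigned to a sub-skill."""
    return {key: obs[key] for key in modalities if key in obs}


def run_pipeline(skills: List[SubSkill], obs: Observation) -> List[Tuple[str, str]]:
    """Run the sub-skills in order, each on its own modality subset."""
    log = []
    for skill in skills:
        action = skill.policy(filter_modalities(obs, skill.modalities))
        log.append((skill.name, action))
    return log


def make_stub(label: str) -> Callable[[Observation], str]:
    """Stub policy that only reports which modalities it received."""
    return lambda obs: f"{label}({','.join(sorted(obs))})"


# Assumed modality assignment, loosely following the abstract:
# reaching -> classical controller, grasping and lifting -> VLA model,
# in-hand rotation -> RL policy with force feedback.
PIPELINE = [
    SubSkill("reaching", ("target_pose",), make_stub("classical")),
    SubSkill("grasping_and_lifting", ("image", "instruction"), make_stub("vla")),
    SubSkill("in_hand_rotation", ("fingertip_force",), make_stub("rl_force")),
]

if __name__ == "__main__":
    observation = {
        "target_pose": (0.4, 0.0, 0.1),
        "image": b"",
        "instruction": "pick up the box",
        "fingertip_force": [0.0, 0.0, 0.0, 0.0],
    }
    for name, action in run_pipeline(PIPELINE, observation):
        print(name, "->", action)
```

In a real system each stub would be replaced by the corresponding controller or learned policy, and the observation would be refreshed between sub-skills rather than captured once.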
