Sampling-Based Model Predictive Control for Dexterous Manipulation on a Biomimetic Tendon-Driven Hand
Adrian Hess, Alexander M. Kübler, Benedek Forrai, Mehmet Dogar, Robert K. Katzschmann
TL;DR
Dexterous in-hand manipulation with biomimetic tendon-driven hands is hard because of high dimensionality and uncertain state estimates. The authors combine sampling-based MPC (MuJoCo MPC) with a visual language model (GPT-4o) that autonomously adapts task-specific objective weights from video feedback, enabling rapid control without retraining. They demonstrate ball rolling, flipping, and catching in simulation and on physical hardware, including scenarios with a robotic arm. They also show that a few adaptation cycles suffice to achieve functional dexterity. The work bridges simulation and real-world deployment and offers a flexible framework for rapidly developing dexterous manipulation skills on compliant hands.
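The following sketch makes "task-specific objective weights" concrete. It assumes the objective is a weighted sum of task cost terms and shows how adaptation cycles rescale those weights. Everything here is hypothetical and illustrative, not the authors' implementation:

- the term names (`ball_height`, `ball_spin`, `effort`), the state keys, and the scaling factor of 1.5;
- the `critique` callback, a hard-coded stand-in for the VLM (GPT-4o in the paper) that reviews a video of the hand's behavior. We assume its verdict can be reduced to per-term "increase"/"decrease" directions.

```python
from typing import Callable, Dict, List

State = Dict[str, float]
Weights = Dict[str, float]

# Hypothetical cost terms for a ball-rolling task (illustrative only).
TERMS: Dict[str, Callable[[State], float]] = {
    "ball_height": lambda s: (s["ball_z"] - 0.05) ** 2,  # keep ball near a target height
    "ball_spin": lambda s: (s["ball_wz"] - 2.0) ** 2,    # track a target spin rate
    "effort": lambda s: s["tendon_effort"] ** 2,         # penalize actuation effort
}


def weighted_cost(state: State, weights: Weights) -> float:
    """Objective as a weighted sum of task cost terms: sum_i w_i * c_i(state)."""
    return sum(weights[name] * term(state) for name, term in TERMS.items())


def apply_feedback(weights: Weights, feedback: Dict[str, str],
                   factor: float = 1.5) -> Weights:
    """Return a copy of `weights` with terms scaled up ("increase") or down ("decrease")."""
    updated = dict(weights)
    for name, direction in feedback.items():
        if direction == "increase":
            updated[name] *= factor
        elif direction == "decrease":
            updated[name] /= factor
    return updated


def adaptation_loop(weights: Weights,
                    critique: Callable[[Weights], Dict[str, str]],
                    cycles: int = 3) -> List[Weights]:
    """Run a few adaptation cycles; `critique` is a hypothetical stand-in for the VLM."""
    history = [weights]
    for _ in range(cycles):
        feedback = critique(history[-1])  # hypothetical: VLM verdict on the latest rollout video
        history.append(apply_feedback(history[-1], feedback))
    return history
```

For example, `adaptation_loop({"ball_height": 1.0, "ball_spin": 0.5, "effort": 0.01}, lambda w: {"ball_spin": "increase"})` raises the spin weight in each of three cycles. In this setup a cycle only changes numbers in the objective, and the MPC re-optimizes online with the new weights. That is why no policy retraining is needed between cycles.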
Abstract
Biomimetic and compliant robotic hands offer the potential for human-like dexterity, but controlling them is challenging due to high dimensionality, complex contact interactions, and uncertainties in state estimation. Sampling-based model predictive control (MPC), using a physics simulator as the dynamics model, is a promising approach for generating contact-rich behavior. However, sampling-based MPC has yet to be evaluated on physical (non-simulated) robotic hands, particularly on compliant hands with state uncertainties. We present the first successful demonstration of in-hand manipulation on a physical biomimetic tendon-driven robot hand using sampling-based MPC. Unlike reinforcement learning approaches, sampling-based MPC does not require lengthy training cycles. However, the task-specific objective function must still be adapted to ensure robust behavior execution on physical hardware. To adapt the objective function, we integrate a visual language model (VLM) with a real-time optimizer (MuJoCo MPC). We provide the VLM with a high-level human language description of the task and a video of the hand's current behavior. The VLM gradually adapts the objective function, allowing for efficient behavior generation, with each iteration taking less than two minutes. We show the feasibility of ball rolling, flipping, and catching using both simulated and physical robot hands. Our results demonstrate that sampling-based MPC is a promising approach for generating dexterous manipulation skills on biomimetic hands without extensive training cycles.
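The abstract's core idea, using a physics simulator as the dynamics model inside a sampling-based MPC loop, can be illustrated with a minimal predictive-sampling planner. Each planning step works as follows:

1. perturb a nominal control sequence with Gaussian noise;
2. roll each candidate through the model;
3. keep the cheapest candidate;
4. apply its first control;
5. shift the plan forward as a warm start.

Predictive sampling is one of the planners available in MuJoCo MPC, but this summary does not say which planner the authors used, so treat it as an example only. The toy 1D point mass, its time step `DT`, the cost weights, the sample count, and the noise scale are all assumptions standing in for the MuJoCo model of the hand. A real setup would roll out the full simulator instead of `step`.

```python
import random
from typing import List, Tuple

State = Tuple[float, float]  # (position, velocity) of the toy point mass
DT = 0.05                    # assumed integration step of the toy model


def step(state: State, u: float) -> State:
    """Toy stand-in for the physics simulator: a 1D double integrator."""
    pos, vel = state
    vel = vel + u * DT
    pos = pos + vel * DT
    return (pos, vel)


def rollout_cost(state: State, controls: List[float], target: float) -> float:
    """Roll a control sequence through the model and sum the running cost."""
    cost = 0.0
    for u in controls:
        state = step(state, u)
        cost += (state[0] - target) ** 2 + 0.1 * state[1] ** 2 + 0.001 * u ** 2
    return cost


def predictive_sampling(state: State, nominal: List[float], target: float,
                        rng: random.Random, num_samples: int = 32,
                        noise: float = 1.0) -> List[float]:
    """One planning step: keep the nominal plan unless a noisy variant is cheaper."""
    best = list(nominal)
    best_cost = rollout_cost(state, best, target)
    for _ in range(num_samples):
        candidate = [u + rng.gauss(0.0, noise) for u in nominal]
        cost = rollout_cost(state, candidate, target)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


def run_mpc(steps: int = 100, horizon: int = 20, target: float = 1.0,
            seed: int = 0) -> State:
    """Receding-horizon loop: plan, apply the first control, shift the plan."""
    rng = random.Random(seed)
    state: State = (0.0, 0.0)
    plan = [0.0] * horizon
    for _ in range(steps):
        plan = predictive_sampling(state, plan, target, rng)
        state = step(state, plan[0])   # execute only the first control
        plan = plan[1:] + [plan[-1]]   # warm-start the next iteration
    return state
```

`run_mpc()` drives the point mass from 0 toward the target of 1.0. The nominal plan is always among the candidates, so a planning step never selects a rollout that is worse than the warm-started plan. Because no gradients or training are involved, changing the objective (for example, its weights) takes effect at the next planning step.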
