Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion
Piotr Koczy, Michael C. Welle, Danica Kragic
TL;DR
The paper extends visuomotor diffusion policies to dexterous in-hand manipulation with multifingered hands, targeting autonomous one-handed unscrewing tasks. It introduces an AR-based teleoperation pipeline for collecting high-quality demonstrations and a demonstration-filtering method based on HDBSCAN and GLOSH that removes low-quality demonstrations. Ablations show that wrist-camera observations combined with joint positions and effort give the strongest policy, which achieves an 85% real-world success rate at unscrewing a bottle lid. The work demonstrates that visuomotor diffusion policies can be deployed on mobile platforms and underscores the value of targeted demonstration filtering for robust dexterous control.
Abstract
We present a framework for learning dexterous in-hand manipulation with multifingered hands using visuomotor diffusion policies. Our system enables complex in-hand manipulation tasks, such as unscrewing a bottle lid with one hand, by leveraging a fast and responsive teleoperation setup for the four-fingered Allegro Hand. We collect high-quality expert demonstrations using an augmented reality (AR) interface that tracks hand movements and applies inverse kinematics and motion retargeting for precise control. The AR headset provides real-time visualization, while gesture controls streamline teleoperation. To enhance policy learning, we introduce a novel demonstration outlier removal approach based on HDBSCAN clustering and the Global-Local Outlier Score from Hierarchies (GLOSH) algorithm, effectively filtering out low-quality demonstrations that could degrade performance. We evaluate our approach extensively in real-world settings and provide all experimental videos on the project website: https://dex-manip.github.io/
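The outlier-removal step can be illustrated with a minimal sketch. It assumes each demonstration has already been given a GLOSH outlier score in [0, 1], for example from the `outlier_scores_` attribute of a fitted `hdbscan.HDBSCAN` model. The abstract does not specify the per-demonstration features or the cutoff rule, so the quantile threshold and the function names `quantile_threshold` and `filter_demonstrations` are illustrative assumptions, not the authors' implementation.

```python
def quantile_threshold(scores, q):
    """Return the q-quantile of scores using linear interpolation."""
    ordered = sorted(scores)
    if not ordered:
        raise ValueError("scores must be non-empty")
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] + frac * (ordered[hi] - ordered[lo])


def filter_demonstrations(scores, q=0.9):
    """Split demonstration indices into kept and dropped sets.

    scores: one GLOSH-style outlier score per demonstration (higher = more
            anomalous). Assumed to be precomputed; the quantile cutoff q is
            a hypothetical choice, not taken from the paper.
    """
    threshold = quantile_threshold(scores, q)
    kept = [i for i, s in enumerate(scores) if s <= threshold]
    dropped = [i for i, s in enumerate(scores) if s > threshold]
    return kept, dropped, threshold


# Toy scores for six demonstrations; index 3 is a clear outlier.
toy_scores = [0.05, 0.10, 0.08, 0.95, 0.12, 0.07]
kept, dropped, threshold = filter_demonstrations(toy_scores, q=0.9)
print("kept:", kept, "dropped:", dropped)
```

In practice the cutoff would need tuning: if it is too aggressive it discards valid but unusual demonstrations, and if it is too lenient it leaves low-quality ones in the training set.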
