WHED: A Wearable Hand Exoskeleton for Natural, High-Quality Demonstration Collection
Mingzhang Zhu, Alvin Zhu, Jose Victor S. H. Ramos, Beom Jun Kim, Yike Shi, Yufeng Wu, Ruochen Hou, Quanyou Wang, Eric Song, Tony Fan, Yuchen Cui, Dennis W. Hong
TL;DR
WHED addresses the bottleneck in collecting scalable, high-fidelity dexterous demonstrations by combining a wearability-first hand exoskeleton, which has a free-to-move thumb, with a passive data-capture hand. It provides an end-to-end pipeline that synchronizes finger encoders, AR-based end-effector pose, and wrist-mounted vision, with post-processing for time alignment and replay. The collected demonstrations span grasps from precision pinch to full-hand enclosure, and replayed executions are qualitatively consistent with them. This suggests a practical path toward scalable dexterous manipulation learning in the wild. By pairing mechanical design with an integrated data pipeline, the work aims to lower the barrier to large-scale, realistic demonstrations for high-DoF robotic hands.
Abstract
Scalable learning of dexterous manipulation remains bottlenecked by the difficulty of collecting natural, high-fidelity human demonstrations for multi-finger hands, owing to occlusion, complex hand kinematics, and contact-rich interactions. We present WHED, a wearable hand-exoskeleton system designed for in-the-wild demonstration capture, guided by two principles: wearability-first operation for extended use, and a pose-tolerant, free-to-move thumb coupling that preserves natural thumb behaviors while maintaining a consistent mapping to the target robot thumb degrees of freedom. WHED integrates a linkage-driven finger interface with passive fit accommodation, a modified passive hand with robust proprioceptive sensing, and an onboard sensing/power module. We also provide an end-to-end data pipeline that synchronizes joint encoders, AR-based end-effector pose, and wrist-mounted visual observations, and supports post-processing for time alignment and replay. We demonstrate feasibility on representative grasping and manipulation sequences spanning precision pinch and full-hand enclosure grasps, and show qualitative consistency between collected demonstrations and replayed executions.
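The abstract mentions post-processing for time alignment but does not specify how it is done. The following is a minimal sketch under stated assumptions. It assumes that each stream (finger-encoder joint angles, AR end-effector positions) carries its own timestamps and that a constant per-stream clock offset is already known. Under those assumptions, every stream can be resampled onto the wrist-camera frame timestamps by linear interpolation. The names `interpolate_stream` and `align_to_frames` and the constant-offset clock model are hypothetical illustrations, not WHED's actual pipeline. Orientation channels would need spherical interpolation (slerp) rather than the per-channel linear interpolation shown here.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Sequence, Tuple

Stream = Tuple[Sequence[float], Sequence[Sequence[float]]]  # (timestamps, samples)


def interpolate_stream(src_t: Sequence[float],
                       src_vals: Sequence[Sequence[float]],
                       query_t: Sequence[float]) -> List[List[float]]:
    """Linearly resample a multi-channel stream onto query timestamps.

    Hypothetical helper. Queries outside [src_t[0], src_t[-1]] are clamped to
    the endpoint samples instead of being extrapolated.
    """
    if not src_t or len(src_t) != len(src_vals):
        raise ValueError("timestamps and samples must be non-empty and equal length")
    if any(b <= a for a, b in zip(src_t, src_t[1:])):
        raise ValueError("timestamps must be strictly increasing")
    out: List[List[float]] = []
    for t in query_t:
        if t <= src_t[0]:
            out.append(list(src_vals[0]))
            continue
        if t >= src_t[-1]:
            out.append(list(src_vals[-1]))
            continue
        i = bisect_left(src_t, t)  # guarantees src_t[i-1] < t <= src_t[i]
        t0, t1 = src_t[i - 1], src_t[i]
        w = (t - t0) / (t1 - t0)
        out.append([a + w * (b - a) for a, b in zip(src_vals[i - 1], src_vals[i])])
    return out


def align_to_frames(frame_t: Sequence[float],
                    streams: Dict[str, Stream],
                    clock_offsets: Optional[Dict[str, float]] = None
                    ) -> Dict[str, List[List[float]]]:
    """Resample every named stream onto camera frame timestamps.

    Assumes a constant, known clock offset per stream (hypothetical model),
    added to that stream's timestamps to map them onto the camera clock.
    """
    clock_offsets = clock_offsets or {}
    aligned: Dict[str, List[List[float]]] = {}
    for name, (t, vals) in streams.items():
        offset = clock_offsets.get(name, 0.0)
        shifted = [ti + offset for ti in t]
        aligned[name] = interpolate_stream(shifted, vals, frame_t)
    return aligned
```

Resampling onto camera timestamps pairs each image with the proprioceptive state interpolated at its capture time. Clamping at the edges avoids extrapolating beyond recorded data, at the cost of repeating endpoint values for frames that fall outside a stream's coverage.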
