MILE: A Mechanically Isomorphic Exoskeleton Data Collection System with Fingertip Visuotactile Sensing for Dexterous Manipulation
Jinda Du, Jieji Ren, Qiaojun Yu, Ningbin Zhang, Yu Deng, Xingyu Wei, Yufei Liu, Guoying Gu, Xiangyang Zhu
TL;DR
This paper introduces MILE, a mechanically isomorphic exoskeleton–robot hand system with fingertip visuotactile sensing, designed to collect high-fidelity demonstrations for dexterous manipulation. Because each exoskeleton joint corresponds one-to-one to a robot-hand joint, the system eliminates retargeting distortions. The exoskeleton measures joint angles with a multi-joint mean absolute error below one degree, and compact Tac-Tip sensors embedded in the fingertips capture rich tactile observations, enabling precise teleoperation and multimodal data capture. On several contact-rich tasks, the authors report a 64% mean improvement in teleoperation success rate. Imitation-learning policies that use tactile data also raise success rates by an average of 25% over a vision-only baseline. Overall, MILE offers a scalable pipeline for collecting high-quality vision–tactile demonstrations for learning-based dexterous manipulation.
Abstract
Imitation learning is a promising approach to dexterous hand manipulation, but its effectiveness is limited by the lack of large-scale, high-fidelity data. Existing data-collection pipelines suffer from inaccurate motion retargeting, low collection efficiency, and a lack of high-resolution fingertip tactile sensing. We address this gap with MILE, a mechanically isomorphic teleoperation and data-collection system co-designed across the human hand, the exoskeleton, and the robotic hand. The exoskeleton is anthropometrically derived from the human hand, and the robotic hand preserves one-to-one joint-position isomorphism with it, eliminating nonlinear retargeting and enabling precise, natural control. The exoskeleton achieves a multi-joint mean absolute angular error below one degree, while the robotic hand integrates compact fingertip visuotactile modules that provide high-resolution tactile observations. Using this retargeting-free interface, we teleoperate complex, contact-rich in-hand manipulation and efficiently collect a multimodal dataset comprising high-resolution fingertip visuotactile signals, RGB-D images, and joint positions. The teleoperation pipeline improves the mean success rate by 64%. Incorporating fingertip tactile observations further increases the success rate by an average of 25% over the vision-only baseline, validating the fidelity and utility of the dataset. Further details are available at: https://sites.google.com/view/mile-system.
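The retargeting-free idea and the accuracy metric can be illustrated with a minimal Python sketch. It assumes joint angles arrive as per-joint lists in degrees, with the exoskeleton and robot hand joints listed in the same order. Under one-to-one joint-position isomorphism, the robot command is simply the exoskeleton reading. The sub-degree figure is a mean absolute angular error averaged over joints. The function names, units, and sample values below are hypothetical illustrations, not the authors' code or data.

```python
from typing import List, Sequence


def isomorphic_command(exo_angles_deg: Sequence[float]) -> List[float]:
    """Map exoskeleton joint readings to robot-hand joint targets.

    Hypothetical sketch: with one-to-one joint-position isomorphism, joint i of
    the robot hand tracks joint i of the exoskeleton, so the mapping is the
    identity (no nonlinear retargeting). Joint limits and per-joint calibration
    offsets are deliberately ignored here.
    """
    return [float(a) for a in exo_angles_deg]


def mean_absolute_angular_error(measured_deg: Sequence[float],
                                reference_deg: Sequence[float]) -> float:
    """Average |measured - reference| over all joints, in degrees."""
    if len(measured_deg) != len(reference_deg) or len(measured_deg) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    total = sum(abs(m - r) for m, r in zip(measured_deg, reference_deg))
    return total / len(measured_deg)


if __name__ == "__main__":
    # Made-up readings for three joints (degrees), for illustration only.
    exo_reading = [10.0, 45.5, 30.2]
    ground_truth = [10.4, 45.0, 29.8]

    robot_targets = isomorphic_command(exo_reading)
    mae = mean_absolute_angular_error(exo_reading, ground_truth)
    print(robot_targets)            # [10.0, 45.5, 30.2]
    print(f"MAE = {mae:.3f} deg")   # MAE = 0.433 deg
```

A real system would likely add per-joint calibration offsets and clamp targets to the robot hand's joint limits. The source does not specify these details, so they are omitted here.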
