FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation
Longyan Wu, Checheng Yu, Jieji Ren, Li Chen, Yufei Jiang, Ran Huang, Guoying Gu, Hongyang Li
TL;DR
FreeTacMan introduces a robot-free, human-centric visuo-tactile data collection system that captures real-time tactile signals through dual visuo-tactile sensors on a finger-worn gripper and tracks end-effector poses precisely, enabling large-scale, high-fidelity manipulation datasets. It combines a modular hardware design with a CLIP-style tactile pretraining regime and an ACT-based policy learner to fuse visual and tactile information for 7-DoF control. The resulting dataset contains over 3000k visuo-tactile image pairs and 10k trajectories across 50 tasks, and experiments show that adding tactile data improves imitation learning performance, including robust generalization to unseen objects and across sensor setups. The system also improves data collection efficiency and policy success rates over prior approaches, supporting rapid development of visuo-tactile manipulation policies and extensible real-world tactile learning research; the dataset and hardware will be released to facilitate reproducibility.
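The TL;DR mentions CLIP-style tactile pretraining without giving details. As a rough illustration, assuming it follows the standard CLIP recipe (a symmetric InfoNCE loss that pulls each paired visual and tactile embedding together and pushes mismatched pairs in the batch apart), the objective can be sketched as follows. The function names, the `temperature` default of 0.07 (borrowed from CLIP), and the use of plain Python lists instead of tensors are assumptions made for readability; this is not the authors' implementation.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def clip_style_loss(visual: Sequence[Sequence[float]],
                    tactile: Sequence[Sequence[float]],
                    temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss; visual[i] and tactile[i] form the i-th positive pair.

    Hypothetical sketch of a CLIP-style objective, not FreeTacMan's code.
    """
    if len(visual) != len(tactile) or not visual:
        raise ValueError("need equally sized, non-empty batches")
    v = [l2_normalize(x) for x in visual]
    t = [l2_normalize(x) for x in tactile]
    n = len(v)
    # logits[i][j]: cosine similarity of visual i and tactile j, scaled by temperature.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows: List[List[float]]) -> float:
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for a numerically stable log-sum-exp
            log_sum = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_sum - row[i]
        return total / len(rows)

    # Transposing gives the tactile-to-visual direction.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))
```

Matched pairs yield a loss near zero, while swapping the tactile embeddings makes every positive pair dissimilar and drives the loss up, which is the signal that aligns the two encoders.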
Abstract
Equipping robots with contact-rich manipulation skills remains a pivotal challenge in robot learning, one substantially hindered by a data collection gap: existing pipelines are inefficient and support only limited sensor setups. While prior work has explored handheld paradigms, their rod-based mechanical structures are rigid and unintuitive, provide limited tactile feedback, and pose challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable gripper with dual visuo-tactile sensors that is worn on the human fingers for intuitive control. A high-precision optical tracking system captures end-effector poses while visual and tactile feedback are recorded in sync. We leverage FreeTacMan to collect a large-scale multimodal dataset comprising over 3000k paired visual-tactile images with end-effector poses and 10k demonstration trajectories across 50 diverse contact-rich manipulation tasks. FreeTacMan improves data collection performance over prior works in multiple respects and enables effective policy learning for contact-rich manipulation tasks using the self-collected dataset. The full suite of hardware specifications and the dataset will be released to facilitate reproducibility and support research in visuo-tactile manipulation.
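The abstract does not specify how tracked poses are synchronized with the visual and tactile streams. One common approach, assumed here purely for illustration, is to treat each camera frame as the reference, match it to the nearest tactile sample and nearest pose sample by timestamp, and discard frames where either match is farther away than a tolerance. The names `nearest_index`, `align_streams`, and `max_skew` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def nearest_index(timestamps: Sequence[float], t: float) -> int:
    """Index of the timestamp closest to t (ties go to the earlier sample).

    Assumes timestamps is non-empty and sorted in ascending order.
    """
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    before, after = timestamps[i - 1], timestamps[i]
    return i if after - t < t - before else i - 1


def align_streams(camera_ts: Sequence[float],
                  tactile_ts: Sequence[float],
                  pose_ts: Sequence[float],
                  max_skew: float) -> List[Tuple[int, int, int]]:
    """Return (frame, tactile, pose) index triples, one per usable camera frame.

    A frame is dropped if its nearest tactile or pose sample is more than
    max_skew seconds away. Hypothetical sketch, not FreeTacMan's pipeline.
    """
    aligned = []
    for frame_idx, t in enumerate(camera_ts):
        ti = nearest_index(tactile_ts, t)
        pi = nearest_index(pose_ts, t)
        if abs(tactile_ts[ti] - t) <= max_skew and abs(pose_ts[pi] - t) <= max_skew:
            aligned.append((frame_idx, ti, pi))
    return aligned
```

Using the camera as the reference clock keeps exactly one training sample per retained image. A real system would also need to correct for clock offsets between devices, which this sketch ignores.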
