FreeTacMan: Robot-free Visuo-Tactile Data Collection System for Contact-rich Manipulation

Longyan Wu, Checheng Yu, Jieji Ren, Li Chen, Yufei Jiang, Ran Huang, Guoying Gu, Hongyang Li

TL;DR

FreeTacMan is a robot-free, human-centric visuo-tactile data collection system. A wearable gripper with dual visuo-tactile sensors, worn on the operator's fingers, provides real-time tactile feedback, while a high-precision optical tracking system records end-effector poses, enabling large-scale, high-fidelity manipulation datasets. The system combines a modular hardware design with a CLIP-style tactile pretraining regime and an ACT-based policy learner that fuses visual and tactile information for 7-DoF control. The resulting dataset contains over 3000k visuo-tactile image pairs and 10k trajectories across 50 tasks. The authors report that tactile data substantially improves imitation learning performance, including generalization to unseen objects and cross-sensor setups, and that the system improves both data collection efficiency and policy success rates. The dataset and hardware will be released to facilitate reproducibility.

Abstract

Enabling robots to perform contact-rich manipulation remains a pivotal challenge in robot learning, one substantially hindered by the data collection gap: collection is inefficient and sensor setups are limited. While prior work has explored handheld paradigms, their rod-based mechanical structures remain rigid and unintuitive, providing limited tactile feedback and posing challenges for human operators. Motivated by the dexterity and force feedback of human motion, we propose FreeTacMan, a human-centric and robot-free data collection system for accurate and efficient robot manipulation. Concretely, we design a wearable gripper with dual visuo-tactile sensors for data collection, which can be worn on human fingers for intuitive control. A high-precision optical tracking system captures end-effector poses while synchronizing visual and tactile feedback. We leverage FreeTacMan to collect a large-scale multimodal dataset comprising over 3000k paired visual-tactile images with end-effector poses, drawn from 10k demonstration trajectories across 50 diverse contact-rich manipulation tasks. FreeTacMan achieves multiple improvements in data collection performance compared to prior works, and enables effective policy learning for contact-rich manipulation tasks with the self-collected dataset. The full suite of hardware specifications and the dataset will be released to facilitate reproducibility and support research in visuo-tactile manipulation.

Paper Structure

This paper contains 20 sections, 1 equation, 13 figures, 7 tables.

Figures (13)

  • Figure 1: Overview of FreeTacMan. FreeTacMan is a robot-free, human-centric visuo-tactile data collection system that enables the efficient transfer of human visual, tactile, and motor skills to robots. It facilitates the collection of large-scale, contact-rich manipulation datasets. Demos can be found at https://opendrivelab.com/FreeTacMan.
  • Figure 2: Hardware system. Left: The in-situ gripper in the collection and execution interface respectively, with identical visual and tactile observations. Right: (a) Composition of the sensor. (b) Exploded view of FreeTacMan. (c) The modular design allows for an agile switch between the collection and execution interface. (d) Human-machine interface design.
  • Figure 3: Tactile pretraining and policy learning pipeline. (a) A tactile encoder is pretrained using the self-collected dataset. (b) The pretrained tactile encoder is integrated into an ACT-based policy for downstream tasks such as USB insertion.
  • Figure 4: The FreeTacMan dataset. (a) Representative examples illustrating the diversity in both task complexity and tactile context. (b) The dataset covers 50 tasks and features a large-scale collection of data, including more than 10k trajectories that contain over 3000k visuo-tactile pairs. (c) The dataset enables diverse fundamental tactile capabilities.
  • Figure 5: Human demonstrations and policy rollouts. The top row shows goal trajectories, the middle row demonstrates successful rollouts with tactile feedback, and the bottom row showcases typical failure modes without tactile input.
  • ...and 8 more figures
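
The TL;DR describes the tactile pretraining stage of Figure 3(a) as CLIP-style. The paper's exact objective is not given on this page, so the following is only a minimal sketch of the generic symmetric contrastive (InfoNCE) loss that "CLIP-style" usually refers to, assuming each visual embedding is paired with the tactile embedding at the same batch index. The function names, the temperature of 0.07, and the random toy embeddings are illustrative assumptions, not values from the paper.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)


def clip_style_loss(visual_emb, tactile_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    The temperature is an assumed hyperparameter, not taken from the paper.
    """
    v = [normalize(x) for x in visual_emb]
    t = [normalize(x) for x in tactile_emb]
    # logits[i][j] = scaled cosine similarity of visual i and tactile j
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    logits_t = [list(col) for col in zip(*logits)]  # tactile-to-visual view
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits_t))


# Toy data (hypothetical): 8 pairs of 32-dimensional embeddings.
random.seed(0)
visual = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]
near = [[x + 0.01 * random.gauss(0, 1) for x in row] for row in visual]
unrelated = [[random.gauss(0, 1) for _ in range(32)] for _ in range(8)]

aligned_loss = clip_style_loss(visual, near)
unrelated_loss = clip_style_loss(visual, unrelated)
print(f"aligned: {aligned_loss:.4f}  unrelated: {unrelated_loss:.4f}")
```

The loss is low when each visual embedding is most similar to its own tactile partner and high when the pairing carries no signal, which is the property a pretrained tactile encoder would be optimized for under this assumption.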