OPENTOUCH: Bringing Full-Hand Touch to Real-World Interaction

Yuxin Ray Song; Jinzhou Li; Rao Fu; Devin Murphy; Kaichen Zhou; Rishi Shiv; Yaqi Li; Haoyu Xiong; Crystal Elaine Owens; Yilun Du; Yiyue Luo; Xianyi Cheng; Antonio Torralba; Wojciech Matusik; Paul Pu Liang

TL;DR

OpenTouch introduces the first in-the-wild, egocentric full-hand tactile dataset capturing synchronized vision, contact forces, and hand pose. The authors provide a low-cost, open tactile glove and a comprehensive collection/annotation pipeline across diverse environments, plus benchmarks for cross-sensory retrieval and tactile-based grasp classification. Results show tactile signals are compact yet highly informative for grasp understanding and can improve cross-modal alignment when combined with vision and pose, with temporal context and encoder design significantly impacting performance. This dataset and benchmarks enable scalable research in touch-grounded perception and robotic manipulation, bridging vision and tactile sensing in real-world manipulation scenarios.

Abstract

The human hand is our primary interface to the physical world, yet egocentric perception rarely knows when, where, or how forcefully the hand makes contact. Robust wearable tactile sensors are scarce, and no existing in-the-wild datasets align first-person video with full-hand touch. To bridge the gap between visual perception and physical interaction, we present OpenTouch, the first in-the-wild egocentric full-hand tactile dataset, containing 5.1 hours of synchronized video-touch-pose data and 2,900 curated clips with detailed text annotations. Using OpenTouch, we introduce retrieval and classification benchmarks that probe how touch grounds perception and action. We show that tactile signals provide a compact yet powerful cue for grasp understanding, strengthen cross-modal alignment, and can be reliably retrieved from in-the-wild video queries. By releasing this annotated vision-touch-pose dataset and benchmark, we aim to advance multimodal egocentric perception, embodied learning, and contact-rich robotic manipulation.
