Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing
Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang
TL;DR
Robot Synesthesia is a visuotactile framework for in-hand manipulation that fuses vision and touch by representing tactile data as a 3D point cloud aligned with camera observations. A teacher-student pipeline first trains a teacher policy in simulation on low-dimensional states, then distills it into a visuotactile student policy that processes the unified 3D observations with a PointNet encoder. The resulting policy transfers from simulation to a real robot without real-world data, solves complex in-hand rotation tasks such as double-ball rotation, and generalizes to novel objects. Ablations show clear benefits from integrating tactile and visual information, and the unified 3D tactile-visual representation reduces domain gaps and improves dexterous manipulation in real-world settings.
Abstract
Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/.
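
To make the point cloud-based tactile representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes each tactile sensor reports a binary contact flag and that its 3D position is already expressed in the same frame as the camera point cloud. The function names, the one-hot tag layout, and the row format are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

# Hypothetical one-hot modality tags: (is_camera, is_tactile).
CAMERA_TAG = (1.0, 0.0)
TACTILE_TAG = (0.0, 1.0)


def tactile_points(sensor_positions: Sequence[Point],
                   contacts: Sequence[bool]) -> List[Point]:
    """Return the 3D positions of the sensors that currently report contact.

    Assumption: sensor_positions are already in the camera point cloud's frame.
    """
    if len(sensor_positions) != len(contacts):
        raise ValueError("one contact flag is required per sensor")
    return [tuple(p) for p, c in zip(sensor_positions, contacts) if c]


def fuse_point_clouds(camera_points: Sequence[Point],
                      sensor_positions: Sequence[Point],
                      contacts: Sequence[bool]) -> List[Tuple[float, ...]]:
    """Merge camera and tactile points into one cloud.

    Each row is (x, y, z, is_camera, is_tactile), so a point-based encoder
    can tell the two modalities apart while sharing one 3D space.
    """
    cloud = [tuple(p) + CAMERA_TAG for p in camera_points]
    cloud += [p + TACTILE_TAG
              for p in tactile_points(sensor_positions, contacts)]
    return cloud
```

Under these assumptions, calling `fuse_point_clouds` with two camera points, three sensor positions, and contact flags `[True, False, True]` yields four rows: the two camera points followed by the two sensors in contact. A point-based encoder such as PointNet could then consume the fused rows as a single input.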
