Robot Synesthesia: In-Hand Manipulation with Visuotactile Sensing

Ying Yuan, Haichuan Che, Yuzhe Qin, Binghao Huang, Zhao-Heng Yin, Kang-Won Lee, Yi Wu, Soo-Chul Lim, Xiaolong Wang

TL;DR

This work tackles the challenge of fusing vision and touch for in-hand manipulation by introducing Robot Synesthesia, a visuotactile framework that represents tactile data as a 3D point cloud aligned with camera observations. A teacher-student pipeline trains a policy in simulation using low-dimensional states and distills it into a visuotactile policy that processes unified 3D observations via a PointNet encoder, enabling robust in-hand rotation tasks. The approach achieves successful sim-to-real transfer without real-world data, solves complex tasks such as double-ball rotation, and generalizes to novel objects, with ablations showing clear benefits of integrating tactile and visual information. The work demonstrates that a unified 3D tactile-visual representation reduces domain gaps and enhances dexterous manipulation in real-world settings, offering practical implications for tactile-aware robotic manipulation.
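As a rough illustration of what "representing tactile data as a 3D point cloud aligned with camera observations" could look like, the sketch below keeps the 3D positions of the sensors that currently report contact and expresses them in the camera frame. The function name `tactile_point_cloud`, the transform `T_cam_from_hand`, and the sample sensor positions are hypothetical and chosen for illustration only; the sensor positions are assumed to come from forward kinematics, which is not shown. This is not the authors' implementation.

```python
import numpy as np

def tactile_point_cloud(sensor_xyz_hand, contact, T_cam_from_hand):
    """Return camera-frame 3D points for the sensors currently in contact.

    sensor_xyz_hand: (N, 3) sensor positions in the hand frame (assumed given,
        e.g. from forward kinematics).
    contact: (N,) boolean array, True where a sensor reports contact.
    T_cam_from_hand: (4, 4) homogeneous transform from hand frame to camera frame
        (hypothetical calibration result).
    """
    sensor_xyz_hand = np.asarray(sensor_xyz_hand, dtype=float)
    contact = np.asarray(contact, dtype=bool)
    active = sensor_xyz_hand[contact]                        # (K, 3) touched sensors
    homo = np.hstack([active, np.ones((len(active), 1))])    # (K, 4) homogeneous coords
    return (homo @ np.asarray(T_cam_from_hand).T)[:, :3]     # back to (K, 3)

# Illustrative data: 16 sensors with made-up positions, three of them in contact.
rng = np.random.default_rng(0)
sensors = rng.uniform(-0.05, 0.05, size=(16, 3))
contact = np.zeros(16, dtype=bool)
contact[[2, 7, 11]] = True
T = np.eye(4)
T[:3, 3] = [0.0, 0.0, 0.5]   # made-up offset between hand and camera frames
tactile_pc = tactile_point_cloud(sensors, contact, T)
```

Because the resulting points live in the same frame as the depth-camera points, they can be processed by the same point cloud encoder without a separate tactile branch.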

Abstract

Executing contact-rich manipulation tasks necessitates the fusion of tactile and visual feedback. However, the distinct nature of these modalities poses significant challenges. In this paper, we introduce a system that leverages visual and tactile sensory inputs to enable dexterous in-hand manipulation. Specifically, we propose Robot Synesthesia, a novel point cloud-based tactile representation inspired by human tactile-visual synesthesia. This approach allows for the simultaneous and seamless integration of both sensory inputs, offering richer spatial information and facilitating better reasoning about robot actions. The method, trained in a simulated environment and then deployed to a real robot, is applicable to various in-hand object rotation tasks. Comprehensive ablations are performed on how the integration of vision and touch can improve reinforcement learning and Sim2Real performance. Our project page is available at https://yingyuan0414.github.io/visuotactile/.

Paper Structure (24 sections, 1 equation, 7 figures, 3 tables)


Figures (7)

  • Figure 1: We propose Robot Synesthesia, a novel visuotactile approach to perform in-hand object rotation with visual and tactile modalities. We train our policy in simulation on rotating single or multiple objects around a certain axis and then transfer it to the real robot hand without any real-world data.
  • Figure 2: Real-World Setup. We use an Allegro Hand equipped with 16 force-sensing resistors (FSRs). A Microsoft Azure Kinect camera is placed facing the robot.
  • Figure 3: Training Pipeline. Our teacher policy takes robot proprioception, binary contact, object pose, and object shape embedding as input. After training the teacher policy via RL, we distill it into a visuotactile student policy. In addition to robot proprioception and touch signals, the student policy takes a point cloud from the depth camera, an augmented point cloud based on robot proprioception, and the proposed tactile point cloud. We use one-hot vectors to distinguish the point clouds. Note that we have removed noise from the point clouds here for clarity.
  • Figure 4: Point Cloud Visualization in Sim and Real. The Sim-to-Real gap is notably larger for RGB images than for point clouds, leading us to select point clouds as the visual observation for our policy.
  • Figure 5: Object Sets in Sim and Real. We use artificial objects for training and daily objects for testing.
  • ...and 2 more figures
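
The Figure 3 caption says the student policy receives three point clouds (depth camera, augmented from proprioception, and tactile) and uses one-hot vectors to tell them apart. A minimal sketch of that labeling step, assuming each cloud is an (N, 3) array in a shared frame, might look as follows. The function name `fuse_point_clouds`, the ordering of the three sources, and the placeholder arrays are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fuse_point_clouds(camera_pc, robot_pc, tactile_pc):
    """Stack three (N_i, 3) clouds into one (sum N_i, 6) array.

    Columns 0-2 hold xyz; columns 3-5 hold a one-hot source label in the
    (assumed) order camera, augmented robot, tactile.
    """
    clouds = [camera_pc, robot_pc, tactile_pc]
    labeled = []
    for idx, pc in enumerate(clouds):
        pc = np.asarray(pc, dtype=float)
        one_hot = np.zeros((len(pc), len(clouds)))
        one_hot[:, idx] = 1.0                    # mark which source each point came from
        labeled.append(np.hstack([pc, one_hot]))
    return np.vstack(labeled)

# Placeholder clouds with different sizes, just to show the resulting layout.
fused = fuse_point_clouds(
    np.zeros((5, 3)),        # stand-in for depth-camera points
    np.ones((4, 3)),         # stand-in for augmented robot points
    np.full((2, 3), 2.0),    # stand-in for tactile points
)
```

Each row of `fused` can then be fed to a point cloud encoder such as PointNet, which treats the label columns as extra per-point features.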