Residual Rotation Correction using Tactile Equivariance

Yizhe Zhu, Zhang Ye, Boce Hu, Haibo Zhao, Yu Qi, Dian Wang, Robert Platt

TL;DR

This work tackles the data-efficiency challenge in visuotactile robotic manipulation by exploiting the planar rotational symmetry of in-hand object orientation. It introduces EquiTac, which reconstructs surface normal maps from tactile images and employs an $\mathrm{SO}(2)$-equivariant network to predict a yaw target, providing a real-time angular residual that corrects a base visuomotor policy without extra demonstrations. The approach combines a flow-matching base policy with a lightweight, symmetry-aware tactile residual, achieving strong zero-shot generalization to unseen orientations and superior robustness under perturbations in real-world robot experiments. By delivering a lightweight, symmetry-aware module that explicitly encodes tactile equivariance, EquiTac improves reliability in contact-rich manipulation while substantially reducing data and online training requirements.

Abstract

Visuotactile policy learning augments vision-only policies with tactile input, facilitating contact-rich manipulation. However, the high cost of tactile data collection makes sample efficiency the key requirement for developing visuotactile policies. We present EquiTac, a framework that exploits the inherent SO(2) symmetry of in-hand object rotation to improve sample efficiency and generalization for visuotactile policy learning. EquiTac first reconstructs surface normals from raw RGB inputs of vision-based tactile sensors, so that rotations of the normal vector field correspond to in-hand object rotations. An SO(2)-equivariant network then predicts a residual rotation action that augments a base visuomotor policy at test time, enabling real-time rotation correction without additional reorientation demonstrations. On a real robot, EquiTac achieves robust zero-shot generalization to unseen in-hand orientations with very few training samples, where baselines fail even with more training data. To our knowledge, this is the first tactile learning method to explicitly encode tactile equivariance for policy learning, yielding a lightweight, symmetry-aware module that improves reliability in contact-rich tasks.
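
The statement that the normal field rotates together with the object can be checked numerically. The sketch below is an illustration only, not the paper's Normal Reconstructor: it assumes a synthetic height map stands in for the contact geometry, derives normals from finite differences, and uses a 90-degree rotation so that no interpolation is needed. The helper names `normals_from_height` and `rotate_normal_map_90` are hypothetical. The point it shows is that rotating a normal map requires both moving the pixels and rotating the in-plane components of each normal vector.

```python
import numpy as np

def normals_from_height(height):
    """Unit surface normals (H, W, 3) from a height map.

    Assumed convention: x runs along columns, y runs along rows.
    """
    gy, gx = np.gradient(height)  # derivatives along rows (y) and columns (x)
    n = np.stack([-gx, -gy, np.ones_like(height)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def rotate_normal_map_90(normals):
    """Rotate a normal map by one np.rot90 step, treating it as a vector field."""
    # Step 1: move every pixel to its rotated location.
    moved = np.rot90(normals, k=1, axes=(0, 1))
    nx, ny, nz = moved[..., 0], moved[..., 1], moved[..., 2]
    # Step 2: rotate the in-plane components by the same angle.
    # With x along columns and y along rows, one np.rot90 step maps (nx, ny) to (ny, -nx).
    return np.stack([ny, -nx, nz], axis=-1)

rng = np.random.default_rng(0)
height = rng.normal(size=(16, 16))  # synthetic stand-in for a tactile imprint

# Normals of the rotated surface.
rotated_then_normals = normals_from_height(np.rot90(height))
# Normals of the original surface, rotated as a vector field.
normals_then_rotated = rotate_normal_map_90(normals_from_height(height))
# Normals of the original surface with only the pixels moved (components untouched).
pixels_only_rotated = np.rot90(normals_from_height(height), k=1, axes=(0, 1))

equivariant = np.allclose(rotated_then_normals, normals_then_rotated)
pixels_only = np.allclose(rotated_then_normals, pixels_only_rotated)
print("vector-field rotation matches:", equivariant)
print("pixel-only rotation matches:", pixels_only)
```

Arbitrary angles would additionally need interpolation of the pixel grid, which this sketch deliberately avoids.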

Paper Structure

This paper contains 20 sections, 5 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Equivariance in EquiTac. When the tactile observation is rotated, the predicted action rotates consistently.
  • Figure 2: Overview of our tactile-guided manipulation framework with equivariant orientation correction. The system begins with a Flow Matching Policy (top) that predicts basic action chunks from multimodal inputs, including robot proprioception, tactile images, and three camera views. During action execution (middle), tactile images are processed through a Normal Reconstructor to obtain normal maps, which are fed into an $\mathrm{SO}(2)$-equivariant network. The equivariant network predicts the angular residual between the object's current and target orientations, which is used to correct the action chunk for misalignment in real time. The bottom row shows (i) the data-collection setup with ideal object placement and (ii) the results of executing the base and corrected trajectories under placement deviations at rollout.
  • Figure 3: Sensor on the gripper fingertip; the $z$-axis denotes the finger normal.
  • Figure 4: Equivariance of the normal map. When the object rotates in hand, the normal map co-rotates as a vector field.
  • Figure 5: Qualitative comparison of angular estimation across different model configurations.
  • ...and 4 more figures
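
The correction step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the action layout (one row per step, columns `x, y, z, yaw`), the sign convention (residual = target yaw minus current yaw, added to the commanded yaw), and the function names `wrap_angle`, `angular_residual`, and `correct_action_chunk` are all hypothetical. The source only states that the network predicts an angular residual that is used to correct the base policy's action chunk.

```python
import numpy as np

def wrap_angle(theta):
    """Wrap angles in radians to the interval [-pi, pi)."""
    return (np.asarray(theta, dtype=float) + np.pi) % (2.0 * np.pi) - np.pi

def angular_residual(current_yaw, target_yaw):
    """Signed shortest rotation from the current yaw to the target yaw (assumed convention)."""
    return wrap_angle(target_yaw - current_yaw)

def correct_action_chunk(chunk, residual, yaw_index=3):
    """Return a copy of the chunk with the residual added to every yaw command.

    Assumes `chunk` has shape (T, D) and that column `yaw_index` holds the yaw command.
    """
    corrected = np.array(chunk, dtype=float, copy=True)
    corrected[:, yaw_index] = wrap_angle(corrected[:, yaw_index] + residual)
    return corrected

# Hypothetical base action chunk: 4 steps of (x, y, z, yaw).
base_chunk = np.zeros((4, 4))
base_chunk[:, 3] = 0.1

# Hypothetical yaw estimates; the wrap keeps the residual on the short way around.
residual = angular_residual(np.deg2rad(170.0), np.deg2rad(-170.0))
corrected_chunk = correct_action_chunk(base_chunk, residual)
print(np.rad2deg(residual))
print(corrected_chunk[:, 3])
```

Wrapping the difference matters near the ±180 degree boundary, where a naive subtraction would command almost a full turn instead of a small correction.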