Touch2Insert: Zero-Shot Peg Insertion by Touching Intersections of Peg and Hole

Masaru Yajima, Yuma Shin, Rei Kawakami, Asako Kanezaki, Kei Ota

TL;DR

Touch2Insert, a tactile-based framework for arbitrary peg insertion, reconstructs cross-sectional geometry from high-resolution tactile images and estimates the relative pose of the hole with respect to the peg in a zero-shot manner, confirming the robustness and generalizability of tactile sensing for real-world robotic connector insertion.

Abstract

Reliable insertion of industrial connectors remains a central challenge in robotics, requiring sub-millimeter precision under uncertainty and often without full visual access. Vision-based approaches struggle with occlusion and limited generalization, while learning-based policies frequently fail to transfer to unseen geometries. To address these limitations, we leverage tactile sensing, which captures local surface geometry at the point of contact and thus provides reliable information even under occlusion and across novel connector shapes. Building on this capability, we present Touch2Insert, a tactile-based framework for arbitrary peg insertion. Our method reconstructs cross-sectional geometry from high-resolution tactile images and estimates the relative pose of the hole with respect to the peg in a zero-shot manner. By aligning reconstructed shapes through registration, the framework enables insertion from a single contact without task-specific training. To evaluate its performance, we conducted experiments with three diverse connectors in both simulation and real-robot settings. The results indicate that Touch2Insert achieved sub-millimeter pose estimation accuracy for all connectors in simulation, and attained an average success rate of 86.7% on the real robot, thereby confirming the robustness and generalizability of tactile sensing for real-world robotic connector insertion.

Paper Structure (13 sections, 10 equations, 7 figures, 3 tables, 1 algorithm)

This paper contains 13 sections, 10 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: This paper addresses the problem of inserting an arbitrary peg into an unknown hole without any prior knowledge of their shapes or types. We propose a novel framework in which the robot makes contact with the cross-sections of the peg and hole, and estimates the hole's relative pose with respect to the peg in $\mathrm{SE}(2)$ in a zero-shot manner from 3D point clouds obtained from tactile images.
  • Figure 2: Definition of coordinates. The coordinate frames are defined as follows: $F_{\mathrm{ee}}$ denotes the end-effector frame, $F_{\mathrm{p}}$ the frame of the grasped peg, and $F_{\mathrm{h}}$ the hole frame. The objective is to estimate the relative pose of the hole with respect to the peg, $^{\mathrm{p}}_{\mathrm{h}}\hat{T}$, which inevitably contains estimation noise. This transformation is then incorporated into the coordinate conversion described in Eq. \ref{transformation}, yielding the estimated end-effector pose in the world frame. Finally, this pose is issued to the robot as a command for execution.
  • Figure 3: Touch2Insert: Overview of the proposed peg insertion framework. Tactile images are first converted into gradient maps and integrated to reconstruct 3D cross-sectional shapes of the peg and hole. The resulting point clouds are then refined by inverting the hole geometry, applying height-based filtering, projecting onto a 2D plane, and removing background artifacts. The cleaned planar point clouds are aligned using ICP with multiple initializations to estimate the relative $\mathrm{SE}(2)$ pose between peg and hole. Finally, the robot performs insertion under stiffness control, compensating for small residual errors without requiring exploratory search.
  • Figure 4: Connectors and example tactile images of the corresponding pegs and holes used in our experiments.
  • Figure 5: Successful and failed examples of the OmniGlue baseline, which uses edge detection and feature matching, for the Lightning connector. In each image, the left side shows the edge image of the peg and the right side shows that of the hole. In the successful case (a), matches occur at roughly corresponding positions on the peg and the hole. In the failed case (b), some matches occur at non-corresponding positions, leading to an inaccurate estimate.
  • ...and 2 more figures
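
The registration step named in the Figure 3 caption (ICP with multiple initializations, yielding a relative $\mathrm{SE}(2)$ pose) can be illustrated with a minimal NumPy-only sketch. This is not the authors' implementation: the function names, the brute-force nearest-neighbor search, the evenly spaced rotation initializations, and the mean nearest-neighbor distance used to rank candidates are all assumptions made for illustration. The inputs are assumed to be already-cleaned planar point clouds of shape (N, 2).

```python
import numpy as np


def best_fit_se2(src, dst):
    """Least-squares rigid transform (R, t) with dst_i ~= R @ src_i + t (Kabsch)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)          # 2x2 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (det = +1), never a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


def _nn_sq_dists(moved, dst):
    """Squared distances from every moved point to every dst point (brute force)."""
    return ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)


def icp_se2(src, dst, theta0, iters=50):
    """Point-to-point ICP in the plane, started from rotation angle theta0."""
    c, s = np.cos(theta0), np.sin(theta0)
    R = np.array([[c, -s], [s, c]])
    t = dst.mean(axis=0) - R @ src.mean(axis=0)  # start with centroids aligned
    for _ in range(iters):
        moved = src @ R.T + t
        nearest = _nn_sq_dists(moved, dst).argmin(axis=1)
        R, t = best_fit_se2(src, dst[nearest])
    moved = src @ R.T + t
    err = np.sqrt(_nn_sq_dists(moved, dst).min(axis=1)).mean()
    return R, t, err


def icp_multi_init(src, dst, n_init=12):
    """Run ICP from n_init evenly spaced rotations; keep the lowest-error result.

    Returns (tx, ty, theta, err) for the transform that maps src onto dst.
    """
    best = None
    for theta0 in np.linspace(0.0, 2.0 * np.pi, n_init, endpoint=False):
        R, t, err = icp_se2(src, dst, theta0)
        if best is None or err < best[2]:
            best = (R, t, err)
    R, t, err = best
    theta = np.arctan2(R[1, 0], R[0, 0])
    return t[0], t[1], theta, err
```

Calling `icp_multi_init(peg_xy, hole_xy)` with two hypothetical (N, 2) arrays returns the planar translation, rotation angle, and residual of the best alignment. Restarting from several rotations is a common way to reduce the risk of ICP settling in a poor local minimum, which matters for connector cross-sections that look similar under large rotations.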