CONTACT: CONtact-aware TACTile Learning for Robotic Disassembly

Yosuke Saka, Jyun-Chi Hu, Adeesh Desai, Zhiyuan Zhang, Bihao Zhang, Quan Khanh Luu, Md Rakibul Islam Prince, Minghui Zheng, Yu She

TL;DR

The results show that tactile sensing plays a critical, task-dependent role in robotic disassembly, with structured force-field representations being particularly effective in contact-dominated scenarios.

Abstract

Robotic disassembly involves contact-rich interactions in which successful manipulation depends not only on geometric alignment but also on force-dependent state transitions. While vision-based policies perform well in structured settings, their reliability often degrades in tight-tolerance, contact-dominated, or deformable scenarios. In this work, we systematically investigate the role of tactile sensing in robotic disassembly through both simulation and real-world experiments. We construct five rigid-body disassembly tasks in simulation with increasing geometric constraints and extraction difficulty. We further design five real-world tasks, including three rigid and two deformable scenarios, to evaluate contact-dependent manipulation. Within a unified learning framework, we compare three sensing configurations: Vision Only, Vision + tactile RGB (TacRGB), and Vision + tactile force field (TacFF). Across both simulation and real-world experiments, TacFF-based policies consistently achieve the highest success rates, with particularly notable gains in contact-dependent and deformable settings. Notably, naive fusion of TacRGB and TacFF underperforms either modality alone, indicating that simple concatenation can dilute task-relevant force information. Our results show that tactile sensing plays a critical, task-dependent role in robotic disassembly, with structured force-field representations being particularly effective in contact-dominated scenarios.

Paper Structure (21 sections, 2 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Structured task design and multimodal evaluation for robotic disassembly. Representative tasks span increasing contact complexity. We compare three sensing configurations: Vision Only, Vision + tactile RGB images (TacRGB), and Vision + tactile force-field representations (TacFF). Across both simulation and real-world experiments, TacFF achieves the highest overall success rates, highlighting the importance of structured force encoding in contact-rich and deformable disassembly.
  • Figure 2: Object geometries and interaction primitives of simulation and real-world disassembly tasks. Each task consists of an initial (Original) and goal (Disassembled) configuration. S1/R1 and S2/R2 share identical shapes. S3/R3 are similar but exhibit minor geometric differences (S3 is larger than R3). S4–S5 (simulation) and R4–R5 (real-world) represent distinct tasks without direct correspondence. The tasks involve diverse interaction modes including pulling, sliding, and pinching disengagement.
  • Figure 3: Comparison of multimodal observations over time in simulation (left) and real-world (right) disassembly. Columns denote task stages, and rows correspond to the front view, wrist view, TacFF, and TacRGB. TacFF visualizes distributed shear (arrow length) and normal force (color, green to red), highlighting contact-state evolution during manipulation. All modalities are recorded at 10 Hz.
  • Figure 4: Visual ambiguity versus tactile disambiguation during grasp in Task R5. Despite similar RGB appearance due to low contrast and self-occlusion, TacRGB and TacFF clearly reveal grasp quality: corner or tilted contacts produce localized indentation in TacRGB and concentrated shear patterns in TacFF.
  • Figure 5: Overview of the visuotactile policy learning pipeline. The policy conditions on a two-step observation history consisting of front- and wrist-view RGB images, the end-effector pose, and optional tactile inputs (TacRGB or TacFF; dashed). Encoded features are concatenated at node C (feature concatenation) and fed into a 1D U-Net–based diffusion model, which iteratively denoises action sequences to produce clean action chunks for execution.
  • ...and 1 more figure
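
The feature concatenation described in the Figure 5 caption can be illustrated with a rough sketch. This is not the authors' implementation: the encoder stub (a fixed random projection standing in for a learned network), the feature width of 32, the 7-D end-effector pose, the array shapes, and the names `encode_stub` and `build_condition` are all assumptions made for illustration. Only the two-step history, the front and wrist views, the pose input, and the optional tactile branch come from the caption.

```python
import numpy as np

OBS_HISTORY = 2  # two-step observation history, as stated in the Figure 5 caption


def encode_stub(x, dim, seed):
    """Hypothetical stand-in for a learned encoder: a fixed random linear projection."""
    rng = np.random.default_rng(seed)
    flat = np.asarray(x, dtype=np.float32).reshape(-1)
    weights = rng.standard_normal((dim, flat.size)).astype(np.float32)
    return (weights / np.sqrt(flat.size)) @ flat


def build_condition(front, wrist, pose, tactile=None, feat_dim=32):
    """Concatenate per-step features into one conditioning vector.

    front, wrist: image arrays of shape (OBS_HISTORY, C, H, W) (assumed layout)
    pose:         array of shape (OBS_HISTORY, 7) (assumed position + quaternion)
    tactile:      optional array of shape (OBS_HISTORY, ...), e.g. TacRGB or TacFF
    """
    if len(front) != OBS_HISTORY or len(wrist) != OBS_HISTORY or len(pose) != OBS_HISTORY:
        raise ValueError("expected a two-step observation history")

    parts = []
    for t in range(OBS_HISTORY):
        parts.append(encode_stub(front[t], feat_dim, seed=0))
        parts.append(encode_stub(wrist[t], feat_dim, seed=1))
        parts.append(np.asarray(pose[t], dtype=np.float32))
        if tactile is not None:  # the dashed, optional branch in the figure
            parts.append(encode_stub(tactile[t], feat_dim, seed=2))
    # This flat vector is what a diffusion denoiser would be conditioned on.
    return np.concatenate(parts)


if __name__ == "__main__":
    front = np.zeros((OBS_HISTORY, 3, 16, 16))
    wrist = np.zeros((OBS_HISTORY, 3, 16, 16))
    pose = np.zeros((OBS_HISTORY, 7))
    tacff = np.zeros((OBS_HISTORY, 3, 8, 8))  # assumed: shear x, shear y, normal force
    print(build_condition(front, wrist, pose).shape)
    print(build_condition(front, wrist, pose, tactile=tacff).shape)
```

Under these assumptions, switching between Vision Only and a tactile configuration changes only the length of the conditioning vector, so the same downstream denoiser design can be compared across sensing configurations.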