
Master Micro Residual Correction with Adaptive Tactile Fusion and Force-Mixed Control for Contact-Rich Manipulation

Xingting Li, Yifan Xie, Han Liu, Wei Hou, Guangyu Chen, Shoujie Li, Wenbo Ding

Abstract

Contact-rich, fine-grained robotic manipulation remains challenging because of complex interaction dynamics and the competing requirements of multi-timescale control. Current visual imitation learning methods excel at long-horizon planning, but they often fail to perceive critical interaction cues such as friction variations or incipient slip, and they struggle to balance global task coherence with local reactive feedback. To address these challenges, we propose M2-ResiPolicy, a Master-Micro residual control architecture that couples high-level action guidance with low-level correction. The framework consists of a Master-Guidance Policy (MGP) operating at 10 Hz, which generates temporally consistent action chunks with a diffusion-based backbone and uses a tactile-intensity-driven adaptive fusion mechanism to dynamically reweight visual and tactile inputs. In parallel, a 60 Hz Micro-Residual Corrector (MRC) uses a lightweight GRU to provide real-time action compensation based on TCP wrench feedback. The policy is further integrated with a force-mixed position-based impedance control (PBIC) execution layer that regulates contact forces to ensure interaction safety. Experiments on several demanding tasks, including fragile object grasping and precision insertion, demonstrate that M2-ResiPolicy significantly outperforms the standard Diffusion Policy (DP) and the state-of-the-art Reactive Diffusion Policy (RDP), achieving a 93% damage-free success rate in chip grasping and more stable force regulation.
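The tactile-intensity-driven fusion described in the abstract can be pictured with a small sketch. Everything below is an illustrative assumption rather than the authors' method: the intensity measure (mean absolute marker displacement), the sigmoid gate with its `threshold` and `sharpness` parameters, and the function names `tactile_intensity`, `fusion_weight`, and `fuse` are hypothetical, and the scalar convex blend stands in for the confidence-gated cross-attention named in Figure 2 without reproducing it.

```python
import math
from typing import List, Sequence


def tactile_intensity(marker_displacement: Sequence[float]) -> float:
    """Hypothetical intensity measure: mean absolute marker displacement in one visuotactile frame."""
    return sum(abs(d) for d in marker_displacement) / len(marker_displacement)


def fusion_weight(intensity: float, threshold: float = 0.5, sharpness: float = 10.0) -> float:
    """Tactile weight in (0, 1): a sigmoid that rises as contact intensity passes `threshold`."""
    return 1.0 / (1.0 + math.exp(-sharpness * (intensity - threshold)))


def fuse(vision_feat: Sequence[float], tactile_feat: Sequence[float], intensity: float) -> List[float]:
    """Convex blend of equal-length feature vectors; touch dominates under strong contact."""
    if len(vision_feat) != len(tactile_feat):
        raise ValueError("vision and tactile features must have the same length")
    w = fusion_weight(intensity)
    return [(1.0 - w) * v + w * t for v, t in zip(vision_feat, tactile_feat)]


# Usage: the same features fused before and during contact
vision, touch = [1.0, 0.0], [0.0, 1.0]
free_space = fuse(vision, touch, tactile_intensity([0.0, 0.0, 0.0]))
in_contact = fuse(vision, touch, tactile_intensity([0.8, -1.2, 1.0]))
```

With the default gate, a frame with no contact (intensity 0) gives a tactile weight below 0.01, while an intensity of 1.0 gives a weight above 0.99, so the fused feature shifts from vision-dominated to touch-dominated as contact develops. A smooth gate like this avoids abrupt switching between modalities at the moment of touch.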

Paper Structure

This paper contains 24 sections, 3 equations, 7 figures, 2 tables, 1 algorithm.

Figures (7)

  • Figure 1: M2-ResiPolicy Framework. Our dual-stream architecture decouples global planning (MGP) from high-frequency local correction (MRC), unified by a force-mixed PBIC execution layer.
  • Figure 2: Overview of the proposed M2-ResiPolicy framework. Left: The 10 Hz Master-Guidance Policy encodes multimodal observations and generates temporally coherent action chunks via confidence-gated cross-attention fusion, while the 60 Hz Micro-Residual Corrector predicts residual compensation from aligned TCP wrench feedback. Right: The reference pose $x_{\mathrm{ref}}$ is executed through the force-mixed PBIC layer at 125 Hz to produce a compliant pose command $x_{\mathrm{cmd}}$ for stable contact interaction.
  • Figure 3: System setup and data streams. A teleoperated UR7e platform collects visuotactile, RGB, pose, and TCP wrench data. Visuotactile signals, RGB, and end-effector pose are sampled at 10 Hz, while TCP wrench is sampled at 60 Hz.
  • Figure 4: Evaluation tasks. Key stages of four benchmarks. From top to bottom: Chip Transfer, Plug Insertion, Whiteboard Wiping, and Block Assembly. Each row illustrates a typical execution sequence, with fingertip visuotactile observations visualized in the corner to reflect local deformations and interaction-state changes during contact.
  • Figure 5: Baseline Architectures. We compare our hierarchical MGP+MRC framework against the open-loop Diffusion Policy (DP) and the end-to-end Reactive Diffusion Policy (RDP).
  • ...and 2 more figures
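The multi-rate structure shown in Figure 2 (MGP at 10 Hz, MRC at 60 Hz, PBIC at 125 Hz, turning $x_{\mathrm{ref}}$ into $x_{\mathrm{cmd}}$) can be sketched as a single-axis simulation on a shared 1500 Hz clock, the least common multiple of the three rates. This is a hedged toy model, not the authors' implementation: `master_chunk`, `micro_residual`, and `wall_force` are hypothetical stubs standing in for the diffusion backbone, the GRU, and the real contact; the PBIC layer is written as a standard mass-spring-damper admittance filter with made-up gains; and the force-mixing part of the execution layer is not modeled beyond a desired-force term `f_des`.

```python
from dataclasses import dataclass
from typing import Dict, List

BASE_HZ = 1500  # shared tick rate: least common multiple of 10, 60 and 125 Hz
MGP_HZ, MRC_HZ, PBIC_HZ = 10, 60, 125


@dataclass
class Admittance1D:
    """Single-axis position-based impedance (admittance) filter:
    m * e'' + d * e' + k * e = f_ext - f_des, with x_cmd = x_ref + e.
    Gains are illustrative assumptions, not values from the paper."""
    m: float = 1.0
    d: float = 40.0
    k: float = 400.0
    e: float = 0.0  # compliant offset added to the reference pose
    v: float = 0.0  # rate of change of the offset

    def step(self, x_ref: float, f_ext: float, f_des: float, dt: float) -> float:
        acc = (f_ext - f_des - self.d * self.v - self.k * self.e) / self.m
        self.v += acc * dt  # semi-implicit Euler: update velocity first
        self.e += self.v * dt
        return x_ref + self.e  # compliant pose command x_cmd


def master_chunk(x_now: float, horizon: int = MRC_HZ // MGP_HZ) -> List[float]:
    """Hypothetical MGP stub: one waypoint per MRC tick, moving halfway toward x = 0.05 m."""
    return [x_now + (0.05 - x_now) * (i + 1) / (2 * horizon) for i in range(horizon)]


def micro_residual(f_ext: float, f_des: float, gain: float = 1e-4) -> float:
    """Hypothetical MRC stub: proportional correction from the wrench error (stands in for the GRU)."""
    return gain * (f_ext - f_des)


def wall_force(x: float, x_wall: float = 0.03, k_env: float = 2000.0) -> float:
    """Toy environment: a stiff wall at x_wall that pushes back in -x once penetrated."""
    return -k_env * (x - x_wall) if x > x_wall else 0.0


def run(seconds: float = 1.0, f_des: float = 0.0) -> Dict[str, float]:
    pbic = Admittance1D()
    counts = {"mgp": 0, "mrc": 0, "pbic": 0}
    chunk: List[float] = []
    x_cmd = x_ref = f_ext = 0.0
    mgp_every, mrc_every, pbic_every = BASE_HZ // MGP_HZ, BASE_HZ // MRC_HZ, BASE_HZ // PBIC_HZ
    for tick in range(int(seconds * BASE_HZ)):
        if tick % mgp_every == 0:  # 10 Hz: plan a new action chunk from the current pose
            chunk = master_chunk(x_cmd)
            counts["mgp"] += 1
        if tick % mrc_every == 0:  # 60 Hz: next chunk waypoint plus wrench-based residual
            idx = (tick % mgp_every) // mrc_every
            x_ref = chunk[idx] + micro_residual(f_ext, f_des)
            counts["mrc"] += 1
        if tick % pbic_every == 0:  # 125 Hz: compliant execution of x_ref
            f_ext = wall_force(x_cmd)
            x_cmd = pbic.step(x_ref, f_ext, f_des, 1.0 / PBIC_HZ)
            counts["pbic"] += 1
    return {**counts, "x_cmd": x_cmd, "f_ext": f_ext}
```

Calling `run()` simulates one second and reports 10 MGP, 60 MRC, and 125 PBIC updates. Before first contact the admittance offset stays at zero, so $x_{\mathrm{cmd}}$ equals $x_{\mathrm{ref}}$; once the toy wall is penetrated, the filter yields and the command settles only slightly past the wall instead of following the reference deeper. Holding the latest waypoint and residual between slower updates is one simple way to let the three loops share a clock; a deployed system would more likely run them as separate threads or processes.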