Mind the Gap: Learning Implicit Impedance in Visuomotor Policies via Intent-Execution Mismatch

Cuijie Xu, Shurui Zheng, Zihao Su, Yuanfan Xu, Tinghao Yi, Xudong Zhang, Jian Wang, Yu Wang, Jinchen Yu

TL;DR

This paper tackles the challenge of achieving force-aware manipulation with sensorless, low-cost hardware in teleoperation. By reframing learning from Execution Cloning to Intent Cloning and introducing Dual-State Conditioning based on the Intent-Execution Mismatch, the authors enable implicit impedance and force perception without explicit sensors. They further address inference latency with Latency-Adaptive Inpainting, ensuring continuous, stable control under varying delays. Empirical results across six tasks demonstrate that the proposed approach outperforms traditional execution-cloning, enabling robust contact-rich manipulation and dynamic tracking on hardware with minimal sensing. Collectively, the work advances practical, low-cost teleoperation by integrating impedance-like behavior directly into learned visuomotor policies.

Abstract

Teleoperation inherently relies on the human operator acting as a closed-loop controller to actively compensate for hardware imperfections, including latency, mechanical friction, and lack of explicit force feedback. Standard Behavior Cloning (BC), by mimicking the robot's executed trajectory, fundamentally ignores this compensatory mechanism. In this work, we propose a Dual-State Conditioning framework that shifts the learning objective to "Intent Cloning" (master command). We posit that the Intent-Execution Mismatch, the discrepancy between master command and slave response, is not noise, but a critical signal that physically encodes implicit interaction forces and algorithmically reveals the operator's strategy for overcoming system dynamics. By predicting the master intent, our policy learns to generate a "virtual equilibrium point", effectively realizing implicit impedance control. Furthermore, by explicitly conditioning on the history of this mismatch, the model performs implicit system identification, perceiving tracking errors as external forces to close the control loop. To bridge the temporal gap caused by inference latency, we further formulate the policy as a trajectory inpainter to ensure continuous control. We validate our approach on a sensorless, low-cost bi-manual setup. Empirical results across tasks requiring contact-rich manipulation and dynamic tracking reveal a decisive gap: while standard execution-cloning fails due to the inability to overcome contact stiffness and tracking lag, our mismatch-aware approach achieves robust success. This presents a minimalist behavior cloning framework for low-cost hardware, enabling force perception and dynamic compensation without relying on explicit force sensing. Videos are available on the project page: https://xucj98.github.io/mind-the-gap-page/
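
To make the "virtual equilibrium point" idea concrete, the following minimal Python sketch models one position-controlled joint as an idealized spring pressing on a rigid wall. It is an illustration under stated assumptions, not the authors' implementation: the rigid-wall model, the stiffness and position values, and the function names (executed_position, contact_force, mismatch, dual_state_observation) are all hypothetical.

```python
def executed_position(command, wall):
    """Position the slave actually reaches: it cannot pass the rigid wall."""
    return min(command, wall)


def contact_force(command, wall, stiffness):
    """Steady-state force of an idealized proportional position controller.

    Assumption: force = stiffness * (command - achieved position), no friction.
    """
    return stiffness * (command - executed_position(command, wall))


def mismatch(master, slave):
    """Intent-execution mismatch: master command minus slave response."""
    return master - slave


def dual_state_observation(master_history, slave_history):
    """Concatenate slave states with the mismatch history (hypothetical layout)."""
    gaps = [mismatch(m, s) for m, s in zip(master_history, slave_history)]
    return list(slave_history) + gaps


WALL = 0.10        # wall position in metres (assumed value)
STIFFNESS = 200.0  # controller stiffness in N/m (assumed value)

master_cmd = 0.12                                # operator pushes 2 cm "into" the wall
slave_pos = executed_position(master_cmd, WALL)  # slave stops at the wall

# Execution cloning: the label is the slave position, so replaying it applies no force.
force_execution = contact_force(slave_pos, WALL, STIFFNESS)
# Intent cloning: the label is the master command, so replaying it reproduces the push.
force_intent = contact_force(master_cmd, WALL, STIFFNESS)

print(force_execution, force_intent, mismatch(master_cmd, slave_pos))
```

In this toy model the mismatch is proportional to the contact force, which is the sense in which a policy that observes the mismatch history could perceive force without a sensor. Real hardware adds friction, lag, and pose-dependent stiffness that the sketch ignores.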

Paper Structure (31 sections, 3 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Implicit Impedance Learning on Sensorless Hardware. Our method enables low-cost, position-controlled robots (left) to master force-sensitive and dynamic tasks without explicit force, tactile, or motor current feedback. By conditioning on the "Intent-Execution Mismatch", the policy successfully performs tasks such as: high-precision plug insertion, force-modulated surface wiping, dynamic tossing, and proprioceptive weight sorting of visually identical objects.
  • Figure 2: In the teleoperation stage, the human operator acts as an inverse controller, compensating for hardware limitations (the "black box non-ideal controller") by commanding a Master Intent that deviates from the Slave Execution. Standard behavior cloning (BC, S2S): because the controller is non-ideal, the expected output slave action ($A=X^{s}$) is poorly executed as $\dot{A}=\dot{X}^{s}$. Force Generation via Inverse Dynamics (S2M): by cloning the Master Intent, the policy learns a Virtual Equilibrium Point that penetrates constraints ($A=X^{m}$) and is executed as the expected output $\dot{A}={X}^{s}$ through the non-ideal controller. Force Perception via System ID (SM2M): by explicitly conditioning on the Intent-Execution Mismatch, the policy performs implicit system identification to recover the closed-loop feedback needed to adapt to dynamic uncertainties.
  • Figure 3: Comparison of asynchronous control pipelines. (a) Naïve Asynchronous Streaming, where immediate switching causes kinematic jumps. (b) Temporal Ensembling, which averages overlapping predictions to smooth transitions but introduces reaction lag by aggregating actions based on stale observations. (c) Our Latency-Adaptive Inpainting, which conditions generation on the committed buffer to enforce continuity while maintaining immediate reactivity to the latest observation.
  • Figure 4: Overview of the manipulation tasks.
  • Figure 5: Validation of Monotonic Impedance (Q1). We characterize the relationship between the Intent-Execution Mismatch ($\boldsymbol{\epsilon}$) and the ground-truth interaction force across two distinct workspace configurations: (Down) Contracted Pose and (Up) Extended Pose.
  • ...and 4 more figures
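
The inpainting idea in the Figure 3 caption can be sketched as follows: the actions that will execute while the next chunk is being computed are treated as a fixed prefix, and the generator fills in only the remainder. This is a rough Python sketch under stated assumptions, not the authors' implementation. The names committed_prefix, inpaint_chunk, and make_toy_refiner are hypothetical, actions are scalars for brevity, and the toy refiner merely stands in for one refinement step of a generative policy.

```python
def committed_prefix(prev_chunk, steps_executed, latency_steps):
    """Actions from the previous chunk that will run during inference."""
    return prev_chunk[steps_executed:steps_executed + latency_steps]


def inpaint_chunk(prefix, refine, horizon, iterations=10):
    """Generate `horizon` actions whose first len(prefix) entries stay fixed.

    `refine` is a hypothetical stand-in for one refinement step (list -> list).
    After every step the known prefix is written back, so the new chunk
    starts exactly where the committed actions leave off.
    """
    if not prefix or len(prefix) > horizon:
        raise ValueError("prefix must hold between 1 and `horizon` actions")
    prefix = list(prefix)
    # Initialize the unknown part by holding the last committed action.
    chunk = prefix + [prefix[-1]] * (horizon - len(prefix))
    for _ in range(iterations):
        chunk = refine(chunk)
        chunk[:len(prefix)] = prefix  # re-impose the committed actions
    return chunk


def make_toy_refiner(target, pull=0.3):
    """Toy refiner: smooth each action with its neighbours, then pull it toward `target`."""
    def refine(chunk):
        out = []
        for i, action in enumerate(chunk):
            left = chunk[i - 1] if i > 0 else action
            right = chunk[i + 1] if i < len(chunk) - 1 else action
            smooth = (left + action + right) / 3.0
            out.append(smooth + pull * (target - smooth))
        return out
    return refine


prev_chunk = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
# Two actions already executed; inference is assumed to take three control steps.
prefix = committed_prefix(prev_chunk, steps_executed=2, latency_steps=3)
# The latest observation is assumed to call for moving toward 1.0.
new_chunk = inpaint_chunk(prefix, make_toy_refiner(target=1.0), horizon=8)
print(new_chunk)
```

Because the prefix length is derived from the measured latency, the same routine covers varying delays. Unlike temporal ensembling, nothing from stale predictions is averaged into the free part of the chunk.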