Mind the Gap: Learning Implicit Impedance in Visuomotor Policies via Intent-Execution Mismatch
Cuijie Xu, Shurui Zheng, Zihao Su, Yuanfan Xu, Tinghao Yi, Xudong Zhang, Jian Wang, Yu Wang, Jinchen Yu
TL;DR
This paper tackles the challenge of achieving force-aware manipulation on sensorless, low-cost hardware using policies learned from teleoperated demonstrations. By reframing learning from Execution Cloning to Intent Cloning and introducing Dual-State Conditioning based on the Intent-Execution Mismatch, the authors enable implicit impedance and force perception without explicit sensors. They further address inference latency with Latency-Adaptive Inpainting, which keeps control continuous and stable under varying delays. Empirical results across six tasks show that the proposed approach outperforms traditional execution cloning, enabling robust contact-rich manipulation and dynamic tracking on hardware with minimal sensing. Collectively, the work makes low-cost manipulation hardware more practical by integrating impedance-like behavior directly into learned visuomotor policies.
Abstract
Teleoperation inherently relies on the human operator acting as a closed-loop controller to actively compensate for hardware imperfections, including latency, mechanical friction, and the lack of explicit force feedback. Standard Behavior Cloning (BC), by mimicking the robot's executed trajectory, fundamentally ignores this compensatory mechanism. In this work, we propose a Dual-State Conditioning framework that shifts the learning objective to "Intent Cloning": predicting the master command rather than the executed trajectory. We posit that the Intent-Execution Mismatch, the discrepancy between master command and slave response, is not noise but a critical signal that physically encodes implicit interaction forces and algorithmically reveals the operator's strategy for overcoming system dynamics. By predicting the master intent, our policy learns to generate a "virtual equilibrium point", effectively realizing implicit impedance control. Furthermore, by explicitly conditioning on the history of this mismatch, the model performs implicit system identification, perceiving tracking errors as external forces to close the control loop. To bridge the temporal gap caused by inference latency, we further formulate the policy as a trajectory inpainter to ensure continuous control. We validate our approach on a sensorless, low-cost bimanual setup. Empirical results across tasks requiring contact-rich manipulation and dynamic tracking reveal a decisive gap: while standard execution cloning fails because it cannot overcome contact stiffness and tracking lag, our mismatch-aware approach achieves robust success. This presents a minimalist behavior cloning framework for low-cost hardware, enabling force perception and dynamic compensation without relying on explicit force sensing. Videos are available on the project page (https://xucj98.github.io/mind-the-gap-page/).
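To make the claim that the Intent-Execution Mismatch "physically encodes implicit interaction forces" concrete, the following is a minimal sketch and not the authors' implementation. It assumes the slave joint is driven by a proportional position servo, so that at rest the torque it applies is roughly the gain times the gap between command and measured position. The function names, the gain KP, and the toy trajectory are all hypothetical and chosen only for illustration.

```python
import numpy as np


def mismatch_history(master_cmd, slave_state):
    """Intent-Execution Mismatch per timestep: master command minus slave response."""
    return np.asarray(master_cmd, dtype=float) - np.asarray(slave_state, dtype=float)


def implied_torque(mismatch, kp):
    """Static torque under an ASSUMED proportional position servo with gain kp."""
    return kp * mismatch


def dual_state_observation(master_cmd, slave_state):
    """Hypothetical policy input: executed state concatenated with the mismatch."""
    slave = np.asarray(slave_state, dtype=float)
    return np.concatenate([slave, mismatch_history(master_cmd, slave)], axis=-1)


# Toy 1-DoF example: the slave joint meets a rigid surface at 0.50 rad,
# while the operator keeps pushing the master command further.
master = np.array([[0.40], [0.50], [0.55], [0.60]])  # commanded position (rad)
slave = np.array([[0.40], [0.50], [0.50], [0.50]])   # measured position (rad)
KP = 20.0  # assumed servo stiffness (N*m/rad), illustrative only

mismatch = mismatch_history(master, slave)
torque = implied_torque(mismatch, KP)
obs = dual_state_observation(master, slave)

print("mismatch (rad):", mismatch.ravel())
print("implied torque (N*m):", torque.ravel())
print("observation shape:", obs.shape)
```

In this toy trajectory the executed positions stop changing once contact is made, so a policy that clones them would command no further push. The master command keeps moving past the surface, and under the assumed servo model that growing gap is what produces contact torque. This is the sense in which a predicted master command acts like a virtual equilibrium point.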
