
Force-Aware Residual DAgger via Trajectory Editing for Precision Insertion with Impedance Control

Yiou Huang, Ma Ning, Weichu Zhao, Zinuo Liu, Jun Sun, Qiufeng Wang, Yaran Chen

TL;DR

A scalable and force-aware human-in-the-loop imitation learning framework that mitigates covariate shift by learning residual policies through optimization-based trajectory editing, and introduces a force-aware failure anticipation mechanism that triggers human intervention only when discrepancies arise between predicted and measured end-effector forces.
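The TL;DR does not say how a force discrepancy is scored or when it counts as a failure. The following is a minimal sketch, assuming a Euclidean norm on the 3-D force error, a fixed threshold in newtons, and a short persistence window that ignores single-sample spikes. The function name `force_discrepancy_alarm` and both parameter values are hypothetical, not the authors' settings.

```python
import numpy as np


def force_discrepancy_alarm(predicted, measured, threshold=5.0, patience=3):
    """Return the first step index at which ||f_pred - f_meas|| has exceeded
    `threshold` (N) for `patience` consecutive steps, or None if it never does.
    Threshold and patience are hypothetical values, not from the paper."""
    pred = np.asarray(predicted, dtype=float)
    meas = np.asarray(measured, dtype=float)
    errors = np.linalg.norm(pred - meas, axis=-1)  # per-step force error norm
    run = 0
    for t, err in enumerate(errors):
        run = run + 1 if err > threshold else 0    # count consecutive violations
        if run >= patience:
            return t                               # pause and request human help
    return None


# Usage: the measured force jumps by 10 N along z from step 2 onward
pred = np.zeros((6, 3))
meas = np.zeros((6, 3))
meas[2:, 2] = 10.0
print(force_discrepancy_alarm(pred, meas))  # 4
```

Requiring several consecutive violations adds a few detector cycles of latency in exchange for robustness to sensor noise. At the 50 Hz detector rate given in Figure 1, three steps correspond to 60 ms.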

Abstract

Imitation learning (IL) has shown strong potential for contact-rich precision insertion tasks. However, its practical deployment is often hindered by covariate shift and by the need for continuous expert monitoring to recover from failures during execution. In this paper, we propose Trajectory Editing Residual Dataset Aggregation (TER-DAgger), a scalable and force-aware human-in-the-loop imitation learning framework. First, TER-DAgger mitigates covariate shift by learning residual policies through optimization-based trajectory editing; this approach smoothly fuses policy rollouts with human corrective trajectories, providing consistent and stable supervision. Second, we introduce a force-aware failure anticipation mechanism that triggers human intervention only when predicted and measured end-effector forces diverge, substantially reducing the need for continuous expert monitoring. Third, all learned policies are executed within a Cartesian impedance control framework, ensuring compliant and safe behavior during contact-rich interactions. Extensive experiments on both simulated and real-world precision insertion tasks show that TER-DAgger improves the average success rate by over 37% compared to behavior cloning, human-guided correction, retraining, and fine-tuning baselines, demonstrating its effectiveness in mitigating covariate shift and enabling scalable deployment in contact-rich manipulation.
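The abstract states that all policies run inside a Cartesian impedance controller but does not give the control law. The sketch below uses the textbook translational law F = K (x_d - x) + D (xdot_d - xdot), maps the resulting force to joint torques through the Jacobian transpose, and lets a critically damped unit point mass stand in for the robot. Orientation, gravity, and Coriolis terms are omitted. All function names, the gains, and the unit effective mass are illustrative assumptions; only the 1 kHz rate comes from Figure 1.

```python
import numpy as np


def impedance_wrench(x, x_dot, x_des, xdot_des, stiffness, damping):
    # Translational impedance law: F = K (x_des - x) + D (xdot_des - x_dot)
    K = np.diag(stiffness)
    D = np.diag(damping)
    return K @ (x_des - x) + D @ (xdot_des - x_dot)


def critical_damping(stiffness, mass=1.0):
    # Per-axis critical damping D = 2 sqrt(K m); the effective mass is assumed
    return 2.0 * np.sqrt(np.asarray(stiffness, dtype=float) * mass)


def joint_torques(jacobian, wrench):
    # Map a task-space force to joint torques via the Jacobian transpose
    # (gravity, Coriolis and nullspace terms are omitted in this sketch)
    return jacobian.T @ wrench


def simulate_point_mass(x_des, stiffness, seconds=2.0, rate_hz=1000, mass=1.0):
    # Point mass driven only by the impedance force, integrated at the
    # control rate with semi-implicit Euler, starting at rest at the origin
    dt = 1.0 / rate_hz
    damping = critical_damping(stiffness, mass)
    x = np.zeros(3)
    v = np.zeros(3)
    for _ in range(int(seconds * rate_hz)):
        f = impedance_wrench(x, v, x_des, np.zeros(3), stiffness, damping)
        v = v + dt * f / mass
        x = x + dt * v
    return x
```

In this law, a static external force F_ext displaces the end effector by F_ext / K along each axis. A lower stiffness therefore lets the tool yield more on contact, which is the compliance property the abstract relies on during insertion.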

Paper Structure (35 sections, 31 equations, 3 figures, 6 tables)

Figures (3)

  • Figure 1: (Left) TER-DAgger pipeline. The robot first executes the task using the base policy. When the error detector identifies a failure, execution is paused and a human provides a corrective insertion demonstration. To generate residual training data, we locate the point on the base-policy trajectory nearest to the start of the human demonstration and use it as the editing endpoint. Together with its preceding $N-1$ points, this endpoint forms an editing segment, which is optimized to produce a smooth transition toward the human-corrected trajectory. The optimized trajectory is then used to construct training data for the residual policy. (Right) Framework overview. The base policy (1 Hz) predicts future end-effector poses and forces from image observations together with the current end-effector pose and force. The error detector (50 Hz) compares predicted and measured end-effector forces to detect failures. The residual policy (50 Hz) takes image observations, the current end-effector pose, and the next action predicted by the base policy as inputs, and predicts pose corrections. The corrected actions are executed by a Cartesian impedance controller (1 kHz).
  • Figure 2: Simulation scene setup and insertion task process.
  • Figure 3: Real scene setup and insertion task process.
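The Figure 1 caption describes the trajectory-editing step only qualitatively. The sketch below shows one possible realization, assuming the optimization is a smoothness-regularized least-squares problem over positions. It finds the base-policy waypoint nearest the start of the human correction and takes that waypoint and its preceding N-1 points as the editing segment. It then pins the first point to the base trajectory and the last to the human start, and minimizes squared second differences plus a penalty for moving interior points away from the original rollout. Residual labels are taken as edited minus base waypoints. The objective, the parameters `smooth_weight` and `n_edit`, and all function names are hypothetical.

```python
import numpy as np


def nearest_index(base_traj, point):
    # Index of the base-policy waypoint closest (Euclidean) to `point`
    diffs = np.asarray(base_traj, dtype=float) - np.asarray(point, dtype=float)
    return int(np.argmin(np.linalg.norm(diffs, axis=1)))


def smooth_edit(segment, target, smooth_weight=10.0):
    """Re-optimize an (n, d) segment: keep segment[0], move the last point to
    `target`, and minimize smooth_weight * ||second differences||^2
    + ||interior - original interior||^2 (hypothetical objective)."""
    seg = np.asarray(segment, dtype=float)
    n = seg.shape[0]
    if n < 3:
        raise ValueError("editing segment needs at least 3 points")
    # Second-difference operator: (A p)[i] = p[i] - 2 p[i+1] + p[i+2]
    A = np.zeros((n - 2, n))
    for i in range(n - 2):
        A[i, i:i + 3] = [1.0, -2.0, 1.0]
    free = np.arange(1, n - 1)    # interior points are optimized
    fixed = np.array([0, n - 1])  # endpoints are pinned
    b = np.vstack([seg[0], np.asarray(target, dtype=float)])
    A_f, A_b = A[:, free], A[:, fixed]
    # Normal equations of the quadratic cost in the interior points
    lhs = smooth_weight * A_f.T @ A_f + np.eye(n - 2)
    rhs = seg[free] - smooth_weight * A_f.T @ (A_b @ b)
    out = seg.copy()
    out[fixed] = b
    out[free] = np.linalg.solve(lhs, rhs)
    return out


def build_residual_labels(base_traj, human_traj, n_edit=5, smooth_weight=10.0):
    # Edit the n_edit base waypoints ending at the one nearest the human start;
    # return the edited segment and residual labels (edited - base)
    base = np.asarray(base_traj, dtype=float)
    human = np.asarray(human_traj, dtype=float)
    k = nearest_index(base, human[0])  # editing endpoint
    start = max(0, k - n_edit + 1)     # endpoint plus its preceding points
    segment = base[start:k + 1]
    edited = smooth_edit(segment, human[0], smooth_weight)
    return edited, edited - segment
```

Because both endpoints are pinned, the edited segment leaves the base rollout exactly where editing begins and joins the human demonstration exactly at its start. The parameter `smooth_weight` trades smoothness against staying close to the original rollout. Orientations would need separate treatment (for example, on SO(3)), which this sketch does not attempt.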