RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning
Yinpei Dai, Jayjun Lee, Nima Fazeli, Joyce Chai
TL;DR
RACER introduces a scalable data augmentation pipeline that enriches expert demonstrations with recoverable failure trajectories and rich language annotations, paired with a vision-language supervisor and a language-conditioned visuomotor policy. The framework enables online failure analysis and corrective guidance, improving robustness across long-horizon, dynamic-goal, and unseen tasks, with results on RLBench in simulation and sim-to-real transfer to a real Panda robot. Key contributions include automatically generated failure recovery data with rich language annotations, a VLM-guided supervisory signal, and empirical evidence that policies trained with rich language guidance and recovery data outperform state-of-the-art baselines. The work advances practical robotic manipulation by reducing online human intervention and enabling more reliable, adaptable control in both simulated and real-world settings.
Abstract
Developing robust and correctable visuomotor policies for robotic manipulation is challenging due to the lack of self-recovery mechanisms from failures and the limitations of simple language instructions in guiding robot actions. To address these issues, we propose a scalable data generation pipeline that automatically augments expert demonstrations with failure recovery trajectories and fine-grained language annotations for training. We then introduce Rich languAge-guided failure reCovERy (RACER), a supervisor-actor framework that combines failure recovery data with rich language descriptions to enhance robot control. RACER features a vision-language model (VLM) that acts as an online supervisor, providing detailed language guidance for error correction and task execution, and a language-conditioned visuomotor policy that acts as an actor, predicting the next actions. Our experimental results show that RACER outperforms the state-of-the-art Robotic View Transformer (RVT) on RLBench across various evaluation settings, including standard long-horizon tasks, dynamic goal-change tasks, and zero-shot unseen tasks, achieving superior performance in both simulated and real-world environments. Videos and code are available at: https://rich-language-failure-recovery.github.io.
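The supervisor-actor loop described above can be illustrated with a minimal Python sketch. It is not the RACER implementation: every name here (Observation, toy_supervisor, toy_actor, make_flaky_env, run_episode) is hypothetical, the VLM and the visuomotor policy are replaced by hand-written stand-ins, and the environment is a toy that misses the target once so that a recovery instruction is triggered.

```python
from dataclasses import dataclass
from typing import List, Tuple

Action = Tuple[float, float, float]  # hypothetical: a target gripper position only


@dataclass
class Observation:
    gripper_pos: Action  # stand-in for what the cameras would show
    target_pos: Action   # stand-in for the task-relevant object location


def toy_supervisor(goal: str, obs: Observation, history: List[str]) -> str:
    """Stand-in for the VLM supervisor: returns a rich instruction and
    switches to corrective wording when the previous step missed."""
    if history and obs.gripper_pos != obs.target_pos:
        return f"The last move missed the target. Realign to {obs.target_pos} to {goal}."
    return f"Move the gripper to {obs.target_pos} to {goal}."


def toy_actor(obs: Observation, instruction: str) -> Action:
    """Stand-in for the language-conditioned policy: predicts the next action.
    A learned policy would condition on the instruction; this stub ignores it."""
    return obs.target_pos


def make_flaky_env():
    """Toy environment whose first step misses the target, forcing one recovery."""
    calls = {"n": 0}

    def step(obs: Observation, action: Action):
        calls["n"] += 1
        if calls["n"] == 1:
            missed = (action[0] + 0.1, action[1], action[2])
            return Observation(missed, obs.target_pos), False
        return Observation(action, obs.target_pos), True

    return step


def run_episode(goal, supervisor, actor, step_env, obs, max_steps=5) -> List[str]:
    """Alternate supervisor guidance and actor prediction until the task ends."""
    history: List[str] = []
    for _ in range(max_steps):
        instruction = supervisor(goal, obs, history)  # online language guidance
        action = actor(obs, instruction)              # next action from the policy
        history.append(instruction)
        obs, done = step_env(obs, action)
        if done:
            break
    return history
```

Running run_episode with these stand-ins yields two instructions: an initial task instruction, then a corrective one issued after the injected miss. In this sketch the supervisor is queried at every step; how often the actual system queries its VLM is not stated in the abstract.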
