Learning Actionable Manipulation Recovery via Counterfactual Failure Synthesis

Dayou Li; Jiuzhou Lei; Hao Wang; Lulin Liu; Yunhao Yang; Zihan Wang; Bangya Liu; Minghui Zheng; Zhiwen Fan

Abstract

While recent foundation models have significantly advanced robotic manipulation, these systems still struggle to recover autonomously from execution errors. Current failure-learning paradigms rely either on costly and unsafe real-world data collection or on simulator-based perturbations, which introduce a severe sim-to-real gap. Furthermore, existing visual analyzers predominantly output coarse, binary diagnoses rather than the executable, trajectory-level corrections required for actual recovery. To bridge the gap between failure diagnosis and actionable recovery, we introduce Dream2Fix, a framework that synthesizes photorealistic, counterfactual failure rollouts directly from successful real-world demonstrations. By perturbing actions within a generative world model, Dream2Fix creates paired failure-correction data without relying on simulators. To ensure the generated data is physically viable for robot learning, we implement a structured verification mechanism that strictly filters rollouts for task validity, visual coherence, and kinematic safety. The resulting pipeline produces a high-fidelity dataset of over 120k paired samples. Using this dataset, we fine-tune a vision-language model to jointly predict failure types and precise recovery trajectories, mapping visual anomalies directly to corrective actions. Extensive real-world robotic experiments show that our approach achieves state-of-the-art correction accuracy, improving it from 19.7% for prior baselines to 81.3%, and enables zero-shot closed-loop failure recovery in physical deployments.
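
The verification step described above can be pictured as a strict conjunction of independent checks: a rollout is kept only if it passes every one. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation. The `Rollout` fields, the failure labels (`missed_grasp`, `collision`), the joint limits, and the coherence threshold are all hypothetical, and each check is a simple stand-in for whatever task checker, visual scorer, or kinematic test the actual pipeline uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Rollout:
    """Hypothetical record for one generated failure rollout (fields are assumptions)."""
    intended_failure: str               # failure type the action perturbation targets
    observed_failure: str               # failure type reported by an (assumed) task checker
    coherence_score: float              # (assumed) visual-coherence score in [0, 1]
    joint_positions: List[List[float]]  # per-step joint angles in radians


# Hypothetical thresholds for illustration only; not values reported in the paper.
JOINT_LOWER, JOINT_UPPER = -2.8, 2.8
COHERENCE_MIN = 0.8


def task_valid(r: Rollout) -> bool:
    # The rollout must actually exhibit the failure it was synthesized to show.
    return r.observed_failure == r.intended_failure


def visually_coherent(r: Rollout) -> bool:
    # Reject rollouts whose (assumed) coherence score falls below the threshold.
    return r.coherence_score >= COHERENCE_MIN


def kinematically_safe(r: Rollout) -> bool:
    # Every joint angle at every step must stay inside the (assumed) limits.
    return all(JOINT_LOWER <= q <= JOINT_UPPER for step in r.joint_positions for q in step)


Verifier = Callable[[Rollout], bool]
VERIFIERS: Sequence[Verifier] = (task_valid, visually_coherent, kinematically_safe)


def filter_rollouts(rollouts: Sequence[Rollout],
                    verifiers: Sequence[Verifier] = VERIFIERS) -> List[Rollout]:
    """Keep only rollouts that pass every verifier (strict conjunction)."""
    return [r for r in rollouts if all(check(r) for check in verifiers)]


if __name__ == "__main__":
    candidates = [
        Rollout("missed_grasp", "missed_grasp", 0.93, [[0.1, -0.4], [0.2, -0.5]]),  # passes
        Rollout("missed_grasp", "none", 0.95, [[0.1, -0.4]]),      # fails task check
        Rollout("collision", "collision", 0.55, [[0.0, 0.3]]),     # fails coherence check
        Rollout("collision", "collision", 0.90, [[3.1, 0.0]]),     # fails joint-limit check
    ]
    kept = filter_rollouts(candidates)
    print(len(kept))  # prints 1
```

Keeping the checks as a sequence of predicates makes it easy to add or reorder verifiers without touching the filtering logic; `all` also short-circuits, so cheaper checks can be placed first.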

Paper Structure

This paper contains 20 sections, 11 equations, 4 figures, and 4 tables.

Introduction
Related Works
Failure Detection in Robotic Manipulation
Data Generation in Robotics
Robotic Vision-Language Models
Method
Problem Statement
Photorealistic Failure Generation
Failure Type Definition and Rollout Collection
Structured Failure Validation
Fix Suggestion Generation
Learn from Failure
VLM Finetuning
Closed-Loop Recovery for VLA
Experiment
...and 5 more sections

Figures (4)

Figure 1: Dream2Fix is a data generation pipeline that synthesizes large-scale, photorealistic failure rollouts with paired corrections from successful demonstrations and curates them with physical and visual verification.
Figure 2: Overview of Dream2Fix pipeline. Dream2Fix generates diverse failure cases from successful demonstrations via keyframe-level action perturbations, then validates and curates them with physical and visual verifiers. The verified rollouts are auto-labeled into a structured schema to instruction-tune a VLM that predicts actionable corrections for real-world recovery.
Figure 3: Real-world experimental workspace. Real-world evaluation uses a Franka Research 3 arm with a Franka Hand gripper and a fixed-view Intel RealSense D435i RGB-D camera.
Figure 4: Real-world robot execution. For each task, we show the initial failed execution (left) and the corrected execution (right). Across diverse tasks, the correction adjusts the action so that the robot recovers from the failure under the same real-robot setup.
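
Figure 2 describes generating failures from successful demonstrations through keyframe-level action perturbations and auto-labeling the results into a structured schema. The following sketch shows one way such a pair could be constructed, assuming actions are end-effector waypoints and a failure is induced by shifting a single keyframe; the correction label is then the inverse shift. `PairedSample`, `perturb_keyframe`, `apply_correction`, `random_offset`, and the waypoint representation are hypothetical names and choices for illustration, and rendering the perturbed actions with a generative world model is deliberately left out.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical action representation: end-effector waypoint (x, y, z) in metres.
Vec3 = Tuple[float, float, float]


@dataclass
class PairedSample:
    """Hypothetical structured label for one counterfactual failure/correction pair."""
    failure_type: str
    keyframe: int
    perturbed_actions: List[Vec3]
    correction: Vec3  # offset that restores the original keyframe waypoint


def perturb_keyframe(actions: List[Vec3], keyframe: int, offset: Vec3,
                     failure_type: str) -> PairedSample:
    """Shift one keyframe waypoint by `offset` and record the inverse shift as the correction."""
    if not 0 <= keyframe < len(actions):
        raise IndexError("keyframe out of range")
    perturbed = list(actions)  # copy so the successful demonstration is left untouched
    x, y, z = actions[keyframe]
    dx, dy, dz = offset
    perturbed[keyframe] = (x + dx, y + dy, z + dz)
    # In the described pipeline the perturbed actions would be rolled out in a
    # generative world model to render the failure; that step is not modeled here.
    return PairedSample(failure_type, keyframe, perturbed, (-dx, -dy, -dz))


def apply_correction(sample: PairedSample) -> List[Vec3]:
    """Add the correction to the perturbed keyframe, recovering the original plan."""
    fixed = list(sample.perturbed_actions)
    x, y, z = fixed[sample.keyframe]
    cx, cy, cz = sample.correction
    fixed[sample.keyframe] = (x + cx, y + cy, z + cz)
    return fixed


def random_offset(rng: random.Random, max_shift: float) -> Vec3:
    """Draw a uniform offset in [-max_shift, max_shift] along each axis."""
    dx, dy, dz = (rng.uniform(-max_shift, max_shift) for _ in range(3))
    return (dx, dy, dz)


if __name__ == "__main__":
    demo = [(0.40, 0.00, 0.30), (0.50, 0.00, 0.12), (0.50, 0.00, 0.30)]  # approach, grasp, lift
    rng = random.Random(0)
    sample = perturb_keyframe(demo, keyframe=1, offset=random_offset(rng, 0.05),
                              failure_type="missed_grasp")
    print(sample.failure_type, sample.keyframe, sample.correction)
```

In this simplified setting, the correction is derived from the known perturbation, so each synthesized failure comes with an exact recovery target without any manual labeling.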