From Mystery to Mastery: Failure Diagnosis for Improving Manipulation Policies

Som Sagar, Jiafei Duan, Sreevishakh Vasudevan, Yifan Zhou, Heni Ben Amor, Dieter Fox, Ransalu Senanayake

TL;DR

RoboMD addresses the challenge of unknown failure modes in robotic manipulation by pairing a PPO-based deep RL search over environment variations with a vision-language embedding that generalizes to unseen conditions. It provides probabilistic failure-mode rankings (FM probabilities) and demonstrates how failures can be leveraged to fine-tune policies, improving robustness across tasks and training methods. The framework is validated through extensive simulations and real-world experiments, showing superior FM detection and generalization compared with RL and VLM baselines. Overall, RoboMD offers a systematic, scalable pathway to diagnose and mitigate failures before deployment, enhancing the reliability of manipulation policies in unstructured environments.

Abstract

Robot manipulation policies often fail for unknown reasons, posing significant challenges for real-world deployment. Researchers and engineers typically address these failures using heuristic approaches, which are not only labor-intensive and costly but also prone to overlooking critical failure modes (FMs). This paper introduces Robot Manipulation Diagnosis (RoboMD), a systematic framework designed to automatically identify FMs arising from unanticipated changes in the environment. Considering the vast space of potential FMs in a pre-trained manipulation policy, we leverage deep reinforcement learning (deep RL) to explore and uncover these FMs using a specially trained vision-language embedding that encodes a notion of failures. This approach enables users to probabilistically quantify and rank failures in previously unseen environmental conditions. Through extensive experiments across various manipulation tasks and algorithms, we demonstrate RoboMD's effectiveness in diagnosing unknown failures in unstructured environments, providing a systematic pathway to improve the robustness of manipulation policies.

Paper Structure

This paper contains 25 sections, 6 equations, 18 figures, 6 tables, 2 algorithms.

Figures (18)

  • Figure 1: RoboMD diagnoses failure modes in pre-trained manipulation policies by interacting with the policy and its environment to quantify and rank failure probabilities across both seen and unseen environmental variations (e.g., different object types in this case). This highlights RoboMD's ability to generalize failure diagnosis beyond known environments.
  • Figure 2: RoboMD Framework: (1) A PPO-based deep RL agent identifies configurations most likely to induce failures by changing the environment and rolling out the pre-trained manipulation policy. (2) Once PPO training is complete, its output distribution, given an input image of the environment, is analyzed to derive probabilities for each failure mode (FM), quantifying the likelihood of failure. The simpler case is the discrete action space that directly quantifies failure probabilities for candidate FMs. With the continuous action space, we can quantify the likelihood for unseen environment changes. (3) FM likelihoods can be used to fine-tune the policy.
  • Figure 3: The pipeline illustrates how rollouts with disruptions (e.g., object or lighting changes) are processed to learn meaningful embeddings. Text and visual data from the rollouts are embedded using CLIP and ViT, then projected through an MLP to generate representations in which text and image features are aligned with failure outcomes.
  • Figure 4: Continuous Action Space Exploration. The diagram illustrates three types of regions in the action space: Unknown (blue), Success (green), and Failure (red). Known embeddings (stars) represent pre-computed reference points, which guide the exploration process. Orange circles depict actions taken by the RoboMD RL agent, with arrows indicating the sequence of transitions during exploration. Dashed boundaries indicate naturally formed action regions, grouping similar outcomes (e.g., all stars within an action region represent the same action, such as changing the cube color to red). The RoboMD RL agent systematically navigates the action space, transitioning across different regions and identifying failure modes. Since these traversals are always directed toward failures, the learned policy, $\pi^\text{MD}$, represents a failure distribution.
  • Figure 5: Some environment variations for both simulation and real-world evaluation.
  • ...and 13 more figures
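
The Figure 2 caption says that, in the discrete case, the trained PPO agent's output distribution is read off as a probability for each candidate failure mode. A minimal sketch of that ranking step follows. It assumes the agent exposes one logit per candidate FM and that a softmax turns those logits into probabilities; the function name `rank_failure_modes`, the example logits, and the FM labels are all hypothetical and not taken from the paper.

```python
import numpy as np


def rank_failure_modes(logits, names):
    """Rank candidate failure modes by probability (illustrative sketch only).

    logits: one score per candidate FM, assumed to come from a trained
            discrete-action policy head (an assumption, not the paper's API).
    names:  human-readable label for each candidate FM.
    """
    logits = np.asarray(logits, dtype=float)
    # Subtract the max before exponentiating for numerical stability.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Sort from most to least likely failure mode.
    order = np.argsort(-probs)
    return [(names[i], float(probs[i])) for i in order]


# Hypothetical logits for three environment variations.
ranking = rank_failure_modes(
    [2.0, 0.5, 1.0],
    ["red cube", "dim lighting", "blue table"],
)
for name, prob in ranking:
    print(f"{name}: {prob:.3f}")
```

The ranked list is the kind of output that could guide which variations to prioritize when fine-tuning the policy; the continuous-action case described in Figure 4 is not covered by this sketch.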