BaTCAVe: Trustworthy Explanations for Robot Behaviors

Som Sagar, Aditya Taparia, Harsh Mankodiya, Pranav Bidare, Yifan Zhou, Ransalu Senanayake

TL;DR

The paper tackles the challenge of trustworthy explanations for robot decisions made by black-box neural networks in high-stakes settings. It introduces BaTCAVe, a post-hoc framework that grounds explanations in human-interpretable concepts via Bayesian Testing with Concept Activation Vectors, and attaches uncertainty to each explanation. Explanations are validated across diverse robotic tasks—vision-based navigation, proprioceptive control, vision-language manipulation, and autonomous driving—with results showing that uncertainty helps distinguish reliable explanations from ambiguous ones. This yields actionable insights for debugging, auditing, and improving robustness through data augmentation, fine-tuning, and domain-shift analysis, ultimately enhancing safety and trust in real-world robotic systems.

Abstract

Black-box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when stakeholders, such as engineers and legislative bodies, lack insight into the neural networks' decision-making process. Presently, explainable AI is primarily tailored to natural language processing and computer vision, and it falls short in two critical aspects when applied to robots: grounding in decision-making tasks and the ability to assess the trustworthiness of its explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts that contribute to the decisions made by the neural network. Our proposed technique provides explanations with associated uncertainty scores by matching the neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot diagnostic tool.

Paper Structure (27 sections, 6 equations, 26 figures, 3 tables)

Figures (26)

  • Figure 1: In this pick-and-place task, users can request a post-hoc explanation for why the robot succeeded (or failed). BaTCAVe probes the policy network to obtain a ranked list of possible explanations, each with an associated likelihood score. In this example, the object's redness and cylindrical shape are likely contributors to the robot's actions. The uncertainty intervals indicate how much the user should trust each explanation. Explanations help with model debugging, auditing for regulatory compliance, and building trust.
  • Figure 2: The user specifies the robot behavior of interest for analysis using action concepts, $C_A$ (e.g., when the robot is acting near the table). The user also provides input concepts, $C_I$ (e.g., potential explanations represented as a dictionary of images or proprioception data), and test data, $\mathbf{x}$ (e.g., images or proprioception data). BaTCAVe measures the similarity of activation strengths and directions between each set of input concepts and test data for the given behavior. To do this, the input concepts are linearly separated in the activation space using a Bayesian classifier, resulting in an infinite number of classification boundaries, each with a different probability.
  • Figure 3: Data with two classes (red and blue) are represented in the activation space. If the uncertainty is high (case 2 vs. 3), then we can sample many valid lines (i.e., many explanations). Though many lines can be sampled from case 1 as well, since the accuracy is low, the explanations cannot be trusted.
  • Figure 4: (a) shows the distribution of input concepts $C_I$ selected by the participants for the object avoidance task. (b) and (c) show the effects of fine-tuning and data augmentation, respectively. The higher the score, the better the explanation. (d) shows samples of dark and light concepts. (e) shows that while common XAI methods such as Grad-CAM (Selvaraju et al., 2017) and LIME (Ribeiro et al., 2016) can highlight the orange box, they do not reveal which attributes (i.e., input concepts) of the box contribute to the decision of the DNN, making it harder for engineers to improve the DNN based on the explanations. In contrast, BaTCAVe provides semantically meaningful explanations. (f) shows how confidence changes as the darkness factor of the input is modified, for models trained with different data augmentation (C-Modification).
  • Figure 5: A JetBot rollout. We investigate what attributes (i.e., concepts) of the obstacle influenced the decision to turn.
  • ...and 21 more figures
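
The idea in the Figure 2 and Figure 3 captions (a Bayesian linear classifier in activation space that yields many plausible boundaries, each giving one explanation) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a Laplace-approximated Bayesian logistic regression as the Bayesian classifier (the captions do not say which classifier is used), synthetic stand-ins for layer activations and for the gradients of an action-concept output, and a TCAV-style score computed as the fraction of test inputs with a positive directional derivative. The names `fit_laplace_logreg` and `tcav_scores` are hypothetical.

```python
import numpy as np

def fit_laplace_logreg(X, y, prior_var=1.0, iters=25):
    """Assumed stand-in for the Bayesian classifier: MAP weights via Newton
    steps, plus a Gaussian (Laplace) approximation of the weight posterior."""
    d = X.shape[1]
    w = np.zeros(d)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (p - y) + w / prior_var
        hess = (X.T * (p * (1.0 - p))) @ X + np.eye(d) / prior_var
        w = w - np.linalg.solve(hess, grad)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    hess = (X.T * (p * (1.0 - p))) @ X + np.eye(d) / prior_var
    cov = np.linalg.inv(hess)
    return w, 0.5 * (cov + cov.T)  # symmetrize against round-off

def tcav_scores(cav_samples, grads):
    """One score per sampled boundary: the fraction of test inputs whose
    gradient has a positive component along that sampled concept direction."""
    directional = grads @ cav_samples.T      # shape (n_test, n_samples)
    return (directional > 0).mean(axis=0)    # shape (n_samples,)

rng = np.random.default_rng(0)
d = 8                                        # hypothetical activation size
direction = np.zeros(d)
direction[0] = 1.0

# Synthetic activations: concept examples are shifted along one axis,
# random counterexamples are not.
concept_acts = rng.normal(size=(100, d)) + 2.0 * direction
random_acts = rng.normal(size=(100, d))
X = np.vstack([concept_acts, random_acts])
y = np.concatenate([np.ones(100), np.zeros(100)])

# Append a bias column, fit, then sample many boundaries from the posterior.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])
w_map, cov = fit_laplace_logreg(X_aug, y)
cav_samples = rng.multivariate_normal(w_map, cov, size=500)[:, :d]

# Synthetic stand-in for gradients of the action-concept output with
# respect to the activations, evaluated on test data.
grads = rng.normal(size=(200, d)) + 1.0 * direction

scores = tcav_scores(cav_samples, grads)
print(f"score mean={scores.mean():.3f}, std={scores.std():.3f}")
```

In this sketch the spread of `scores` across sampled boundaries plays the role of the uncertainty attached to an explanation. A narrow spread means the sampled boundaries agree about the concept's influence. As the Figure 3 caption notes, that spread should still be read together with classifier accuracy, since a poorly separating classifier cannot be trusted regardless of spread.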