BaTCAVe: Trustworthy Explanations for Robot Behaviors
Som Sagar, Aditya Taparia, Harsh Mankodiya, Pranav Bidare, Yifan Zhou, Ransalu Senanayake
TL;DR
The paper addresses the challenge of providing trustworthy explanations for robot decisions made by black-box neural networks in high-stakes settings. It introduces BaTCAVe, a post-hoc framework that grounds explanations in human-interpretable concepts via Bayesian Testing with Concept Activation Vectors and attaches an uncertainty estimate to each explanation. The explanations are validated across diverse robotic tasks (vision-based navigation, proprioceptive control, vision-language manipulation, and autonomous driving), with results showing that the uncertainty helps distinguish reliable explanations from ambiguous ones. The explanations support debugging and auditing, and can guide robustness improvements through data augmentation, fine-tuning, and domain-shift analysis, with the aim of improving safety and trust in real-world robotic systems.
Abstract
Black-box neural networks are an indispensable part of modern robots. Nevertheless, deploying such high-stakes systems in real-world scenarios poses significant challenges when stakeholders, such as engineers and legislative bodies, lack insight into the neural networks' decision-making process. Presently, explainable AI is primarily tailored to natural language processing and computer vision, and it falls short in two critical respects when applied to robots: grounding in decision-making tasks and the ability to assess the trustworthiness of its explanations. In this paper, we introduce a trustworthy explainable robotics technique based on human-interpretable, high-level concepts to which the decisions made by the neural network are attributed. Our proposed technique provides explanations with associated uncertainty scores by matching the neural network's activations with human-interpretable visualizations. To validate our approach, we conducted a series of experiments with various simulated and real-world robot decision-making models, demonstrating the effectiveness of the proposed approach as a post-hoc, human-friendly robot diagnostic tool.
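To make the idea of a concept-based explanation with an attached uncertainty score concrete, the following is a minimal sketch and not the authors' implementation. It assumes the standard concept activation vector recipe: a direction in activation space separates concept examples from random examples, and the score is the fraction of inputs whose output increases along that direction. Two simplifications are assumptions of this sketch: the concept direction is a difference of class means rather than a trained linear classifier, and bootstrap resampling stands in for the Bayesian treatment, which this section does not specify. All function names, array shapes, and the toy data are hypothetical.

```python
import numpy as np


def fit_cav(concept_acts, random_acts):
    """Unit vector pointing from random activations toward concept activations.

    Assumption: a difference of class means replaces the linear classifier
    normally used to obtain a concept activation vector.
    """
    v = concept_acts.mean(axis=0) - random_acts.mean(axis=0)
    return v / np.linalg.norm(v)


def tcav_score(grads, cav):
    """Fraction of inputs whose output increases along the concept direction."""
    return float(np.mean(grads @ cav > 0))


def bootstrap_tcav(concept_acts, random_acts, grads, n_draws=200, seed=0):
    """Mean and standard deviation of the score over resampled concept vectors.

    Assumption: bootstrap resampling is a stand-in for a Bayesian posterior.
    """
    rng = np.random.default_rng(seed)
    scores = np.empty(n_draws)
    for i in range(n_draws):
        c = concept_acts[rng.integers(0, len(concept_acts), len(concept_acts))]
        r = random_acts[rng.integers(0, len(random_acts), len(random_acts))]
        scores[i] = tcav_score(grads, fit_cav(c, r))
    return float(scores.mean()), float(scores.std())


# Toy data: 8-dimensional activations, concept shifted along the first axis.
rng = np.random.default_rng(42)
e0 = np.eye(8)[0]
concept_acts = rng.normal(size=(50, 8)) + 3.0 * e0
random_acts = rng.normal(size=(50, 8))

# Gradients of the model output with respect to the layer activations.
grads_sensitive = e0 + 0.1 * rng.normal(size=(100, 8))  # output rises with concept
grads_ambiguous = rng.normal(size=(100, 8))             # no consistent direction

mean_s, std_s = bootstrap_tcav(concept_acts, random_acts, grads_sensitive)
mean_a, std_a = bootstrap_tcav(concept_acts, random_acts, grads_ambiguous)
print(f"sensitive: {mean_s:.2f} +/- {std_s:.2f}")
print(f"ambiguous: {mean_a:.2f} +/- {std_a:.2f}")
```

In this toy setting, a score near 1 with little spread suggests the output consistently depends on the concept, while a score near 0.5 suggests the concept has no consistent influence. The spread across resampled concept vectors plays the role of the uncertainty score described above.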
