AUTHENTICATION: Identifying Rare Failure Modes in Autonomous Vehicle Perception Systems using Adversarially Guided Diffusion Models

Mohammad Zarei; Melanie A Jutras; Eliana Evans; Mike Tan; Omid Aaramoon

TL;DR

This paper tackles the problem of identifying rare failure modes (RFMs) in autonomous vehicle perception under long-tail conditions by introducing an Authentication framework that combines adversarially guided diffusion-based inpainting with explainability. It generates realistic RFMs around fixed objects, guided by object-detection losses, and extends the approach to multi-modal outputs while enforcing realism through consistency verification. A natural-language explainability layer using Grad-CAM overlays and GPT-4o captions translates the failures into actionable root-cause descriptions for developers and policymakers. The results demonstrate that RFMs can be produced with high realism and that the accompanying captions provide meaningful insights into attention disruptions and environmental factors, potentially improving robustness and safety in AV systems.

Abstract

Autonomous Vehicles (AVs) rely on artificial intelligence (AI) to accurately detect objects and interpret their surroundings. However, even when trained using millions of miles of real-world data, AVs are often unable to detect rare failure modes (RFMs). The problem of RFMs is commonly referred to as the "long-tail challenge" because the data distribution contains many scenarios that each occur very rarely. In this paper, we present a novel approach that uses advanced generative and explainable AI techniques to aid in understanding RFMs. Our methods can enhance the robustness and reliability of AVs when combined with downstream model training and testing. We extract segmentation masks for objects of interest (e.g., cars) and invert them to create environmental masks. These masks, combined with carefully crafted text prompts, are fed into a custom diffusion model. We leverage the Stable Diffusion inpainting model, guided by adversarial noise optimization, to generate images containing diverse environments designed to evade object detection models and expose vulnerabilities in AI systems. Finally, we produce natural language descriptions of the generated RFMs that can guide developers and policymakers in improving the safety and reliability of AV systems.
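
The mask inversion step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `environment_mask`, the toy 4x4 scene, and the choice of NumPy are assumptions made here for illustration. It also assumes the common inpainting convention that mask pixels set to 1 are regenerated while pixels set to 0 are preserved, so that the object of interest stays fixed and only its surroundings change.

```python
import numpy as np


def environment_mask(object_mask: np.ndarray) -> np.ndarray:
    """Invert a binary object mask (hypothetical helper).

    Input:  1 where the object of interest (e.g., a car) is, 0 elsewhere.
    Output: 1 where the environment is (region to inpaint), 0 on the object.
    """
    obj = np.asarray(object_mask).astype(bool)
    return (~obj).astype(np.uint8)


# Toy 4x4 scene with a 2x2 "car" in the centre (made-up data).
object_mask = np.zeros((4, 4), dtype=np.uint8)
object_mask[1:3, 1:3] = 1

env_mask = environment_mask(object_mask)
```

In a full pipeline, a mask like `env_mask` would be passed, together with the source image and a text prompt, to an inpainting model; the adversarial guidance described in the abstract is a separate step that this sketch does not cover.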
