Real-Time Incremental Explanations for Object Detectors in Autonomous Driving

Santiago Calderón-Peña, Hana Chockler, David A. Kelly

TL;DR

IncX delivers real-time black-box explanations for object detectors in autonomous driving by propagating saliency maps across frames through a constrained affine transformation grounded in 3D-to-2D projection. The approach bootstraps with a traditional explainer on the first frame and then uses scaling and translation to update the explanation in each subsequent frame, enabling near real-time performance with minimal overhead. The theoretical analysis treats saliency maps as probability mass functions (pmfs) subject to affine transformations, and sufficient explanations are obtained with a binary-search-based procedure. Experiments on four autonomous-driving video datasets show that IncX matches or closely approaches the quality of state-of-the-art baselines such as d-rise while running two orders of magnitude faster, supporting its practicality for real-time deployment.
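The frame-to-frame update described above can be sketched as follows. This is a minimal illustration, not IncX's implementation: it assumes that the scaling and translation are estimated from the detector's bounding boxes in two consecutive frames, and the names `box_center`, `scale_and_translation` and `propagate` are hypothetical. The sketch also allows separate horizontal and vertical scale factors, which is an assumption of this illustration.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[float, float, float, float]


def box_center(b: Box) -> Tuple[float, float]:
    """Center of an axis-aligned bounding box."""
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def scale_and_translation(prev: Box, curr: Box) -> Tuple[float, float, float, float]:
    """Per-axis scale (sx, sy) and translation (tx, ty) with p' = s * p + t.

    The scale is the ratio of box sizes; the translation is chosen so that
    the previous box center lands exactly on the current box center.
    """
    sx = (curr[2] - curr[0]) / (prev[2] - prev[0])
    sy = (curr[3] - curr[1]) / (prev[3] - prev[1])
    (cx0, cy0), (cx1, cy1) = box_center(prev), box_center(curr)
    return sx, sy, cx1 - sx * cx0, cy1 - sy * cy0


def propagate(saliency: List[List[float]], prev: Box, curr: Box) -> List[List[float]]:
    """Warp a saliency map (indexed [y][x]) from the previous frame to the current one.

    Uses inverse nearest-neighbour mapping: each output pixel looks up the
    source pixel that the scaling and translation would send onto it.
    """
    h, w = len(saliency), len(saliency[0])
    sx, sy, tx, ty = scale_and_translation(prev, curr)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            src_x = round((x - tx) / sx)  # invert x' = sx * x + tx
            src_y = round((y - ty) / sy)  # invert y' = sy * y + ty
            if 0 <= src_x < w and 0 <= src_y < h:
                out[y][x] = saliency[src_y][src_x]
    return out
```

The update itself makes no model call, which is where the speedup over re-running a perturbation-based explainer such as d-rise on every frame would come from.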

Abstract

Object detectors are widely used in safety-critical real-time applications such as autonomous driving. Explainability is especially important for safety-critical applications, and due to the variety of object detectors and their often proprietary nature, black-box explainability tools are needed. However, existing black-box explainability tools for AI models rely on multiple model calls, rendering them impractical for real-time use. In this paper, we introduce IncX, an algorithm and a tool for real-time black-box explainability for object detectors. The algorithm is based on linear transformations of saliency maps, producing sufficient explanations. We evaluate our implementation on four widely used video datasets of autonomous driving and demonstrate that IncX's explanations are comparable in quality to the state of the art and are computed two orders of magnitude faster, making them usable in real time.
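The abstract states that IncX produces sufficient explanations, and the TL;DR says they are obtained with a binary-search-based procedure. A minimal sketch of such a search is shown below. It assumes that pixels are ranked by saliency and that sufficiency is monotone in the number of top-ranked pixels kept; both are assumptions of this illustration. `still_detected` is a hypothetical oracle standing in for a call to the object detector on a frame in which only the kept pixels are visible.

```python
from typing import Callable, List, Sequence


def sufficient_explanation(
    saliency: Sequence[float],
    still_detected: Callable[[List[int]], bool],
) -> List[int]:
    """Smallest prefix of saliency-ranked pixels that keeps the detection.

    Assumes monotonicity: if the top-k pixels suffice, so do the top-(k+1),
    and the full set of pixels is assumed to be sufficient.
    """
    # Pixel indices sorted from most to least salient.
    order = sorted(range(len(saliency)), key=lambda i: saliency[i], reverse=True)
    lo, hi = 1, len(order)
    while lo < hi:
        mid = (lo + hi) // 2
        if still_detected(order[:mid]):
            hi = mid        # top-mid pixels suffice: try fewer
        else:
            lo = mid + 1    # not enough: keep more pixels
    return order[:lo]
```

Under the monotonicity assumption, the search needs only about $\log_2 n$ oracle calls for $n$ pixels instead of one call per candidate prefix.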

Paper Structure

This paper contains 8 sections, 4 theorems, 7 equations, 3 figures, 3 tables, and 1 algorithm.

Key Result

Lemma 1

For a fixed observer, movement of an object in $3D$ space without rotation or deformation can only result in a combination of scaling and translation when projected onto a given vertical plane.
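Lemma 1 can be checked numerically in a special case. The sketch below assumes a pinhole observer with focal length $f$ and an object whose points all share the same depth $z$ (a planar object parallel to the projection plane); the function names are hypothetical. Translating every point by $(d_x, d_y, d_z)$ maps the projection $u = fX/z$ to $u' = \frac{f(X + d_x)}{z + d_z} = \frac{z}{z + d_z}\,u + \frac{f d_x}{z + d_z}$, that is, a scaling by $s = z/(z + d_z)$ followed by a translation, and likewise for $v$.

```python
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def project(p: Point3, f: float) -> Tuple[float, float]:
    """Pinhole projection of a 3D point onto the image plane (focal length f)."""
    x, y, z = p
    return (f * x / z, f * y / z)


def translate(p: Point3, d: Point3) -> Point3:
    """Rigid translation: no rotation or deformation."""
    return (p[0] + d[0], p[1] + d[1], p[2] + d[2])


def implied_scale_and_shift(z: float, d: Point3, f: float) -> Tuple[float, float, float]:
    """Scale s and image shift (tu, tv) with u' = s*u + tu and v' = s*v + tv.

    Valid for points that all lie at depth z before the move.
    """
    dx, dy, dz = d
    s = z / (z + dz)
    return s, f * dx / (z + dz), f * dy / (z + dz)


def projection_is_scale_plus_shift(
    points: List[Point3], d: Point3, f: float, tol: float = 1e-9
) -> bool:
    """True if every moved, projected point equals s*p + t for one shared (s, t)."""
    z = points[0][2]  # common depth assumed for all points
    s, tu, tv = implied_scale_and_shift(z, d, f)
    for p in points:
        u, v = project(p, f)
        u2, v2 = project(translate(p, d), f)
        if abs(u2 - (s * u + tu)) > tol or abs(v2 - (s * v + tv)) > tol:
            return False
    return True
```

With $d_z = 0$ the scale is $s = 1$ and the motion reduces to a pure image-plane translation; moving away from the observer ($d_z > 0$) gives $s < 1$, which shrinks the projected object and its saliency map.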

Figures (3)

  • Figure 1: Detection of an object 'car' (Frames) at four different timestamps, with d-rise saliency landscapes and the approximate landscapes and explanations generated by IncX, showing the similarity between the fully computed d-rise saliency maps and the IncX estimates. $V_t$ represents the object at time $t$.
  • Figure 2: Block diagram of IncX components: first-frame process (top) and subsequent frames (bottom).
  • Figure 3: Visualization of the explanation procedure.

Theorems & Definitions (7)

  • Lemma 1
  • Lemma 2: Hogg2019IntroductionStatistics
  • Lemma 3
  • Definition 1: Center Function
  • Definition 2: Scaling and Translation
  • Theorem 1
  • Definition 3: CH24
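Lemma 2 is cited from Hogg2019IntroductionStatistics and Definition 1 names a center function, but neither is spelled out here. One natural reading, assumed for this sketch, is that a saliency map normalised to a pmf has its center at the expected pixel coordinate, and that the relevant textbook fact is linearity of expectation, $\mathbb{E}[sX + t] = s\,\mathbb{E}[X] + t$. The helper names below are hypothetical and the code only illustrates that property under scaling and translation.

```python
from typing import Dict, Mapping, Tuple

Point = Tuple[float, float]


def to_pmf(saliency: Mapping[Point, float]) -> Dict[Point, float]:
    """Normalise non-negative saliency scores so that they sum to 1."""
    total = sum(saliency.values())
    return {p: v / total for p, v in saliency.items()}


def center(pmf: Mapping[Point, float]) -> Point:
    """Expected (x, y) coordinate under the pmf (assumed reading of a center function)."""
    cx = sum(p[0] * w for p, w in pmf.items())
    cy = sum(p[1] * w for p, w in pmf.items())
    return (cx, cy)


def scale_and_translate(pmf: Mapping[Point, float], s: float, t: Point) -> Dict[Point, float]:
    """Move each unit of mass from p to s * p + t.

    A uniform scale s > 0 keeps the map injective, so no mass is merged
    and the result is still a pmf.
    """
    return {(s * x + t[0], s * y + t[1]): w for (x, y), w in pmf.items()}
```

Under this reading, `center(scale_and_translate(pmf, s, t))` equals `s * center(pmf) + t` up to floating-point error, so the center of a propagated map follows directly from the transformation parameters.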