
Efficient and Concise Explanations for Object Detection with Gaussian-Class Activation Mapping Explainer

Quoc Khanh Nguyen, Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Van Binh Truong, Tuong Phan, Hung Cao

TL;DR

The paper addresses the need for fast and faithful explanations in object detection by introducing G-CAME, a Gaussian Class Activation Mapping Explainer. G-CAME extends CAM-based XAI to detectors by combining gradient-guided localization with Gaussian masking, which produces concise saliency maps for a targeted object. Compared with region-based methods such as D-RISE, it shows better localization fidelity and less noise, takes around 0.5 s per object, and reduces bias on tiny objects, as evaluated with Faster-RCNN and YOLOX on MS-COCO 2017. The approach enables near-real-time, targeted explanations with improved faithfulness and clearer visualizations for practical deployment.

Abstract

To address the challenges of providing quick and plausible explanations in Explainable AI (XAI) for object detection models, we introduce the Gaussian Class Activation Mapping Explainer (G-CAME). Our method efficiently generates concise saliency maps by using activation maps from selected layers and applying a Gaussian kernel to emphasize the image regions that are critical for the predicted object. Compared with region-based approaches, G-CAME significantly reduces explanation time, to about 0.5 seconds, without compromising quality. Our evaluation of G-CAME with Faster-RCNN and YOLOX on the MS-COCO 2017 dataset shows that it offers highly plausible and faithful explanations, notably reducing bias in tiny-object detection.
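The pipeline described in the abstract and in the Figure 1 caption (gradient-derived weight per feature map, element-wise Gaussian mask around the target, linear combination of the weighted maps) can be sketched for a single layer as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the Grad-CAM-style weight (spatial mean of the gradient), the final ReLU and [0, 1] normalization, the fixed `sigma`, and the rule that estimates the target location as the cell with the largest total absolute gradient are all assumptions, and the function names are hypothetical. Combining several selected layers (e.g. by resizing and summing their maps) is omitted.

```python
import numpy as np


def gaussian_mask(height, width, center, sigma):
    """2-D Gaussian of shape (height, width) that equals 1.0 at `center` (row, col)."""
    rows = np.arange(height)[:, None]
    cols = np.arange(width)[None, :]
    cy, cx = center
    dist2 = (rows - cy) ** 2 + (cols - cx) ** 2
    return np.exp(-dist2 / (2.0 * sigma ** 2))


def gcame_style_saliency(activations, gradients, sigma, center=None):
    """Hypothetical G-CAME-style saliency map for ONE target object at ONE layer.

    activations: (K, H, W) feature maps from a selected layer.
    gradients:   (K, H, W) gradients of the target object's score w.r.t. them.
    sigma:       Gaussian spread in feature-map cells (assumed hyperparameter).
    center:      (row, col) of the target on the feature map. If None, it is
                 estimated as the cell with the largest total |gradient|
                 (an assumed heuristic, not the paper's exact rule).
    """
    if center is None:
        grad_energy = np.abs(gradients).sum(axis=0)
        flat_idx = grad_energy.argmax()
        center = tuple(int(i) for i in np.unravel_index(flat_idx, grad_energy.shape))
    # Assumed Grad-CAM-style weight per feature map: spatial mean of its gradient.
    weights = gradients.mean(axis=(1, 2))                  # shape (K,)
    weighted = weights[:, None, None] * activations        # shape (K, H, W)
    # Element-wise Gaussian mask suppresses regions unrelated to the target.
    height, width = activations.shape[1:]
    mask = gaussian_mask(height, width, center, sigma)
    masked = weighted * mask[None, :, :]
    # Linear combination of all weighted maps; keep positive evidence (assumed ReLU).
    saliency = np.maximum(masked.sum(axis=0), 0.0)
    peak = saliency.max()
    # Normalize to [0, 1] for display when the map is not all zeros.
    return saliency / peak if peak > 0 else saliency


# Usage with random stand-in tensors (real inputs would come from a detector's hooks).
rng = np.random.default_rng(0)
acts = rng.random((16, 20, 20))
grads = rng.standard_normal((16, 20, 20))
sal = gcame_style_saliency(acts, grads, sigma=2.0, center=(5, 12))
print(sal.shape)  # (20, 20)
```

The Gaussian mask is what keeps the map concise: feature-map cells a few `sigma` away from the target contribute almost nothing, so other objects activating the same channels are suppressed.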

Paper Structure

This paper contains 22 sections, 16 equations, 4 figures, and 1 table.

Figures (4)

  • Figure 1: Overview of the G-CAME method. We use a gradient-based technique to obtain the target object's location and a weight for each feature map. Each weighted feature map is multiplied element-wise by a Gaussian kernel to remove unrelated regions. The output saliency map is then a linear combination of all the masked, weighted feature maps.
  • Figure 2: Results of Cascading Randomization and Independent Randomization for five layers of the YOLOX model, from top to bottom. The layers chosen in the head do not include layers from the regression branch. The results show that G-CAME is sensitive to the model's parameters.
  • Figure 3: Visualization results of GradCAM, D-RISE, and G-CAME on samples from the MS-COCO 2017 dataset. G-CAME generates the least noisy saliency maps when explaining a specific object.
  • Figure 4: Saliency maps of D-RISE and G-CAME for tiny-object predictions. We evaluate two cases: (a) multiple tiny objects of the same class lying close together and (b) multiple tiny objects of different classes lying close together. In both cases, G-CAME clearly identifies each object in its explanations.
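Figure 2 refers to the two variants of the standard model-parameter randomization sanity check for saliency methods. A minimal sketch of how the randomized models could be generated is below, assuming the model's weights are available as a plain dict of layer name to numpy array (a hypothetical stand-in for a framework's parameter dictionary) and that re-initialization draws from a standard normal (real tests often reuse each layer's own initializer). `spearman_rank_corr` is one common way to compare an explanation before and after randomization; the function names are illustrative only.

```python
import copy

import numpy as np


def cascading_randomization(params, layer_order, rng):
    """Yield (layer, params): every layer up to and including `layer` is re-initialized.

    `layer_order` lists layers from the top (output side) to the bottom (input side).
    """
    current = copy.deepcopy(params)
    for name in layer_order:
        # Randomization accumulates: earlier layers stay randomized.
        current[name] = rng.standard_normal(current[name].shape)
        yield name, copy.deepcopy(current)


def independent_randomization(params, layer_order, rng):
    """Yield (layer, params): ONLY `layer` is re-initialized; all others keep trained weights."""
    for name in layer_order:
        current = copy.deepcopy(params)
        current[name] = rng.standard_normal(current[name].shape)
        yield name, current


def spearman_rank_corr(map_a, map_b):
    """Spearman rank correlation between two saliency maps (ties broken arbitrarily)."""
    rank_a = map_a.ravel().argsort().argsort().astype(float)
    rank_b = map_b.ravel().argsort().argsort().astype(float)
    return float(np.corrcoef(rank_a, rank_b)[0, 1])
```

In use, one would compute the explanation with the trained weights and again with each randomized parameter set, then compare the two maps: correlations that fall as more layers are randomized indicate that the explainer depends on the learned parameters, which is the property Figure 2 illustrates for G-CAME.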