FM-G-CAM: A Holistic Approach for Explainable AI in Computer Vision

Ravidu Suien Rammuni Silva, Jordan J. Bird

TL;DR

Explainable AI for computer vision often relies on single-class saliency maps, which can miss substantial parts of the model's reasoning. FM-G-CAM extends Grad-CAM by computing and fusing saliency maps for the top-$K$ predictions, using $L_2$ normalization to produce a single multiclass explanation map $S_{\text{FM-G-CAM}}$. The work provides a formal theory, an algorithm, and an open-source PyTorch library, and demonstrates quantitative gains on multiclass saliency metrics (e.g., higher $IC$ and $DC$) with applications in general image classification and chest X-ray analysis. This approach offers more faithful, holistic explanations of CNN decisions and supports safer deployment across diverse CV tasks.

Abstract

Explainability is vital to the real-world impact and usability of modern AI. This paper emphasises the need to understand the predictions of Computer Vision models, specifically those based on Convolutional Neural Networks (CNNs). Existing methods for explaining CNN predictions are mostly based on Gradient-weighted Class Activation Maps (Grad-CAM) and focus solely on a single target class. We show that selecting a single target class imposes an assumption on the prediction process and thereby neglects a large portion of the predictor CNN's reasoning. We present Fused Multi-class Gradient-weighted Class Activation Map (FM-G-CAM), a methodology that considers multiple top predicted classes and so provides a holistic explanation of the predictor CNN's rationale. We also provide a detailed mathematical and algorithmic description of the method. Furthermore, alongside a concise comparison of existing methods, we compare FM-G-CAM with Grad-CAM and highlight its benefits through practical real-world use cases. Finally, we present an open-source Python library implementing FM-G-CAM to conveniently generate saliency maps for CNN-based model predictions.
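
To make the idea of fusing top-$K$ saliency maps concrete, the following is a minimal NumPy sketch, not the authors' library. It assumes the feature maps of one convolutional layer and the per-class gradients have already been extracted from the model. The function names `grad_cam_map` and `fuse_top_k` are hypothetical, and the fusion rule used here (keep only the strongest class at each pixel, then divide by the $L_2$ norm of the whole stack) is an assumption about how the fusion and normalization could be carried out; the paper's own equations define the actual method.

```python
import numpy as np


def grad_cam_map(activations, gradients):
    """Standard Grad-CAM map for a single class.

    activations: (C, H, W) feature maps of the chosen conv layer.
    gradients:   (C, H, W) gradients of the class score w.r.t. those maps.
    """
    weights = gradients.mean(axis=(1, 2))             # average gradients spatially -> (C,)
    cam = np.tensordot(weights, activations, axes=1)  # weighted channel sum -> (H, W)
    return np.maximum(cam, 0.0)                       # ReLU keeps positive evidence only


def fuse_top_k(activations, gradients_per_class, eps=1e-8):
    """Fuse K per-class Grad-CAM maps into one (K, H, W) stack.

    ASSUMPTION (illustrative, not taken from the paper): at every pixel only
    the class with the largest activation is kept, and the filtered stack is
    divided by its overall L2 norm.
    """
    maps = np.stack([grad_cam_map(activations, g) for g in gradients_per_class])
    winner = maps.argmax(axis=0)                              # (H, W) index of strongest class
    class_ids = np.arange(maps.shape[0])[:, None, None]       # (K, 1, 1)
    filtered = np.where(class_ids == winner[None], maps, 0.0)  # zero out non-winning classes
    return filtered / (np.linalg.norm(filtered) + eps)        # L2-normalise the whole stack


# Toy example: two channels, two classes, a 2x2 feature map.
A = np.array([[[1.0, 0.0], [1.0, 0.0]],    # channel 0 fires on the left column
              [[0.0, 1.0], [0.0, 1.0]]])   # channel 1 fires on the right column
G = np.zeros((2, 2, 2, 2))
G[0, 0] = 1.0   # class 0 depends on channel 0
G[1, 1] = 2.0   # class 1 depends on channel 1, more strongly
S = fuse_top_k(A, G)
```

In this toy case, slice `S[0]` is non-zero only in the left column and `S[1]` only in the right column, so each class could be rendered in its own colour on a single image. In practice the activations and gradients would come from forward and backward hooks on a CNN layer.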

Paper Structure

This paper contains 21 sections, 6 equations, 8 figures, 3 tables, and 1 algorithm.

Figures (8)

  • Figure 1: FM-G-CAM compared against Grad-CAM for general image classification tasks.
  • Figure 2: The process of generating FM-G-CAM.
  • Figure 3: Effect of L2 Norm for saliency map generation in FM-G-CAM.
  • Figure 4: Overview of the XAI Inference Engine.
  • Figure 5: Overview of the XAI Inference Engine. Column 2 shows the output for FM-G-CAM, while columns 3 to 6 show the corresponding saliency map outputs from Grad-CAM.
  • ...and 3 more figures