Benchmarking Class Activation Map Methods for Explainable Brain Hemorrhage Classification on Hemorica Dataset

Z. Rafati, M. Hoseyni, J. Khoramdel, A. Nikoofard

TL;DR

This paper tackles the challenge of explainability in brain hemorrhage classification by systematically benchmarking Class Activation Mapping (CAM) methods on the Hemorica dataset, which provides pixel-level ground truth. Using EfficientNetV2-S as the backbone, it analyzes ten CAM variants across the last three convolutional layers to quantify localization quality with pixel-wise and bounding-box metrics. The study demonstrates that higher input resolution and augmentation improve classification performance, with layer [-3] yielding the strongest localization; HiResCAM and AblationCAM emerge as top explainability methods, achieving Dice ~0.57 and IoU ~0.40. By delivering a reproducible CAM benchmark on a clinically relevant task, it highlights the potential of XAI-driven workflows for AI-assisted brain hemorrhage diagnosis and points to future refinements in post-processing to further improve localization accuracy.

Abstract

Explainable Artificial Intelligence (XAI) has become an essential component of medical imaging research, aiming to increase transparency and clinical trust in deep learning models. This study investigates brain hemorrhage diagnosis with a focus on explainability through Class Activation Mapping (CAM) techniques. A pipeline was developed to extract pixel-level segmentation and detection annotations from classification models using nine state-of-the-art CAM algorithms, applied across multiple network stages, and quantitatively evaluated on the Hemorica dataset, which uniquely provides both slice-level labels and high-quality segmentation masks. Metrics including Dice, IoU, and pixel-wise overlap were employed to benchmark CAM variants. Results show that the strongest localization performance occurred at stage 5 of EfficientNetV2-S, with HiResCAM yielding the highest bounding-box alignment and AblationCAM achieving the best pixel-level Dice (0.57) and IoU (0.40), representing strong accuracy given that models were trained solely for classification without segmentation supervision. To the best of current knowledge, this is among the first works to quantitatively compare CAM methods for brain hemorrhage detection, establishing a reproducible benchmark and underscoring the potential of XAI-driven pipelines for clinically meaningful AI-assisted diagnosis.
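To make the evaluation described in the abstract concrete, the following is a minimal sketch of how a CAM heatmap could be turned into a binary mask and scored against a ground-truth segmentation with Dice and IoU, plus a tight bounding box for detection-style comparison. It is not the authors' code: the function names, the min-max normalisation, the 0.5 threshold, and the convention for two empty masks are all assumptions made for illustration.

```python
import numpy as np


def binarize_cam(cam, threshold=0.5):
    """Min-max normalise a CAM heatmap to [0, 1] and threshold it into a boolean mask.

    The 0.5 threshold is an illustrative assumption, not a value from the paper.
    """
    cam = np.asarray(cam, dtype=np.float64)
    span = cam.max() - cam.min()
    if span == 0:  # flat map: nothing stands out, so predict no region
        return np.zeros(cam.shape, dtype=bool)
    return (cam - cam.min()) / span >= threshold


def dice_and_iou(pred, target):
    """Pixel-wise Dice and IoU between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    total = pred.sum() + target.sum()
    if total == 0:  # both masks empty: scored as perfect agreement (assumed convention)
        return 1.0, 1.0
    union = total - intersection
    return float(2 * intersection / total), float(intersection / union)


def mask_to_bbox(mask):
    """Tightest box (row_min, col_min, row_max, col_max) around all foreground pixels."""
    rows, cols = np.nonzero(np.asarray(mask, dtype=bool))
    if rows.size == 0:
        return None  # no foreground, so no box
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())


# Toy 3x3 example: the CAM highlights three pixels, two of which are true hemorrhage.
cam = np.array([[0.9, 0.8, 0.1],
                [0.7, 0.2, 0.0],
                [0.1, 0.0, 0.0]])
ground_truth = np.array([[1, 1, 0],
                         [0, 0, 0],
                         [0, 0, 0]])

pred = binarize_cam(cam)
dice, iou = dice_and_iou(pred, ground_truth)
print(f"Dice={dice:.3f} IoU={iou:.3f} bbox={mask_to_bbox(pred)}")
```

For a single pair of masks the two scores are tied by Dice = 2·IoU / (1 + IoU), so an IoU of 0.40 corresponds to a Dice of about 0.57. Scores averaged over many slices need not satisfy this identity exactly.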

Paper Structure

This paper contains 20 sections, 2 equations, 4 figures, 3 tables.

Figures (4)

  • Figure 1: EfficientNetV2-S architecture used in this study. The diagram shows input and output feature map dimensions at each stage. $N$ indicates the number of repetitions of the given block type. The last three convolutional layers ([$-1$], [$-2$], and [$-3$]) correspond to the layers used for CAM-based analysis.
  • Figure 2: Model comparison (mean $\pm$ std of top-5 F1 Score). This bar chart compares the performance of different EfficientNetV2 variants under the baseline configuration (input resolution 224$\times$224, no augmentation, and BCE positive weight of 1), where the only factor changed is the model architecture. Among all evaluated backbones, EfficientNetV2-S achieves the highest F1 score, indicating superior classification performance for this task.
  • Figure 3: Precision-Recall curves of EfficientNetV2 variants at the epoch corresponding to the best F1 score. Each curve shows the trade-off between precision and recall as the classification threshold is varied from 0.3 (rightmost point) to 0.7 (leftmost point). The results demonstrate that all models are relatively insensitive to small changes in threshold, maintaining high precision across a wide recall range. Among the compared architectures, EfficientNetV2-S (512$\times$512) (pink curve) achieves the highest area under the precision-recall curve (AUC), confirming its superiority as the final chosen model.
  • Figure 4: Example of Class Activation Mapping (CAM) visualizations for a hemorrhage-positive CT slice using HiResCAM across three different network depths. (a) Original CT slice; (b, c) overlay and binary mask from layer $[-3]$; (d, e) overlay and mask from layer $[-2]$; (f, g) overlay and mask from layer $[-1]$. In overlays, the red regions correspond to CAM-predicted hemorrhage, the green regions represent the ground-truth segmentation mask, and the yellow regions denote the intersection between prediction and ground truth. This progression illustrates how CAM maps evolve across successive convolutional layers, with earlier layers often capturing broader regions and deeper layers providing more focused localization.
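The colour coding described for the Figure 4 overlays (red for CAM-only pixels, green for ground-truth-only pixels, yellow for their intersection) can be reproduced with a few lines of NumPy. The sketch below is an illustration under stated assumptions rather than the authors' plotting code: the function name `overlay_masks`, the blending factor `alpha`, and the expectation that the CT slice is already scaled to [0, 1] are all hypothetical choices.

```python
import numpy as np


def overlay_masks(gray_slice, pred_mask, gt_mask, alpha=0.5):
    """Blend colour-coded masks onto a grayscale slice scaled to [0, 1].

    Red = CAM prediction only, green = ground truth only, yellow = both.
    `alpha` (an assumed parameter) controls how strongly the colour covers the slice.
    """
    gray = np.asarray(gray_slice, dtype=np.float64)
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)

    rgb = np.stack([gray, gray, gray], axis=-1)  # H x W x 3 copy of the slice
    colours = np.zeros_like(rgb)
    colours[pred & ~gt] = (1.0, 0.0, 0.0)  # red: predicted but not annotated
    colours[~pred & gt] = (0.0, 1.0, 0.0)  # green: annotated but not predicted
    colours[pred & gt] = (1.0, 1.0, 0.0)   # yellow: prediction and annotation agree

    marked = pred | gt  # pixels outside both masks keep their original intensity
    rgb[marked] = (1 - alpha) * rgb[marked] + alpha * colours[marked]
    return rgb


# Toy 2x2 example on a black background with fully opaque colours.
slice_2x2 = np.zeros((2, 2))
pred = np.array([[1, 1], [0, 0]])
gt = np.array([[0, 1], [1, 0]])
overlay = overlay_masks(slice_2x2, pred, gt, alpha=1.0)
print(overlay[0, 0], overlay[0, 1], overlay[1, 0], overlay[1, 1])
```

The resulting array can be passed to any image viewer that accepts float RGB data in [0, 1]. Calling the function once per layer on the same slice would give a side-by-side comparison of the kind shown in panels (b), (d), and (f).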