Benchmarking Class Activation Map Methods for Explainable Brain Hemorrhage Classification on Hemorica Dataset
Z. Rafati, M. Hoseyni, J. Khoramdel, A. Nikoofard
TL;DR
This paper addresses explainability in brain hemorrhage classification by systematically benchmarking Class Activation Mapping (CAM) methods on the Hemorica dataset, which provides pixel-level ground truth. Using EfficientNetV2-S as the backbone, it evaluates nine CAM variants across the last three convolutional layers and quantifies localization quality with pixel-wise and bounding-box metrics. Higher input resolution and augmentation improve classification performance, and layer [-3] yields the strongest localization. HiResCAM and AblationCAM emerge as the top explainability methods: HiResCAM gives the best bounding-box alignment, while AblationCAM reaches the best pixel-level scores (Dice ~0.57, IoU ~0.40). By delivering a reproducible CAM benchmark on a clinically relevant task, the work highlights the potential of XAI-driven workflows for AI-assisted brain hemorrhage diagnosis and points to post-processing refinements as a route to better localization accuracy.
Abstract
Explainable Artificial Intelligence (XAI) has become an essential component of medical imaging research, aiming to increase transparency and clinical trust in deep learning models. This study investigates brain hemorrhage diagnosis with a focus on explainability through Class Activation Mapping (CAM) techniques. A pipeline was developed to extract pixel-level segmentation and detection annotations from classification models using nine state-of-the-art CAM algorithms, applied across multiple network stages and quantitatively evaluated on the Hemorica dataset, which uniquely provides both slice-level labels and high-quality segmentation masks. Metrics including Dice, IoU, and pixel-wise overlap were employed to benchmark the CAM variants. Results show that the strongest localization performance occurred at stage 5 of EfficientNetV2-S, with HiResCAM yielding the highest bounding-box alignment and AblationCAM achieving the best pixel-level Dice (0.57) and IoU (0.40). This is strong accuracy given that the models were trained solely for classification, without segmentation supervision. To the best of our knowledge, this is among the first works to quantitatively compare CAM methods for brain hemorrhage detection, establishing a reproducible benchmark and underscoring the potential of XAI-driven pipelines for clinically meaningful AI-assisted diagnosis.
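To make the evaluation step concrete, the following is a minimal sketch of how a CAM heatmap could be turned into a binary mask and a bounding box and then scored against a ground-truth mask with Dice and IoU. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`threshold_cam`, `dice_iou`, `bounding_box`), the min-max normalization, and the fixed 0.5 threshold are all hypothetical choices, and the abstract does not specify the actual post-processing.

```python
from typing import List, Optional, Tuple

Grid = List[List[float]]  # CAM heatmap as rows of activation values
Mask = List[List[int]]    # binary mask as rows of 0/1


def threshold_cam(cam: Grid, thresh: float = 0.5) -> Mask:
    """Min-max normalize a heatmap to [0, 1] and binarize it.

    Assumption: a fixed threshold of 0.5; the paper may use another rule.
    """
    flat = [v for row in cam for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero on a constant map
    return [[1 if (v - lo) / span >= thresh else 0 for v in row] for row in cam]


def dice_iou(pred: Mask, truth: Mask) -> Tuple[float, float]:
    """Return (Dice, IoU) for two binary masks of equal shape."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    p_sum = sum(sum(row) for row in pred)
    t_sum = sum(sum(row) for row in truth)
    union = p_sum + t_sum - inter
    # Convention chosen here: two empty masks count as a perfect match.
    dice = 2 * inter / (p_sum + t_sum) if (p_sum + t_sum) else 1.0
    iou = inter / union if union else 1.0
    return dice, iou


def bounding_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Return (row_min, col_min, row_max, col_max) of the foreground, or None."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for row in mask for c, v in enumerate(row) if v]
    if not rows:
        return None
    return (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    # Toy 3x3 heatmap and ground-truth mask, invented for illustration only.
    cam = [[0.0, 0.2, 0.1], [0.1, 0.9, 0.8], [0.0, 0.7, 1.0]]
    truth = [[0, 0, 0], [0, 1, 1], [0, 0, 1]]
    pred = threshold_cam(cam)
    print(dice_iou(pred, truth), bounding_box(pred))
```

For a single pair of masks, Dice and IoU are linked by Dice = 2·IoU / (1 + IoU), so an IoU of 0.40 corresponds to a Dice of about 0.57, which is in line with the reported figures. Metrics averaged over many slices need not satisfy this identity exactly.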
