Leveraging Activations for Superpixel Explanations
Ahcène Boubekki, Samuel G. Fadel, Sebastian Mair
TL;DR
This work introduces Neuro-Activated Superpixels (NAS), an unsupervised method that segments an image by clustering feature activations taken from multiple depths of a trained classifier, without any fine-tuning. The resulting regions align with the model's internal semantics, which enables a semi-supervised evaluation of saliency methods and makes saliency maps easier to interpret once they are aggregated over NAS regions. The authors show that NAS captures class-relevant structures (e.g., bird parts) and improves weakly supervised object localization (WSOL) across datasets and architectures, and that it reveals inconsistencies in the AUC-LeRF metric used to assess saliency methods. Overall, NAS offers an activation-driven segmentation tool for explaining and evaluating vision models.
Abstract
Saliency methods have become standard in the explanation toolkit of deep neural networks. Recent developments specific to image classifiers have investigated region-based explanations, either with new methods or by adapting well-established ones using ad-hoc superpixel algorithms. In this paper, we aim to avoid relying on these segmenters by extracting a segmentation from the activations of a deep neural network image classifier without fine-tuning the network. Our so-called Neuro-Activated Superpixels (NAS) can isolate the regions of interest in the input relevant to the model's prediction, which boosts high-threshold weakly supervised object localization performance. This property enables the semi-supervised semantic evaluation of saliency methods. The aggregation of NAS with existing saliency methods eases their interpretation and reveals the inconsistencies of the widely used area under the relevance curve metric.
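To make the idea concrete, the following is a minimal sketch of activation-based superpixels and of aggregating a saliency map over them. It is not the authors' implementation: the abstract does not specify the clustering algorithm, the layers used, or any preprocessing, so plain k-means, nearest-neighbour upsampling, and per-layer L2 normalization are all assumptions made here, and the function names (`upsample_nearest`, `kmeans`, `activation_superpixels`, `pool_saliency`) are hypothetical. Random arrays stand in for the activations of a trained classifier.

```python
import numpy as np

def upsample_nearest(fmap, size):
    """Nearest-neighbour upsample a (C, h, w) feature map to (C, size, size)."""
    _, h, w = fmap.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return fmap[:, rows[:, None], cols[None, :]]

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means on the rows of x; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared distance from every point to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(1)
        for j in range(k):
            members = x[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(0)
    return labels

def activation_superpixels(feature_maps, size, k):
    """Cluster per-pixel descriptors built from activations at several depths."""
    per_layer = []
    for fmap in feature_maps:
        up = upsample_nearest(fmap, size).reshape(fmap.shape[0], -1).T
        # Assumption: L2-normalize each layer so no single depth dominates.
        up = up / (np.linalg.norm(up, axis=1, keepdims=True) + 1e-8)
        per_layer.append(up)
    descriptors = np.concatenate(per_layer, axis=1)  # (size*size, sum of C)
    return kmeans(descriptors, k).reshape(size, size)

def pool_saliency(saliency, segments):
    """Replace each pixel's saliency by the mean over its segment."""
    pooled = np.zeros_like(saliency, dtype=float)
    for s in np.unique(segments):
        mask = segments == s
        pooled[mask] = saliency[mask].mean()
    return pooled

# Stand-in data: three "layers" of decreasing resolution and a random saliency map.
rng = np.random.default_rng(0)
feats = [rng.random((8, 16, 16)), rng.random((16, 8, 8)), rng.random((32, 4, 4))]
segments = activation_superpixels(feats, size=32, k=5)
saliency = rng.random((32, 32))
pooled = pool_saliency(saliency, segments)
```

In practice the feature maps would come from forward hooks on a trained classifier rather than from random arrays, and the pooled map assigns one relevance value per region, which is what makes region-based explanations easier to read than pixel-level ones.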
