LaFAM: Unsupervised Feature Attribution with Label-free Activation Maps

Aray Karjauv, Sahin Albayrak

TL;DR

The paper addresses explainability for CNNs in the absence of labels, particularly within self-supervised learning (SSL). It introduces the Label-free Activation Map (LaFAM), which computes a saliency map by averaging activations across channels at a chosen convolutional layer, then normalizing the result and up-sampling it to the input size: $M_{\text{LaFAM}} = \text{Up}\left(N\left(\bar{A}^l\right)\right)$ with $\bar{A}^l_{i,j} = \frac{1}{K}\sum_{k=1}^{K} A_{i,j}^{l,k}$, where $K$ is the number of channels in layer $l$. This avoids labels and enables efficient post-hoc explanations from a single forward pass. Empirically, LaFAM outperforms RELAX across SSL benchmarks on ImageNet-S and PASCAL VOC 2012, while remaining competitive with Grad-CAM in supervised settings and capable of highlighting multiple concepts in complex scenes. The approach expands the XAI toolbox with a robust, label-free, and computationally efficient saliency method suitable for both SSL and supervised tasks, facilitating better inspection of model behavior without relying on class predictions. The work also provides code and a live demo to support reproducibility and broader adoption.
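The formula above can be illustrated with a minimal NumPy sketch. This is not the authors' released code: the function name `lafam`, the choice of min-max scaling for $N$, and nearest-neighbour interpolation for $\text{Up}$ are assumptions made to keep the example dependency-free. The summary does not specify either operator, and a real pipeline would more likely use bilinear interpolation on activations captured from a framework hook.

```python
from typing import Tuple

import numpy as np


def lafam(activations: np.ndarray, input_size: Tuple[int, int]) -> np.ndarray:
    """Label-free saliency map from one layer's activations (illustrative sketch).

    activations: array of shape (K, H, W), the K channel maps A^{l,k}.
    input_size:  (height, width) of the input image to up-sample to.
    """
    if activations.ndim != 3:
        raise ValueError("expected activations of shape (K, H, W)")

    # Channel-wise mean: A_bar[i, j] = (1 / K) * sum_k A[k, i, j]
    mean_map = activations.mean(axis=0)

    # N(.): min-max scaling to [0, 1] (assumed normalization).
    lo, hi = mean_map.min(), mean_map.max()
    if hi > lo:
        norm_map = (mean_map - lo) / (hi - lo)
    else:
        # A constant map carries no spatial information.
        norm_map = np.zeros_like(mean_map, dtype=float)

    # Up(.): nearest-neighbour up-sampling via index mapping (assumed;
    # stands in for whatever interpolation the original method uses).
    h, w = norm_map.shape
    out_h, out_w = input_size
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return norm_map[np.ix_(rows, cols)]
```

For example, a `(2048, 7, 7)` activation tensor from the last convolutional block of a ResNet50 and `input_size=(224, 224)` would yield a `(224, 224)` map with values in `[0, 1]`. No label or gradient is involved, which is why a single forward pass suffices.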

Abstract

Convolutional Neural Networks (CNNs) are known for their ability to learn hierarchical structures, naturally developing detectors for objects and semantic concepts within their deeper layers. Activation maps (AMs) reveal these salient regions, which are crucial for many Explainable AI (XAI) methods. However, the direct exploitation of raw AMs in CNNs for feature attribution remains underexplored in the literature. This work revises Class Activation Map (CAM) methods by introducing the Label-free Activation Map (LaFAM), a streamlined approach utilizing raw AMs for feature attribution without reliance on labels. LaFAM presents an efficient alternative to conventional CAM methods, demonstrating particular effectiveness in saliency map generation for self-supervised learning while maintaining applicability in supervised learning scenarios.

Paper Structure (8 sections, 2 equations, 4 figures, 2 tables)

This paper contains 8 sections, 2 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Saliency maps comparison. The contours are overlaid to provide a reference for the ground truth. The title on the left displays the true labels from PASCAL VOC 2012 alongside the ImageNet labels predicted by ResNet50. Grad-CAM utilizes the predicted labels to generate saliency maps. Notably, the LaFAM saliency maps are very similar to those produced by Grad-CAM, while RELAX produces noisy saliency maps. The first row demonstrates a misclassification example, showcasing a situation where Grad-CAM fails to highlight the correct region. A detailed comparison is presented in the experiments section.
  • Figure 2: Saliency maps comparison for scenes with two distinct objects. Left-hand labels indicate the ImageNet labels predicted by the ResNet50 classifier.
  • Figure 3: Examples of misclassifications on ImageNet-1k. The left-hand title indicates the method used to generate the saliency maps, while the top title indicates ImageNet-1k ground truth and labels predicted by ResNet50.
  • Figure 4: Additional results for PASCAL VOC 2012. The title on the left displays the true labels from PASCAL VOC 2012, alongside ImageNet labels predicted by ResNet50. Grad-CAM utilizes the predicted labels to generate saliency maps.