Generalizing GradCAM for Embedding Networks

Mudit Bachhawat

TL;DR

The paper tackles explainability for embedding networks, which output continuous embeddings rather than discrete class scores and therefore cannot be explained directly with GradCAM-style localization. It introduces EmbeddingCAM, a GradCAM-like heatmap mechanism that uses class proxies $p_c$ and a loss $\mathcal{L}_c = y \cdot p_c$ to backpropagate through embeddings. Two proxy schemes are proposed, Normalized Mean Proxy and Single Point Proxy, and EmbeddingCAM reduces to GradCAM when the proxies are one-hot vectors. Evaluations on CUB-200-2011 show competitive mean heatmap ratio and weakly supervised localization accuracy without sampling, with both single-point and averaged proxies producing stable results. Overall, EmbeddingCAM enables accurate, single-image explanations for metric-learning models and broadens the applicability of heatmap-based interpretability.
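The mechanism summarized above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a toy model whose embedding is a linear head applied to globally average-pooled feature maps, so the gradient of $\mathcal{L}_c = y \cdot p_c$ can be written analytically instead of using autograd. The function names, the shapes, and the reading of "Normalized Mean Proxy" as the L2-normalized mean of a class's embeddings are all assumptions made for illustration.

```python
import numpy as np


def normalized_mean_proxy(class_embeddings):
    """Assumed proxy: L2-normalized mean of one class's embeddings, shape (N, D) -> (D,)."""
    mean = class_embeddings.mean(axis=0)
    return mean / np.linalg.norm(mean)


def embedding_cam(feature_maps, head_weights, proxy):
    """GradCAM-style heatmap for the loss L_c = y . proxy.

    Toy model (an assumption): y = head_weights @ GAP(feature_maps).
    feature_maps: (K, H, W), head_weights: (D, K), proxy: (D,).
    """
    K, H, W = feature_maps.shape
    # For this linear toy head, dL_c/dA^k_ij = (head_weights.T @ proxy)_k / (H * W),
    # which is the same at every spatial position (i, j).
    per_channel = (head_weights.T @ proxy) / (H * W)
    grads = np.broadcast_to(per_channel[:, None, None], (K, H, W))
    # GradCAM-style channel weights: spatial average of the gradients.
    alphas = grads.mean(axis=(1, 2))
    # Weighted sum of the feature maps followed by ReLU.
    return np.maximum(np.tensordot(alphas, feature_maps, axes=1), 0.0)


# Hypothetical inputs: 4 feature maps of size 3x3 and a 5-dimensional embedding.
rng = np.random.default_rng(0)
A = rng.random((4, 3, 3))
head = rng.standard_normal((5, 4))
class_embs = rng.standard_normal((10, 5))

proxy = normalized_mean_proxy(class_embs)
heatmap = embedding_cam(A, head, proxy)  # shape (3, 3), non-negative
```

With a one-hot proxy, the loss in this sketch becomes a single output coordinate, so the heatmap coincides with what GradCAM would compute for that coordinate. A single-point scheme would presumably pass one sample's embedding as `proxy` instead of the class mean, though the exact definition is not given in this summary.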

Abstract

Visualizing CNNs is an important part of building trust and explaining a model's predictions. Methods like CAM and GradCAM have been very successful at localizing the areas of an image responsible for the output, but they are limited to classification models. In this paper, we present a new method, EmbeddingCAM, which generalizes GradCAM to embedding networks. We show that for classification networks, EmbeddingCAM reduces to GradCAM. We demonstrate the effectiveness of our method on the CUB-200-2011 dataset and present both quantitative and qualitative analyses.
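The reduction to GradCAM can be sketched in one line, using the loss $\mathcal{L}_c = y \cdot p_c$ from the TL;DR. The notation here is assumed from the standard GradCAM formulation rather than taken from this page: $A^k_{ij}$ is activation $(i, j)$ of feature map $k$, $Z$ is the number of spatial positions, $\alpha_k^c$ is the channel weight, and $e_c$ is the one-hot vector for class $c$. Setting $p_c = e_c$ gives

$$
\mathcal{L}_c = y \cdot e_c = y_c
\quad\Longrightarrow\quad
\alpha_k^c
= \frac{1}{Z} \sum_{i,j} \frac{\partial \mathcal{L}_c}{\partial A^k_{ij}}
= \frac{1}{Z} \sum_{i,j} \frac{\partial y_c}{\partial A^k_{ij}},
$$

which is the GradCAM weight for class score $y_c$, so the resulting heatmaps should coincide under these assumptions.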

Paper Structure

This paper contains 15 sections, 16 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Sample results generated using the mean proxy method
  • Figure 2: Diagram showing our method for generating a heatmap from embedding models