Generalizing GradCAM for Embedding Networks

Mudit Bachhawat

TL;DR

The paper tackles explainability for embedding networks, which output continuous embeddings rather than discrete class scores and therefore cannot be used directly with GradCAM-style localization. It introduces EmbeddingCAM, a GradCAM-like heatmap mechanism that uses class proxies $p_c$ and a loss $\mathcal{L}_c = y \cdot p_c$ to backpropagate through the embedding $y$. Two proxy schemes are proposed, Normalized Mean Proxy and Single Point Proxy, and EmbeddingCAM reduces to GradCAM when the proxies are one-hot vectors. Evaluations on CUB-200-2011 show competitive mean heatmap ratio and weakly supervised localization accuracy without sampling, with both single-point and averaged proxies producing stable results. Overall, EmbeddingCAM enables single-image explanations for metric-learning models and broadens the applicability of heatmap-based interpretability.
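The mechanism summarized above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes a toy model in which the embedding is a linear map of the globally average-pooled feature maps, so the gradient of $\mathcal{L}_c = y \cdot p_c$ can be written in closed form. The function names `embedding_cam` and `normalized_mean_proxy` are hypothetical, and the proxy definition (L2-normalized mean of a class's embeddings) is an assumption based only on the scheme's name.

```python
import numpy as np

def embedding_cam(features, weight, proxy):
    """GradCAM-style heatmap for a toy embedding head (illustrative only).

    features: (K, H, W) activations of the last conv layer
    weight:   (D, K) linear layer mapping pooled features to the embedding
    proxy:    (D,) class proxy p_c
    Assumed toy model: y = weight @ mean over (i, j) of features
    """
    K, H, W = features.shape
    # L_c = y . p_c, so dL_c/dy = p_c and, for this toy model,
    # dL_c/dA_k[i, j] = (weight^T p_c)_k / (H * W) at every position.
    per_channel = (weight.T @ proxy) / (H * W)
    grad = np.broadcast_to(per_channel[:, None, None], features.shape)
    # GradCAM channel weights: spatial average of the gradient.
    alpha = grad.mean(axis=(1, 2))
    # Weighted sum of feature maps over channels, followed by ReLU.
    return np.maximum(np.tensordot(alpha, features, axes=1), 0.0)

def normalized_mean_proxy(embeddings):
    """Assumed form of the Normalized Mean Proxy: L2-normalized class mean."""
    mean = embeddings.mean(axis=0)
    return mean / np.linalg.norm(mean)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
features = rng.random((4, 3, 3))             # K=4 channels, 3x3 spatial grid
weight = rng.normal(size=(5, 4))             # D=5 embedding dimensions
class_embeddings = rng.normal(size=(10, 5))  # 10 embeddings of one class
proxy = normalized_mean_proxy(class_embeddings)
heatmap = embedding_cam(features, weight, proxy)
```

Under the same assumption, a Single Point Proxy would pass the embedding of one sample of the class as `proxy`. Passing a one-hot vector instead makes `alpha` equal to the corresponding row of `weight` divided by `H * W`, which is the GradCAM weighting for that output.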

Abstract

Visualizing CNNs is an important part of building trust in a model and explaining its predictions. Methods like CAM and GradCAM have been very successful at localizing the area of the image responsible for the output, but they are limited to classification models. In this paper, we present a new method, EmbeddingCAM, which generalizes GradCAM to embedding networks. We show that for classification networks, EmbeddingCAM reduces to GradCAM. We demonstrate the effectiveness of our method on the CUB-200-2011 dataset and present quantitative and qualitative analysis on it.
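The reduction to GradCAM can be sketched in a few lines, starting from the loss $\mathcal{L}_c = y \cdot p_c$. The notation is an assumption borrowed from the standard GradCAM formulation rather than taken from this page: $A^k$ is the $k$-th feature map of the last convolutional layer, $Z$ is the number of spatial positions, and $e_c$ is the one-hot vector for class $c$.

```latex
\begin{align}
\mathcal{L}_c &= y \cdot p_c = y \cdot e_c = y_c, \\
\alpha_k^c &= \frac{1}{Z} \sum_{i,j} \frac{\partial \mathcal{L}_c}{\partial A^k_{ij}}
            = \frac{1}{Z} \sum_{i,j} \frac{\partial y_c}{\partial A^k_{ij}}, \\
H^c &= \mathrm{ReLU}\Big( \sum_k \alpha_k^c A^k \Big).
\end{align}
```

With a one-hot proxy the loss is the class score itself, so the channel weights $\alpha_k^c$ and the heatmap $H^c$ coincide with those of GradCAM.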

Paper Structure (15 sections, 16 equations, 2 figures, 1 table)

Introduction
Related Works
Method
Preliminary
Visual Metric Learning
GradCAM
Generalizing GradCAM for Metric Learning
Calculating Proxies
Intuition
Reduction for Simple Networks
Comparison with other methods
Experiments
Dataset: CUB-200-2011
Results
Conclusion

Figures (2)

Figure 1: Sample results generated using mean proxy method
Figure 2: Diagram showing our method for generating heatmap from embedding models
