Visual Explanation via Similar Feature Activation for Metric Learning

Yi Liao, Ugochukwu Ejike Akpudo, Jue Zhang, Yongsheng Gao, Jun Zhou, Wenyi Zeng, Weichuan Zhang

TL;DR

This work introduces SFAM, a visual explanation method tailored for metric-learning CNNs that lack a traditional FC classifier. SFAM computes a channel-wise Contribution Importance Score (CIS) from a pair of embeddings and forms an explanation map by linearly combining these per-channel scores with the final-layer feature maps; it is compatible with both Euclidean distance and cosine similarity. Qualitative and quantitative experiments on CUB200 for few-shot image classification and image retrieval show improved localization and interpretability, outperforming existing explanation techniques. The approach is architecture- and metric-agnostic and relies on well-trained weights to produce faithful explanations, which underlines its practical value for building trust in metric-learning applications and guiding their development.
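This summary does not give the exact CIS formula. As a minimal sketch, assuming that each embedding is obtained by global average pooling (GAP) of the final-layer feature maps and that a channel's importance is its additive share of the cosine similarity between the two embeddings, per-channel scores could be computed as below. The names `gap_embed` and `cosine_channel_scores` and both modelling choices are hypothetical illustrations, not the authors' definition.

```python
import math
from typing import List

# Feature maps for one image: C channels, each an H x W grid (nested lists).
FeatureMaps = List[List[List[float]]]


def gap_embed(feature_maps: FeatureMaps) -> List[float]:
    """Assumed embedding step: global average pooling of each channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_maps
    ]


def cosine_channel_scores(q: List[float], s: List[float]) -> List[float]:
    """Hypothetical channel-wise score: channel k's additive share of cos(q, s).

    Since cos(q, s) = sum_k q_k * s_k / (||q|| * ||s||), the returned scores
    sum to the cosine similarity of the two embeddings.
    """
    if len(q) != len(s):
        raise ValueError("embeddings must have the same number of channels")
    norm_q = math.sqrt(sum(x * x for x in q))
    norm_s = math.sqrt(sum(x * x for x in s))
    if norm_q == 0.0 or norm_s == 0.0:
        raise ValueError("embeddings must be non-zero vectors")
    return [qk * sk / (norm_q * norm_s) for qk, sk in zip(q, s)]


if __name__ == "__main__":
    # Toy 2-channel, 2x2 feature maps for a query image Q and a support image S.
    fq = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 2.0], [2.0, 0.0]]]
    fs = [[[2.0, 2.0], [2.0, 2.0]], [[0.0, 0.0], [0.0, 0.0]]]
    scores = cosine_channel_scores(gap_embed(fq), gap_embed(fs))
    print(scores, sum(scores))
```

Only the cosine case is sketched; a model compared with Euclidean distance (such as FRN in Figure 3) would need its own per-channel decomposition, which this summary does not specify.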

Abstract

Visual explanation maps enhance the trustworthiness of decisions made by deep learning models and offer valuable guidance for developing new algorithms in image recognition tasks. Class activation maps (CAM) and their variants (e.g., Grad-CAM and Relevance-CAM) have been extensively employed to explore the interpretability of softmax-based convolutional neural networks, which require a fully connected layer as the classifier for decision-making. However, these methods cannot be directly applied to metric learning models, as such models lack a fully connected layer functioning as a classifier. To address this limitation, we propose a novel visual explanation method termed Similar Feature Activation Map (SFAM). This method introduces the channel-wise contribution importance score (CIS) to measure feature importance, derived from the similarity measurement between two image embeddings. The explanation map is constructed by linearly combining the proposed importance weights with the feature map from a CNN model. Quantitative and qualitative experiments show that SFAM provides highly promising interpretable visual explanations for CNN models using Euclidean distance or cosine similarity as the similarity metric.
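The abstract describes the explanation map as a linear combination of per-channel importance weights with the CNN feature maps, in the spirit of CAM. A minimal sketch of that combination step follows, assuming the weights are already available (for example as CIS values). The ReLU and the min-max scaling to [0, 1] are assumptions borrowed from common CAM/Grad-CAM practice, and `explanation_map` is a hypothetical name.

```python
from typing import List

FeatureMaps = List[List[List[float]]]  # C channels, each an H x W grid


def explanation_map(feature_maps: FeatureMaps, weights: List[float]) -> List[List[float]]:
    """CAM-style map: sum_k w_k * A_k, then ReLU, then min-max scaling to [0, 1].

    The ReLU and the scaling are assumed post-processing steps, not taken from the paper.
    """
    if len(feature_maps) != len(weights):
        raise ValueError("need exactly one weight per channel")
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    # Weighted sum over channels at every spatial position, clipped at zero.
    combined = [
        [max(0.0, sum(w * fmap[i][j] for w, fmap in zip(weights, feature_maps)))
         for j in range(width)]
        for i in range(height)
    ]
    low = min(min(row) for row in combined)
    peak = max(max(row) for row in combined)
    if peak == low:
        # A constant map carries no localisation information.
        return [[0.0] * width for _ in range(height)]
    return [[(v - low) / (peak - low) for v in row] for row in combined]
```

In practice the map would also be upsampled to the input resolution before being overlaid on Image Q or Image S; that step is omitted here.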

Paper Structure

This paper contains 14 sections, 7 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: (A). CAM-based methods (e.g., CAM [CAM], Grad-CAM [GradCAM], and Relevance-CAM [RelevanceCAM]) cannot be used to interpret metric-learning CNNs because these networks have no FC layer serving as the classifier. (B). The proposed SFAM can generate an explanation map for a metric learning model; here it highlights the bodies of the two birds as the similar features used for decision-making.
  • Figure 2: The pipeline of the proposed SFAM. Image Q and Image S are fed into a metric learning CNN backbone to extract their feature maps. The feature maps are embedded into feature vectors, which are used to calculate the proposed CIS. The SFAMs for Image Q and Image S are then generated by linearly combining the feature maps with the corresponding CIS.
  • Figure 3: The qualitative comparison between the proposed SFAM and 3 baseline methods for FRN (ResNet12), which uses Euclidean distance as the similarity metric, in the 5-way 1-shot image classification task.
  • Figure 4: The qualitative comparison between the proposed SFAM and 4 baseline methods for GPW (ResNet50), which uses cosine similarity as the similarity metric, in the image retrieval task.
  • Figure 5: Sanity check for the proposed SFAM by cascading randomization from the 1st convolution layer up to the 4th, 7th, 9th, and 12th convolution layers of FRN (ResNet12), respectively. The original explanation map is generated by the proposed SFAM for the well-trained FRN (ResNet12) without any parameter randomization.