
Metric-Guided Synthesis of Class Activation Mapping

Alejandro Luque-Cerpa, Elizabeth Polgreen, Ajitha Rajan, Hazem Torfah

TL;DR

SyCAM presents a metric-guided framework for automatically synthesizing CAM expressions that explain CNN decisions. By framing CAM weight computation as a synthesizable expression within a syntax-guided grammar and using oracle-guided inductive synthesis (OGIS) with equivalence and correctness oracles, it tailors heatmaps to specific evaluation metrics, including ground-truth and robustness measures. The approach is instantiated with a grammar built from gradient-based and Score/Ablation-based CAM terms and is evaluated on ResNet50, VGG16, and VGG19 across multiple datasets, where it outperforms common CAM methods on the selected metrics. This methodology enables domain- and metric-aware saliency maps, potentially improving interpretability of and trust in CNN explanations while accommodating expert knowledge through targeted metrics.
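In the standard CAM formulation, a heatmap is the ReLU of a channel-weighted sum of a convolutional layer's activation maps, so synthesizing a "CAM weight computation" amounts to choosing the expression that produces those channel weights. The sketch below assumes the activations and gradients have already been extracted as NumPy arrays of shape (K, H, W); `cam_heatmap` and `gradcam_weights` are hypothetical helper names, and GradCAM's global-average-pooled gradients stand in as one possible weight expression.

```python
import numpy as np

def cam_heatmap(activations, weights):
    """Combine K activation maps (K, H, W) with K channel weights into an (H, W) heatmap."""
    # Weighted sum over the channel axis, then ReLU to keep only positive evidence.
    heatmap = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Scale to [0, 1] for display; an all-zero map is returned unchanged.
    peak = heatmap.max()
    return heatmap / peak if peak > 0 else heatmap

def gradcam_weights(gradients):
    """GradCAM-style weights: spatial mean of the class-score gradients (K, H, W)."""
    return gradients.mean(axis=(1, 2))

# Random stand-ins for one layer's activations and gradients (4 channels, 7x7).
rng = np.random.default_rng(0)
A = rng.random((4, 7, 7))
G = rng.standard_normal((4, 7, 7))
h = cam_heatmap(A, gradcam_weights(G))
print(h.shape)  # (7, 7)
```

Swapping `gradcam_weights` for another function of the same arrays changes the CAM method while the combination step stays fixed, which is what makes the weight expression a natural synthesis target.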

Abstract

Class activation mapping (CAM) is a widely adopted class of saliency methods used to explain the behavior of convolutional neural networks (CNNs). These methods generate heatmaps that highlight the parts of the input most relevant to the CNN output. Various CAM methods have been proposed, each distinguished by the expressions used to derive heatmaps. In general, users look for heatmaps with specific properties that reflect different aspects of CNN functionality. These may include similarity to ground truth, robustness, equivariance, and more. Although existing CAM methods implicitly encode some of these properties in their expressions, they do not allow for variability in heatmap generation following the user's intent or domain knowledge. In this paper, we address this limitation by introducing SyCAM, a metric-based approach for synthesizing CAM expressions. Given a predefined evaluation metric for saliency maps, SyCAM automatically generates CAM expressions optimized for that metric. We specifically explore a syntax-guided synthesis instantiation of SyCAM, where CAM expressions are derived based on predefined syntactic constraints and the given metric. Using several established evaluation metrics, we demonstrate the efficacy and flexibility of our approach in generating targeted heatmaps. We compare SyCAM with other well-known CAM methods on three prominent models: ResNet50, VGG16, and VGG19.
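To make "optimized for that metric" concrete, the sketch below scores a few candidate weight expressions with a ground-truth metric and keeps the best one. It assumes a hypothetical metric, `energy_in_mask` (the share of heatmap mass that falls inside a ground-truth mask), and a hand-written candidate set. In SyCAM the candidates are generated from a grammar and the search is oracle-guided, which this exhaustive comparison does not attempt to reproduce.

```python
import numpy as np

def energy_in_mask(heatmap, mask):
    """Share of heatmap mass inside a binary ground-truth mask (higher is better)."""
    total = heatmap.sum()
    return float((heatmap * mask).sum() / total) if total > 0 else 0.0

def relu_weighted_sum(weights, activations):
    """ReLU of the channel-weighted sum of (K, H, W) activations."""
    return np.maximum(np.tensordot(weights, activations, axes=1), 0.0)

# Hand-written candidate weight expressions (hypothetical; not SyCAM's grammar).
candidates = {
    "mean_grad": lambda a, g: relu_weighted_sum(g.mean(axis=(1, 2)), a),
    "mean_pos_grad": lambda a, g: relu_weighted_sum(np.maximum(g, 0.0).mean(axis=(1, 2)), a),
}

def select_expression(candidates, samples, metric):
    """Return (name, mean score) of the candidate whose heatmaps score best."""
    best_name, best_score = None, -np.inf
    for name, expr in candidates.items():
        score = np.mean([metric(expr(a, g), m) for a, g, m in samples])
        if score > best_score:
            best_name, best_score = name, score
    return best_name, float(best_score)

# Toy sample: channel 0 fires inside the mask (top row), channel 1 outside it.
acts = np.array([[[1.0, 1.0], [0.0, 0.0]],
                 [[0.0, 0.0], [1.0, 1.0]]])
grads = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[1.0, 1.0], [-3.0, -3.0]]])
mask = np.array([[1, 1], [0, 0]])
print(select_expression(candidates, [(acts, grads, mask)], energy_in_mask))
# ('mean_grad', 1.0)
```

Here `mean_pos_grad` discards the negative gradients of channel 1 and therefore spreads heat outside the mask (score 2/3), so the metric selects `mean_grad`. A different metric could well prefer the other expression, which is the point of making the metric an input.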

Paper Structure

This paper contains 29 sections, 6 equations, 5 figures, 3 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Saliency maps generated using different CAM methods (GradCAM, GradCAM++, ScoreCAM, AblationCAM, and one using an expression synthesized by our SyCAM framework) for three different CNNs trained on three different datasets: ImageNet, COVID-QU-Ex, and ImageNette. The first row of images shows heatmaps for GradCAM and GradCAM++. The second row shows how SyCAM, guided by a ground-truth metric, captures the ground truth more accurately than the other methods. The last row shows how SyCAM, guided by the insertion metric, generates a heatmap that closely mimics that of the dominant CAM method, in this case ScoreCAM.
  • Figure 2: Overview of a CNN-based model that classifies X-ray images into COVID-19 positive or negative, and a CAM-based method that explains each classification.
  • Figure 3: SyCAM applied to a VGG16 model trained on the PASCAL VOC 2007 dataset with the Average Drop % metric (lower is better). If only gradient-related terminals are included in the grammar, SyCAM synthesizes GradCAM++. With an expanded grammar, SyCAM synthesizes better CAM expressions.
  • Figure 4: Saliency maps generated by GradCAM, GradCAM++, ScoreCAM, AblationCAM, and the SyCAM expression for ResNet50, the class "2. English springer", and the Deletion metric ($P=30$, higher is better). The scores for each method are 0.2141, 0.2414, 0.2419, 0.2343, and 0.2441, respectively. SyCAM achieves a better score and produces a saliency map that highlights the right dog less than GradCAM and GradCAM++ do, and highlights the body of the left dog more than ScoreCAM and AblationCAM do.
  • Figure 5: Saliency maps generated by GradCAM, GradCAM++, ScoreCAM, AblationCAM, and the SyCAM expression synthesized for ResNet50.
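Figure 3 relies on the Average Drop % metric. The sketch below follows the definition introduced with GradCAM++: the model sees the image multiplied point-wise by the heatmap, and the metric averages the relative loss in class score, clipping gains to zero. It assumes the class scores have already been computed by the model; `explanation_map` and `average_drop_percent` are hypothetical names, and SyCAM's own implementation may differ in details such as heatmap upsampling.

```python
import numpy as np

def explanation_map(image, heatmap):
    """Point-wise product of an (H, W, C) image with an (H, W) heatmap scaled to [0, 1]."""
    return image * heatmap[..., np.newaxis]

def average_drop_percent(full_scores, masked_scores):
    """Average Drop %: mean relative drop in class score (lower is better).

    full_scores[i]   -- Y_i^c, the model's class-c score on the original image i
    masked_scores[i] -- O_i^c, the class-c score on explanation_map(image_i, heatmap_i)
    """
    y = np.asarray(full_scores, dtype=float)
    o = np.asarray(masked_scores, dtype=float)
    # Increases in confidence are clipped to zero so they do not offset drops.
    return float(100.0 * np.mean(np.maximum(0.0, y - o) / y))

# Toy scores: the first explanation loses half the confidence, the second gains some.
print(average_drop_percent([0.8, 0.5], [0.4, 0.6]))  # 25.0
```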

Theorems & Definitions (3)

  • Definition: Context-Free Grammar
  • Definition: SyGuS problems
  • Definition: Observational Equivalence
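These definitions typically fit together in enumerative SyGuS solvers: a context-free grammar bounds the space of candidate expressions, and observational equivalence prunes candidates that behave identically on the example inputs seen so far. The following is a minimal sketch of bottom-up enumeration with that pruning over a hypothetical toy grammar of weight expressions (not SyCAM's actual grammar, which also has Score/Ablation-based terms); it omits the oracles and the metric. Treating "behaviour" as the rounded channel weights on each example is an assumption of this sketch.

```python
import numpy as np
from itertools import product

# Hypothetical toy grammar for weight expressions (not SyCAM's actual grammar):
#   E ::= grad | act | E + E | E * E | relu(E)
# An expression maps (activations, gradients), both (K, H, W), to a (K, H, W)
# array; its channel weights are taken as the spatial mean.

def enumerate_expressions(examples, max_size=3, decimals=6):
    """Bottom-up enumeration of expressions with up to max_size grammar nodes,
    pruning candidates that are observationally equivalent on the examples."""

    def signature(fn):
        # Observable behaviour: rounded channel weights on every example input.
        return tuple(
            tuple(np.round(fn(a, g).mean(axis=(1, 2)), decimals).tolist())
            for a, g in examples
        )

    by_size, seen = {}, set()

    def add(size, text, fn):
        sig = signature(fn)
        if sig not in seen:  # keep only the first (smallest) representative
            seen.add(sig)
            by_size.setdefault(size, []).append((text, fn))

    add(1, "grad", lambda a, g: g)
    add(1, "act", lambda a, g: a)
    for size in range(2, max_size + 1):
        # Unary production relu(E), where E has size - 1 nodes.
        for text, fn in by_size.get(size - 1, []):
            add(size, f"relu({text})", lambda a, g, fn=fn: np.maximum(fn(a, g), 0.0))
        # Binary productions E + E and E * E, whose children total size - 1 nodes.
        for left in range(1, size - 1):
            pairs = product(by_size.get(left, []), by_size.get(size - 1 - left, []))
            for (lt, lf), (rt, rf) in pairs:
                add(size, f"({lt} + {rt})", lambda a, g, lf=lf, rf=rf: lf(a, g) + rf(a, g))
                add(size, f"({lt} * {rt})", lambda a, g, lf=lf, rf=rf: lf(a, g) * rf(a, g))
    return [item for size in sorted(by_size) for item in by_size[size]]

rng = np.random.default_rng(0)
examples = [(rng.random((4, 5, 5)), rng.standard_normal((4, 5, 5))) for _ in range(3)]
for text, _ in enumerate_expressions(examples, max_size=3):
    print(text)  # relu(act) never appears: activations here are non-negative
```

Because the example activations are non-negative, `relu(act)` matches `act` on every example and is pruned, as is `(act + grad)` once `(grad + act)` is kept. Pruning on finite examples is only as reliable as those examples: two expressions that agree on them may still differ on other inputs.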