Transforming gradient-based techniques into interpretable methods

Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman

TL;DR

This work tackles noisy gradient-based explanations for CNNs by introducing Gradient Artificial Distancing (GAD), which emphasizes image regions that separate target classes without sacrificing model fidelity. GAD creates artificial shifts in final activations between chosen classes and trains support regression models to mimic these changes, then fuses multiple explanations to yield concise, region-focused attributions. Through occlusion-based evaluation and convex-hull analysis across cat-vs-dog and bird datasets, GAD demonstrates reduced explanation complexity and improved sensitivity for several gradient-based methods. The approach offers a practical path toward more interpretable xAI outputs and outlines future work on density-based metrics and semantic clustering for multi-class explanations.
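The distancing step summarized above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' implementation. Following the Figure 1 caption, it assumes a two-class setting where pre-softmax scores are stored with column 0 = cat and column 1 = dog, and that $\alpha_{c}$ is subtracted from the other class's score for samples whose true class is $c$. The helper name `distance_logits` is hypothetical. How the α values are chosen, and how the support regression models are trained on these targets, are not shown.

```python
import numpy as np


def distance_logits(logits, labels, alpha):
    """Build artificially distanced targets (hypothetical helper).

    logits: (n, 2) pre-softmax scores; column 0 = cat, column 1 = dog (assumed layout).
    labels: (n,) true class indices (0 = cat, 1 = dog).
    alpha:  length-2 sequence; alpha[c] is subtracted from the OTHER class's
            score for every sample whose true class is c (assumed reading of Fig. 1).
    """
    shifted = np.array(logits, dtype=float, copy=True)  # leave the input untouched
    labels = np.asarray(labels)
    for c in range(2):
        other = 1 - c
        mask = labels == c
        # Same constant for every sample of class c, so within-class
        # differences in that column are preserved.
        shifted[mask, other] -= alpha[c]
    return shifted


if __name__ == "__main__":
    scores = np.array([[2.0, 1.0], [0.5, 3.0]])
    print(distance_logits(scores, np.array([0, 1]), [1.5, 2.0]))
```

Because every sample of a given class receives the same constant shift, differences between same-class samples are unchanged, which matches the caption's point that the artificial scores preserve within-class relations.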

Abstract

Explaining Convolutional Neural Networks (CNNs) with xAI techniques often produces results that are hard to interpret. Input features, notably the pixels of an image, are complex and exhibit intricate correlations. Gradient-based methods, such as Integrated Gradients (IG), effectively show the importance of these features. However, rendering these explanations as images frequently yields considerable noise. Here, we introduce GAD (Gradient Artificial Distancing), a supporting framework for gradient-based techniques. Its primary objective is to highlight influential regions by establishing distinctions between classes. The essence of GAD is to limit the scope of analysis during visualization and, consequently, reduce image noise. Experiments with occluded images show that the regions identified by this method do play a pivotal role in differentiating classes.
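Since the abstract takes Integrated Gradients as its running example, a small reference sketch may help. IG attributes to each input feature $(x_i - x'_i)\int_0^1 \partial F(x' + a(x - x'))/\partial x_i \, da$, where $x'$ is a baseline; the sketch below approximates the integral with a midpoint Riemann sum. It assumes you can supply a function `grad_fn` that returns $\nabla F$ at a point (in practice this would come from an autodiff framework). The function names and the toy quadratic model are illustrative only.

```python
import numpy as np


def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate IG along the straight path from `baseline` to `x`.

    grad_fn: callable returning the gradient of the model output w.r.t. its input
             (assumed to be available, e.g. from an autodiff library).
    """
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * total / steps


if __name__ == "__main__":
    w = np.array([1.0, -2.0, 0.5])
    grad = lambda z: 2.0 * w * z  # gradient of the toy model F(z) = sum(w * z**2)
    x = np.array([1.0, 2.0, -3.0])
    print(integrated_gradients(grad, x, np.zeros(3)))
```

For this quadratic toy model the gradient is linear along the path, so the midpoint rule is exact and the attributions sum to $F(x) - F(x')$, the completeness property of IG.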

Paper Structure

This paper contains 17 sections, 2 equations, 7 figures, 2 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Based on the non-normalized probabilities (logits, before Softmax) of both classes, we artificially distance samples of different classes. We define values $\alpha_{cat}$ and $\alpha_{dog}$ that are subtracted from these scores: $\alpha_{cat}$ reduces the probability of cat images being classified as dogs (light pink column), and $\alpha_{dog}$ reduces the probability of dog images being classified as cats (light blue column). These new probabilities are artificial; however, they preserve the relations between samples of the same class.
  • Figure 2: Attribution maps obtained with the Integrated Gradients method. Pixel importance ranges from white to black (less to more important) with respect to a chosen class.
  • Figure 3: Example of choosing important features. After training the regression networks, we apply the IG method to each resulting model for both classes. The final attribution map includes only features present in all five attribution maps: the initial one (from the classification model being analyzed) and those of the four support regression models. The resulting filtered attribution maps include the image regions that best separate the classes.
  • Figure 4: Example of a convex hull applied to an attribution map. The convex hull encloses the pixels whose importance exceeds $10\%$ of the highest importance in the GAD-Guided Propagation attribution map.
  • Figure 5: Our final visualizations: GAD improves the interpretability of attribution maps. We present pairs of rows (original attribution maps and GAD maps) for five gradient-based techniques: Saliency, Deconvolution, Gradient x Input, Guided Backpropagation, and Integrated Gradients. VGG results are shown on the right and ResNet results on the left. The classification obtained for each image (cat or dog) is shown at the bottom.
  • ...and 2 more figures
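The fusion and convex-hull steps described in Figures 3 and 4 can be sketched as below. This is an illustrative sketch, not the paper's code, and all function names are hypothetical. Assumptions: attribution maps are compared by absolute value; a pixel counts as "present" in a map when its absolute attribution exceeds a hypothetical `threshold`; and the fused map keeps the classifier's own attribution (`maps[0]`) on pixels present in every map. Following Figure 4, the hull is taken over the coordinates of pixels above 10% of the maximum importance. It uses Andrew's monotone chain so that only NumPy is needed (`scipy.spatial.ConvexHull` would also work), and the area is computed with the shoelace formula.

```python
import numpy as np


def fuse_by_intersection(maps, threshold=0.0):
    """Keep attributions only where every map marks the pixel as present.

    maps: list of 2-D arrays; maps[0] is the analyzed classifier's map and the
    rest come from support regression models. 'Present' means
    |attribution| > threshold (an assumption).
    """
    stack = np.abs(np.stack(maps))               # shape (k, H, W)
    present = np.all(stack > threshold, axis=0)  # True where all k maps agree
    return np.where(present, np.abs(maps[0]), 0.0)


def _cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def hull_area(attr, ratio=0.10):
    """Area (in pixel units) of the hull around pixels above ratio * max importance."""
    attr = np.abs(attr)
    if attr.max() == 0:
        return 0.0
    rows, cols = np.nonzero(attr > ratio * attr.max())
    hull = convex_hull(np.column_stack([rows, cols]))
    if len(hull) < 3:  # a point or a segment has zero area
        return 0.0
    xs = np.array([p[0] for p in hull], dtype=float)
    ys = np.array([p[1] for p in hull], dtype=float)
    # Shoelace formula over the ordered hull vertices.
    return 0.5 * abs(np.dot(xs, np.roll(ys, -1)) - np.dot(ys, np.roll(xs, -1)))
```

Under these assumptions, a smaller hull area means the important pixels sit in a more compact region. That is one way to quantify the reduced explanation complexity that the convex-hull analysis is meant to capture.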