Transforming gradient-based techniques into interpretable methods
Caroline Mazini Rodrigues, Nicolas Boutry, Laurent Najman
TL;DR
This work tackles noisy gradient-based explanations for CNNs by introducing Gradient Artificial Distancing (GAD), which emphasizes image regions that separate target classes without sacrificing model fidelity. GAD creates artificial shifts in final activations between chosen classes and trains support regression models to mimic these changes, then fuses multiple explanations to yield concise, region-focused attributions. Through occlusion-based evaluation and convex-hull analysis across cat-vs-dog and bird datasets, GAD demonstrates reduced explanation complexity and improved sensitivity for several gradient-based methods. The approach offers a practical path toward more interpretable xAI outputs and outlines future work on density-based metrics and semantic clustering for multi-class explanations.
Abstract
Explaining Convolutional Neural Networks (CNNs) with xAI techniques often yields results that are difficult to interpret. The inherent complexity of the input features, namely the pixels of an image, gives rise to complex correlations. Gradient-based methods, such as Integrated Gradients (IG), effectively quantify the importance of these features. However, rendering these explanations as images frequently produces considerable noise. We introduce GAD (Gradient Artificial Distancing) as a support framework for gradient-based techniques. Its primary objective is to emphasize influential regions by establishing distinctions between classes. The core idea of GAD is to limit the scope of the analysis during visualization and, consequently, to reduce image noise. Experiments with occluded images show that the regions identified by this method indeed play a pivotal role in class differentiation.
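For readers unfamiliar with Integrated Gradients, the following is a minimal sketch of the general IG idea on a toy model. It is not the GAD method and not the authors' implementation. The single logistic unit, the weights `w`, the bias `b`, the all-zero baseline, and the function names are all assumptions chosen for illustration. The path integral is approximated with a midpoint Riemann sum.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(x, w, b):
    """Toy differentiable 'classifier' (assumption): a single logistic unit."""
    return sigmoid(np.dot(w, x) + b)

def model_grad(x, w, b):
    """Analytic gradient of the logistic output with respect to the input x."""
    p = model(x, w, b)
    return p * (1.0 - p) * w

def integrated_gradients(x, baseline, w, b, steps=200):
    """Approximate IG along the straight path from baseline to x."""
    # Midpoints of `steps` equal sub-intervals of [0, 1].
    alphas = (np.arange(steps) + 0.5) / steps
    # Gradient of the model at each interpolated input along the path.
    grads = np.array(
        [model_grad(baseline + a * (x - baseline), w, b) for a in alphas]
    )
    # Average gradient along the path, scaled by the input difference.
    return (x - baseline) * grads.mean(axis=0)

# Hypothetical weights and input, chosen only for illustration.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([1.0, -0.5, 2.0])
baseline = np.zeros_like(x)  # all-zero baseline (an assumption)

attributions = integrated_gradients(x, baseline, w, b)
output_change = model(x, w, b) - model(baseline, w, b)

print("attributions:", attributions)
print("sum of attributions:", attributions.sum())
print("f(x) - f(baseline):", output_change)
```

The two final printed values should agree closely, since IG attributions sum to the difference between the model output at the input and at the baseline (the completeness property). For an image model, each attribution would correspond to a pixel, which is where the visual noise discussed above arises.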
