IG2: Integrated Gradient on Iterative Gradient Path for Feature Attribution

Yue Zhuo; Zhiqiang Ge

TL;DR

IG2 addresses the attribution noise and arbitrary baseline choice of IG by integrating two gradients along a novel path (GradPath) toward a counterfactual-based baseline (GradCF). It multiplies the explicand's gradient by counterfactual-gradient information obtained from a representation-distance objective, which mitigates saturation and captures more complete contrasts. The authors prove that IG2 satisfies key path-attribution axioms and report better attributions than existing methods across diverse domains (synthetic data, ImageNet, MNIST, TREC, wafer maps, CelebA), supported by extensive ablations. The resulting explanations are more faithful and human-aligned, the method remains practical despite its higher computational cost, and the code is publicly released.

Abstract

Feature attribution explains Artificial Intelligence (AI) at the instance level by assigning importance scores to input features according to their contributions to the model prediction. Integrated Gradients (IG) is a prominent path attribution method for deep neural networks, involving the integration of gradients along a path from the explained input (explicand) to a counterfactual instance (baseline). Current IG variants primarily focus on the gradient of the explicand's output. However, our research indicates that the gradient of the counterfactual output significantly affects feature attribution as well. To account for this, we propose Iterative Gradient path Integrated Gradients (IG2), which considers both gradients. IG2 incorporates the counterfactual gradient iteratively into the integration path, generating a novel path (GradPath) and a novel baseline (GradCF). These two novel IG components effectively address the issues of attribution noise and arbitrary baseline choice in earlier IG methods. IG2, as a path method, satisfies many desirable axioms, which are theoretically justified in the paper. Experimental results on an XAI benchmark, ImageNet, MNIST, TREC question answering, wafer-map failure patterns, and CelebA face attributes validate that IG2 delivers superior feature attributions compared to state-of-the-art techniques. The code is released at: https://github.com/JoeZhuo-ZY/IG2.

Paper Structure (56 sections, 1 theorem, 20 equations, 22 figures, 12 tables, 1 algorithm)

Introduction
Preliminary of path methods
Methodologies of IG2
Building GradPath
Integrating gradients on GradPath
Interpreting IG2
GradPath: mitigating saturation effects
GradCF
Axioms of IG2
Related works
Feature attribution
Path methods
Shapley values
Adversarially counterfactual explanation
Experiments
...and 41 more sections

Key Result

Proposition 1

(Guided IG) If the values of symmetric variables are equal at every point of the integration path, then their attributions are equal. Therefore, such a path attribution method is symmetry-preserving.
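
The reasoning behind this proposition can be sketched from the generic path-attribution formula. The notation here is assumed for illustration and may differ from the paper's: $f$ is the model, $\gamma$ is the integration path parameterized by $\alpha \in [0,1]$, and $x_i$, $x_j$ are the two symmetric variables.

$$
A_i = \int_0^1 \frac{\partial f(\gamma(\alpha))}{\partial x_i}\,\frac{d\gamma_i(\alpha)}{d\alpha}\,d\alpha ,
\qquad
A_j = \int_0^1 \frac{\partial f(\gamma(\alpha))}{\partial x_j}\,\frac{d\gamma_j(\alpha)}{d\alpha}\,d\alpha .
$$

If $f$ is unchanged by swapping $x_i$ and $x_j$, then $\partial f/\partial x_i = \partial f/\partial x_j$ at every point where $x_i = x_j$. If in addition $\gamma_i(\alpha) = \gamma_j(\alpha)$ for all $\alpha$, the path velocities $d\gamma_i/d\alpha$ and $d\gamma_j/d\alpha$ also coincide, so the two integrands are identical and $A_i = A_j$.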

Figures (22)

Figure 1: Illustration of IG2 with GradPath and GradCF, compared with Vanilla IG (with a zero baseline) and Guided IG (with a counterfactual baseline). The explicand is a sample [Doberman] from ImageNet ILSVRC15 classified by Inception-v3. The logit predictions for [Doberman] are plotted against $\alpha$. Attribution snapshots of each method are shown at $\alpha$ values of 0.02, 0.5 and 1.0. At $\alpha=1.0$, the attributions are decomposed into two multipliers: the explicand's gradient (small images in red boxes) and the integration path direction (in blue boxes). The integration path of IG2 optimally aligns the explicand's gradient with the counterfactual gradient. This alignment results in feature attributions that are less noisy and more complete (on the body of the Doberman). Comprehensive attribution results on ImageNet can be found in Section \ref{sec_image_exp}.
Figure 2: Differences between the explicand and three baselines: the black image, counterfactuals and GradCF. Three samples from the ImageNet, MNIST and wafer map datasets are plotted. For the non-black baselines of the counter class, the ImageNet sample is contrasted with randomly sampled images, the MNIST digit 5 is contrasted with digit 6, and the central-failure wafer map is contrasted with normal wafer maps. The critical features in the explicands are accurately highlighted by contrasting with GradCF. Detailed discussions of GradCF on MNIST examples are in Section \ref{interp_gradcf}.
Figure 3: Illustration for building GradPath at each step. From the explicand, the direction of GradPath is iteratively built on the gradient direction for minimizing the model representation distance to the reference.
Figure 4: (a) Illustration of three counterfactual examples and three integration paths: CF1 sampled from the counterfactual data distribution, CF2 generated by an adversarial attack, and GradCF using CF1 as the reference. The saturated area on the straight-line path is marked in red. An MNIST explicand (digit 5) and three CFs (digit 6) are plotted. (b) Graphs of the explicand's Softmax prediction along the integration paths on ImageNet, MNIST, and TREC, averaged over 100 samples of each dataset.
Figure 5: A toy example of feature attribution with the function $y=\max(x_1,x_2-1)$ on the explicand $(x_1,x_2)=(3,3)$ with the zero reference $(0,0)$. (a) GradPath built on counterfactual gradients, shown as the blue line. (b) The attribution results of the three methods listed in the legend.
...and 17 more figures
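
To make the toy setting of Figure 5 concrete, the following is a minimal sketch of vanilla Integrated Gradients on the straight-line path for $y=\max(x_1,x_2-1)$, with explicand $(3,3)$ and zero baseline. It does not implement GradPath or IG2. The function names (`f`, `grad_f`, `integrated_gradients`), the midpoint Riemann sum, and the step count are assumptions chosen for illustration and are not taken from the authors' code.

```python
def f(x1, x2):
    """Toy model from Figure 5: y = max(x1, x2 - 1)."""
    return max(x1, x2 - 1.0)


def grad_f(x1, x2):
    """Subgradient of the max: 1 for the active argument, 0 for the other."""
    if x1 >= x2 - 1.0:
        return (1.0, 0.0)
    return (0.0, 1.0)


def integrated_gradients(explicand, baseline, steps=100):
    """Midpoint Riemann-sum approximation of IG on the straight-line path."""
    attributions = [0.0 for _ in explicand]
    for k in range(steps):
        alpha = (k + 0.5) / steps
        # Point on the straight line from the baseline to the explicand.
        point = tuple(b + alpha * (x - b) for x, b in zip(explicand, baseline))
        grad = grad_f(*point)
        for i in range(len(explicand)):
            attributions[i] += grad[i] * (explicand[i] - baseline[i]) / steps
    return attributions


explicand = (3.0, 3.0)
baseline = (0.0, 0.0)
attr = integrated_gradients(explicand, baseline)
print("attributions:", attr)
print("f(explicand) - f(baseline):", f(*explicand) - f(*baseline))
```

On this straight-line path $x_1$ is the active argument of the max at every point, so the whole output difference is attributed to $x_1$ and nothing to $x_2$. The attributions sum to $f(3,3)-f(0,0)$, which illustrates the completeness property of path methods.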

Theorems & Definitions (14)

Definition 1
Definition 2
Definition 3
Definition 4
Remark 1
Definition 5
Remark 2
Proof
Definition 6
Remark 3
...and 4 more
