Learning Important Features Through Propagating Activation Differences

Avanti Shrikumar, Peyton Greenside, Anshul Kundaje

TL;DR

DeepLIFT introduces a reference-based attribution framework that assigns feature importance by backpropagating the difference between each neuron's activation and its activation on a chosen reference input. By defining multipliers that obey a chain rule, it computes attributions efficiently in a single backward pass without relying solely on gradients, and it can treat positive and negative contributions separately to reveal dependencies that gradient-based methods miss. The RevealCancel rule further refines attributions by approximating Shapley values and mitigating cancellation artifacts. Empirical results on MNIST and simulated genomic data demonstrate that DeepLIFT, especially with RevealCancel, provides more accurate and robust feature importance than gradient-based approaches, with practical implications for interpretability in vision and genomics tasks.

Abstract

The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Deep Learning Important FeaTures), a method for decomposing the output prediction of a neural network on a specific input by backpropagating the contributions of all neurons in the network to every feature of the input. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. By optionally giving separate consideration to positive and negative contributions, DeepLIFT can also reveal dependencies which are missed by other approaches. Scores can be computed efficiently in a single backward pass. We apply DeepLIFT to models trained on MNIST and simulated genomic data, and show significant advantages over gradient-based methods. Video tutorial: http://goo.gl/qKb7pL, ICML slides: bit.ly/deeplifticmlslides, ICML talk: https://vimeo.com/238275076, code: http://goo.gl/RM8jvH.
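
To make the difference-from-reference idea concrete, the following minimal sketch contrasts it with gradient$\times$input on a single rectified linear unit with a bias of $-10$ (the unit described in the Figure 2 caption). The reference input of $0$ and all function names are assumptions chosen for illustration; this is not the authors' implementation.

```python
def relu_unit(x, bias=-10.0):
    """Single rectified linear unit: y = max(0, x + bias)."""
    return max(0.0, x + bias)


def grad_times_input(x, bias=-10.0):
    """Gradient of the unit w.r.t. x, multiplied by x.

    The gradient jumps from 0 to 1 at x = -bias, so this score
    jumps from 0 to roughly -bias at that point.
    """
    grad = 1.0 if x + bias > 0 else 0.0
    return grad * x


def diff_from_reference(x, x_ref=0.0, bias=-10.0):
    """Contribution of x as the change in output relative to a reference.

    With a single input, the whole output difference is attributed to x.
    x_ref = 0 is an assumed reference, not one prescribed by the source.
    """
    return relu_unit(x, bias) - relu_unit(x_ref, bias)


if __name__ == "__main__":
    for x in (9.0, 10.0, 10.5, 12.0):
        print(x, grad_times_input(x), diff_from_reference(x))
```

Under these assumptions, gradient$\times$input is $0$ up to $x = 10$ and then jumps to about $10$, whereas the difference-from-reference score grows continuously from $0$.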

Paper Structure

This paper contains 28 sections, 10 equations, 6 figures.

Figures (6)

  • Figure 1: Perturbation-based approaches and gradient-based approaches fail to model saturation. Illustrated is a simple network exhibiting saturation in the signal from its inputs. At the point where $i_1 = 1$ and $i_2=1$, perturbing either $i_1$ or $i_2$ to 0 will not produce a change in the output. Note that the gradient of the output w.r.t. the inputs is also zero when $i_1 + i_2 > 1$.
  • Figure 2: Discontinuous gradients can produce misleading importance scores. Response of a single rectified linear unit with a bias of $-10$. Both gradient and gradient$\times$input have a discontinuity at $x=10$; at $x=10+\epsilon$, gradient$\times$input assigns a contribution of $10+\epsilon$ to $x$ and $-10$ to the bias term ($\epsilon$ is a small positive number). When $x < 10$, contributions on $x$ and the bias term are both $0$. By contrast, the difference-from-reference (red arrow, top figure) gives a continuous increase in the contribution score.
  • Figure 3: Network computing $o = \min(i_1, i_2)$. Assume $i_1^0 = i_2^0 = 0$. When $i_1 < i_2$ then $\frac{do}{di_2}=0$, and when $i_2 < i_1$ then $\frac{do}{di_1} = 0$. Using any of the backpropagation approaches described in Section 2.2 would result in importance assigned exclusively to either $i_1$ or $i_2$. With the RevealCancel rule, the network assigns $0.5\min(i_1, i_2)$ importance to both inputs.
  • Figure 4: DeepLIFT with the RevealCancel rule better identifies pixels to convert one digit to another. Top: result of masking pixels ranked as most important for the original class (8) relative to the target class (3 or 6). Importance scores for class 8, 3 and 6 are also shown. The selected image had the highest change in log-odds scores for the 8$\rightarrow$6 conversion using gradient$\times$input or integrated gradients to rank pixels. Bottom: boxplots of increase in log-odds scores of target vs. original class after the mask is applied, for 1K images belonging to the original class in the testing set. "Integrated gradients-n" refers to numerically integrating the gradients over $n$ evenly-spaced intervals using the midpoint rule.
  • Figure 5: DeepLIFT with RevealCancel gives qualitatively desirable behavior on TAL-GATA simulation. (a) Scatter plots of importance score vs. strength of TAL1 motif match for different tasks and methods (see Appendix G for GATA1). For each region, top 5 motif matches are plotted. X-axes: log-odds of TAL1 motif match vs. background. Y-axes: total importance assigned to the match for specified task. Red dots are from regions where both TAL1 and GATA1 motifs were inserted during simulation; blue have GATA1 only, green have TAL1 only, black have no motifs inserted. "DeepLIFT-fc-RC-conv-RS" refers to using RevealCancel on the fully-connected layer and Rescale on the convolutional layers, which appears to reduce noise relative to using RevealCancel on all layers. (b) Proportion of strong matches (log-odds $>$ 7) to TAL1 motif in regions containing both TAL1 and GATA1 that had total score $\le$ 0 for task 0; Guided Backprop$\times$inp and DeepLIFT with RevealCancel have no false negatives, but Guided Backprop has false positives for Task 1 (panel (a)).
  • ...and 1 more figure
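
The claim in the Figure 3 caption can be checked numerically. The sketch below is a hedged illustration rather than the authors' code. It assumes the network is wired as $h_1 = i_1 - i_2$, $h_2 = \max(0, h_1)$, $o = i_1 - h_2$ (one way to compute $\min(i_1, i_2)$), that all references are $0$, and that RevealCancel averages the effect of the positive (negative) part of $\Delta h_1$ with and without the other part already present. The function names are hypothetical.

```python
def relu(x):
    return max(0.0, x)


def min_net(i1, i2):
    # Assumed wiring: o = i1 - relu(i1 - i2), which equals min(i1, i2).
    return i1 - relu(i1 - i2)


def rescale_scores(i1, i2):
    """Contributions of (i1, i2) to o with a Rescale-style ReLU multiplier."""
    dh1 = i1 - i2  # references are 0, so deltas equal the values
    # Multiplier is delta-out / delta-in; it is 0 for dh1 < 0 and is
    # assumed to be 0 at dh1 == 0, where the ratio is undefined.
    m = relu(dh1) / dh1 if dh1 > 0 else 0.0
    return i1 - m * i1, m * i2


def revealcancel_scores(i1, i2):
    """Contributions of (i1, i2) to o with a RevealCancel-style ReLU rule."""
    terms = (i1, -i2)  # weighted input terms feeding h1
    d_pos = sum(t for t in terms if t > 0)
    d_neg = sum(t for t in terms if t < 0)
    # Average each part's effect over the two orders in which the
    # positive and negative parts can be added (reference h1 is 0).
    dy_pos = 0.5 * (relu(d_pos) - relu(0.0)) \
        + 0.5 * (relu(d_neg + d_pos) - relu(d_neg))
    dy_neg = 0.5 * (relu(d_neg) - relu(0.0)) \
        + 0.5 * (relu(d_pos + d_neg) - relu(d_pos))
    m_pos = dy_pos / d_pos if d_pos != 0 else 0.0
    m_neg = dy_neg / d_neg if d_neg != 0 else 0.0
    # Contribution of each term to h2, then through o = i1 - h2.
    c_h2 = [t * (m_pos if t > 0 else m_neg) for t in terms]
    return i1 - c_h2[0], -c_h2[1]


if __name__ == "__main__":
    for i1, i2 in ((3.0, 1.0), (1.0, 3.0)):
        print((i1, i2), min_net(i1, i2),
              rescale_scores(i1, i2), revealcancel_scores(i1, i2))
```

Under these assumptions, and for positive inputs, the Rescale-style scores place all importance on one input, while the RevealCancel-style scores give each input $0.5\min(i_1, i_2)$. In both cases the scores sum to the output.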