Not Just a Black Box: Learning Important Features Through Propagating Activation Differences

Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje

TL;DR

DeepLIFT is a reference-based attribution method: it computes feature contributions from the difference between each neuron's activation and its reference activation, which avoids the zero-gradient problem that affects ReLUs and saturating activations. The paper formalizes backpropagation-like rules built on summation-to-delta and linear-composition principles, and defines multipliers for affine, max, and maxout units, along with techniques for Softmax outputs and constrained inputs. The method is demonstrated on Tiny ImageNet and on a genomics CNN, where it recovers a sequence motif that gradient-based approaches miss, and it coincides with Layer-wise Relevance Propagation under certain conditions.

Abstract

Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. The original abstract follows.

The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.

Paper Structure

This paper contains 19 sections, 27 equations, 3 figures.

Figures (3)

  • Figure 1: Simple network with inputs $x_1$ and $x_2$, each with a reference value of 0. When $x_1 = x_2 = -1$, the output is 0.1, but the gradients with respect to $x_1$ and $x_2$ are 0 because the ReLU $y$ is inactive (its activation under the reference input is $2$). By comparing activations to their reference values, DeepLIFT assigns contributions to the output of $\left((0.1-0.5)\frac{1}{3}\right)$ to $x_1$ and $\left((0.1-0.5)\frac{2}{3}\right)$ to $x_2$.
  • Figure 2: Comparison of methods. Importance scores for RGB channels were summed to get per-pixel importance. Left-to-right: original image, absolute value of the gradient (similar to Simonyan et al. which used the two-norm across RGB rather than the sum, and which is related to both Zeiler et al. and Springenberg et al.), positive gradient*input (Taylor approximation, equivalent to Layer-wise Relevance Propagation in Samek et al. but masking negative contributions), and positive DeepLIFT.
  • Figure 3: DeepLIFT scores (top) and gradient*input (bottom) are plotted for each position in the DNA sequence and colored by the DNA base (due to one-hot encoding, input is either 1 or 0; gradient*input is equivalent to taking the gradient for the letter that is actually present). DeepLIFT discovers both patterns and assigns them large importance scores. Gradient-based methods miss the GATA pattern.
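
The arithmetic in the Figure 1 caption can be checked with a small sketch. The caption does not give the network's parameters, so the weights (1 and 2), the bias (2), and the linear output layer `0.2 * y + 0.1` below are assumptions chosen only to reproduce the stated values (ReLU activation of 2 and output 0.5 at the reference, output 0.1 at $x_1 = x_2 = -1$). The function names are hypothetical, and the proportional split is a simplified illustration of difference-from-reference attribution, not the authors' implementation.

```python
# Hypothetical reconstruction of the toy network in Figure 1.
# ASSUMED (not stated in the caption): weights 1 and 2, bias 2, and an
# output layer 0.2 * y + 0.1, picked so the caption's numbers come out.

WEIGHTS = (1.0, 2.0)
BIAS = 2.0


def pre_activation(x):
    """Weighted sum feeding the ReLU."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def output(x):
    """Network output: linear readout of the ReLU activation y."""
    y = max(0.0, pre_activation(x))
    return 0.2 * y + 0.1


def gradient(x):
    """Exact gradient of the output w.r.t. each input."""
    active = 1.0 if pre_activation(x) > 0 else 0.0
    return tuple(0.2 * active * w for w in WEIGHTS)


def contributions(x, ref):
    """Split the output difference from the reference across the inputs,
    in proportion to each input's weighted difference w_i * (x_i - ref_i)."""
    delta_out = output(x) - output(ref)
    weighted = [w * (xi - ri) for w, xi, ri in zip(WEIGHTS, x, ref)]
    total = sum(weighted)
    if total == 0:
        return tuple(0.0 for _ in weighted)
    return tuple(delta_out * wd / total for wd in weighted)


x, ref = (-1.0, -1.0), (0.0, 0.0)
print("output at reference:", output(ref))
print("output at x:", output(x))
print("gradient at x:", gradient(x))
print("contributions:", contributions(x, ref))
```

Under these assumptions the gradient is zero for both inputs because the ReLU is inactive at $x$, while the two contributions are $(0.1-0.5)\frac{1}{3}$ and $(0.1-0.5)\frac{2}{3}$ and sum to the full output difference of $-0.4$.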