Not Just a Black Box: Learning Important Features Through Propagating Activation Differences
Avanti Shrikumar, Peyton Greenside, Anna Shcherbina, Anshul Kundaje
TL;DR
DeepLIFT introduces a reference-based attribution method that computes feature contributions from the difference between each neuron's activation and its activation on a reference input. This addresses the zero-gradient problem that arises with ReLUs and saturating activations. It formalizes backpropagation-like rules with summation-to-delta and linear-composition principles and defines multipliers for diverse operations, including affine, max, and maxout units, plus techniques for Softmax and constrained inputs. The method is demonstrated on Tiny ImageNet and a genomics CNN, where it yields better saliency maps and motif attributions than gradient-based approaches, and it aligns with Layer-wise Relevance Propagation under certain conditions. Overall, DeepLIFT provides an efficient framework for interpretable feature attribution in neural networks across vision and sequence domains.
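To make the multiplier and summation-to-delta ideas concrete, here is a minimal sketch on a tiny dense → ReLU → dense network. It is not the authors' code: the weights are made up for illustration, the reference is assumed to be the zero vector, and the ReLU multiplier is assumed to be the ratio Δh/Δz (falling back to the gradient when Δz is about zero). The multipliers are chained through the layers, and each input's contribution is its multiplier times its difference from the reference.

```python
# Sketch of DeepLIFT-style multipliers (assumptions: rescale rule Δh/Δz for
# ReLU, zero reference, hypothetical toy weights). Not the paper's code.

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0

def forward(x, W1, b1, w2, b2):
    """Return hidden pre-activations, hidden activations and the scalar output."""
    z = [sum(W1[j][i] * x[i] for i in range(len(x))) + b1[j] for j in range(len(W1))]
    h = [relu(zj) for zj in z]
    y = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return z, h, y

def deeplift_contributions(x, x_ref, W1, b1, w2, b2, eps=1e-7):
    """Contribution of each input to y(x) - y(x_ref), via chained multipliers."""
    z, _, _ = forward(x, W1, b1, w2, b2)
    z0, _, _ = forward(x_ref, W1, b1, w2, b2)
    # Multiplier of each ReLU: Δh/Δz, or the local gradient if Δz is ~0.
    m_relu = []
    for zj, z0j in zip(z, z0):
        dz = zj - z0j
        if abs(dz) < eps:
            m_relu.append(relu_grad(zj))
        else:
            m_relu.append((relu(zj) - relu(z0j)) / dz)
    # Chain the multipliers: m(x_i -> y) = sum_j w2[j] * m_relu[j] * W1[j][i].
    contribs = []
    for i in range(len(x)):
        m_i = sum(w2[j] * m_relu[j] * W1[j][i] for j in range(len(W1)))
        contribs.append(m_i * (x[i] - x_ref[i]))
    return contribs

# Hypothetical example values.
W1 = [[1.0, -2.0], [0.5, 1.0]]
b1 = [0.0, -1.0]
w2 = [1.5, -1.0]
b2 = 0.2
x, x_ref = [2.0, 0.5], [0.0, 0.0]

c = deeplift_contributions(x, x_ref, W1, b1, w2, b2)
_, _, y = forward(x, W1, b1, w2, b2)
_, _, y0 = forward(x_ref, W1, b1, w2, b2)
print(c, sum(c), y - y0)  # contributions sum to y - y0 (1.0 up to rounding)
```

Because each ReLU multiplier satisfies Δh = m·Δz exactly, the contributions add up to the output's difference from the reference; this is the summation-to-delta property in miniature.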
Abstract
Note: This paper describes an older version of DeepLIFT. See https://arxiv.org/abs/1704.02685 for the newer version. Original abstract follows:

The purported "black box" nature of neural networks is a barrier to adoption in applications where interpretability is essential. Here we present DeepLIFT (Learning Important FeaTures), an efficient and effective method for computing importance scores in a neural network. DeepLIFT compares the activation of each neuron to its 'reference activation' and assigns contribution scores according to the difference. We apply DeepLIFT to models trained on natural images and genomic data, and show significant advantages over gradient-based methods.
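One reason a difference-from-reference score can beat gradient-based scores is saturation. The toy function below, y = 1 - relu(1 - x), is our own illustrative choice rather than an example from the paper, and the reference x = 0 is an assumption. Once x ≥ 1 the output is flat, so the gradient (and gradient × input) is zero even though the output has moved a full unit away from its reference value. With a single input, summation-to-delta forces the DeepLIFT-style contribution to equal that whole difference.

```python
# Illustrative sketch: toy saturating function and zero reference are
# assumptions chosen for demonstration, not taken from the paper.

def relu(z):
    return max(0.0, z)

def f(x):
    # Saturates at f(x) = 1 for x >= 1.
    return 1.0 - relu(1.0 - x)

def grad_f(x):
    # d/dx [1 - relu(1 - x)] = relu'(1 - x), which is 0 once 1 - x <= 0.
    return 1.0 if 1.0 - x > 0 else 0.0

def gradient_times_input(x):
    return grad_f(x) * x

def deeplift_single_input(x, x_ref=0.0):
    # With one input, its contribution must equal the full output difference.
    return f(x) - f(x_ref)

for x in (0.5, 1.0, 2.0):
    # columns: input, gradient * input, difference-from-reference score
    print(x, gradient_times_input(x), deeplift_single_input(x))
```

At x = 2.0 the gradient-based score is 0.0 while the reference-based score is 1.0, which is the zero-gradient failure the method is designed to avoid.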
