When Explanations Lie: Why Many Modified BP Attributions Fail

Leon Sixt, Maximilian Granz, Tim Landgraf

TL;DR

The paper provides a framework to assess the faithfulness of new and existing modified BP methods theoretically and empirically, and uses a new metric, cosine similarity convergence (CSC), to measure how much information from later layers is ignored.

Abstract

Attribution methods aim to explain a neural network's prediction by highlighting the most relevant image areas. A popular approach is to backpropagate (BP) a custom relevance score using modified rules, rather than the gradient. We analyze an extensive set of modified BP methods: Deep Taylor Decomposition, Layer-wise Relevance Propagation (LRP), Excitation BP, PatternAttribution, DeepLIFT, Deconv, RectGrad, and Guided BP. We find empirically that the explanations of all mentioned methods, except for DeepLIFT, are independent of the parameters of later layers. We provide theoretical insights for this surprising behavior and also analyze why DeepLIFT does not suffer from this limitation. Empirically, we measure how much information from later layers is ignored using our new metric, cosine similarity convergence (CSC). The paper provides a framework to assess the faithfulness of new and existing modified BP methods theoretically and empirically. For code see: https://github.com/berleon/when-explanations-lie
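To make "backpropagating a custom relevance score using modified rules" concrete, here is a minimal sketch of one such rule, the LRP$_{\alpha1\beta0}$ (z+) rule for dense layers, applied to a toy bias-free ReLU network with random weights. The architecture, the random input, and the helper names `lrp_a1b0_dense`, `forward`, and `explain` are hypothetical illustrations, not the authors' code (the linked repository holds that); convolutions, biases, and the other listed methods are omitted.

```python
import numpy as np

def lrp_a1b0_dense(a, W, R_out, eps=1e-9):
    """LRP alpha=1, beta=0 rule for a dense layer z = a @ W.

    a: (n_in,) non-negative layer input (pixels or ReLU outputs)
    W: (n_in, n_out) weights; R_out: (n_out,) relevance of the layer outputs.
    Only positive contributions z_ij^+ = max(0, a_i * w_ij) are redistributed.
    """
    z_pos = np.maximum(a[:, None] * W, 0.0)   # (n_in, n_out)
    denom = z_pos.sum(axis=0) + eps          # positive input mass of each output unit
    return z_pos @ (R_out / denom)           # R_i = sum_j z_ij^+ / sum_i' z_i'j^+ * R_j

def forward(x, weights):
    """Bias-free ReLU MLP; returns the input of every layer and the logits."""
    inputs, h = [], x
    for l, W in enumerate(weights):
        inputs.append(h)
        h = h @ W
        if l < len(weights) - 1:   # no ReLU on the logits
            h = np.maximum(h, 0.0)
    return inputs, h

def explain(x, weights, cls):
    """Propagate unit relevance from logit `cls` back to the input with the z+ rule."""
    inputs, logits = forward(x, weights)
    R = np.zeros_like(logits)
    R[cls] = 1.0
    for a, W in zip(reversed(inputs), reversed(weights)):
        R = lrp_a1b0_dense(a, W, R)
    return R

rng = np.random.default_rng(0)
sizes = [64, 32, 32, 32, 10]  # hypothetical toy architecture
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)) for m, k in zip(sizes[:-1], sizes[1:])]
x = rng.uniform(0.0, 1.0, size=sizes[0])  # stand-in for a flattened image

R_a, R_b = explain(x, weights, 3), explain(x, weights, 7)
cos = R_a @ R_b / (np.linalg.norm(R_a) * np.linalg.norm(R_b))
print(f"relevance sum: {R_a.sum():.6f}, cosine between class 3 and 7 explanations: {cos:.4f}")
```

In this bias-free setup the relevance sum stays at 1, because every ReLU unit that receives relevance has a positive pre-activation and therefore a positive denominator. The printed cosine only illustrates how one would compare explanations for two logits; it is not meant to reproduce the class insensitivity reported for VGG-16 in Figure 1.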

Paper Structure

This paper contains 46 sections, 1 theorem, 38 equations, 10 figures.

Key Result

Theorem 1

Let $A_1, A_2, A_3, \dots$ be a sequence of non-negative matrices. We require that every column vector ${\bm{a}}$ of $A_n$ has a norm $\| {\bm{a}} \| \ge \epsilon_0$ and that there exist infinitely many matrices $A_i$, with $i \in I$ and $|I| = |\mathbb{N}|$, for which two column vectors have a dot product of …
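The statement of Theorem 1 is cut off above, so the following does not reproduce its conclusion. It is only a numerical sketch of the cone-shrinking behaviour described for Figure 2, tracked with the $\sigma_1 / \sigma_2$ ratio used in Figure 3. The random factors (strictly positive, entries uniform in $[0.1, 1]$), the size $n = 10$, the depth of 30, and the helper names `min_column_cosine` and `sv_ratio` are all assumptions chosen for illustration.

```python
import numpy as np

def min_column_cosine(P):
    """Smallest pairwise cosine similarity between the columns of P."""
    U = P / np.linalg.norm(P, axis=0, keepdims=True)  # unit-norm columns
    return float((U.T @ U).min())

def sv_ratio(P):
    """Ratio sigma_1 / sigma_2 of the two largest singular values."""
    s = np.linalg.svd(P, compute_uv=False)  # returned in descending order
    return float(s[0] / s[1])

rng = np.random.default_rng(0)
n, depth = 10, 30
P = np.eye(n)
ratios, min_cos = [], []
for _ in range(depth):
    # Assumption: strictly positive factors with entries in [0.1, 1]
    A = rng.uniform(0.1, 1.0, size=(n, n))
    # Columns of P @ A are non-negative combinations of the columns of P,
    # so they stay inside the cone spanned by P's columns (cf. Figure 2)
    P = P @ A
    P /= np.linalg.norm(P)  # rescaling changes neither directions nor sigma_1/sigma_2
    ratios.append(sv_ratio(P))
    min_cos.append(min_column_cosine(P))

print(f"sigma1/sigma2: {ratios[0]:.2f} after 1 factor, {ratios[-1]:.3g} after {depth}")
print(f"min column cosine: {min_cos[0]:.3f} after 1 factor, {min_cos[-1]:.6f} after {depth}")
```

As the product grows, all columns align with one direction, so the smallest column cosine approaches 1 and $\sigma_1 / \sigma_2$ grows by many orders of magnitude; the rescaling step only keeps the numbers in a comfortable floating-point range.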

Figures (10)

  • Figure 1: (a) Sanity Checks: Saliency maps should change if network parameters are randomized. Parameters are randomized from the last to the first layer. Red denotes positive and blue negative relevance. (b-e) Class insensitivity of LRP$_{\alpha1\beta0}$ on VGG-16. Explanations for (c) Persian cat (283) and (d) King Charles Spaniel (156). (e) Difference (c) - (d), both normalized to $[0, 1]$. L1-norm of (e) = 0.000371.
  • Figure 2: The positive column vectors ${\bm{a}}_1, {\bm{a}}_2$ of matrix $A_1$ (orange) form a cone. The resulting columns of $A_1 A_2$ (green) are contained in the cone as they are positive linear combinations of ${\bm{a}}_1, {\bm{a}}_2$. At each iteration, the cone shrinks.
  • Figure 3: PatternNet & PatternAttr.: (a)(b) Ratio between the first and second singular value $\sigma_1 / \sigma_2$ for $A_l$, $W_l$, and $A_l \odot W_l$. (c) $\sigma_1 / \sigma_2$ of the inter-layer derivative matrices. For (b) and (c), we sliced the 3×3 convolutional kernels to 1×1 kernels.
  • Figure 4: (a) SSIM between saliency maps explaining the ground-truth or a random logit. (b) The parameters of the VGG-16 are randomized, starting from the last to the first layer. SSIM quantifies the difference to the saliency map from the original model. Intervals are 99% bootstrap confidence intervals.
  • Figure 5: (a)-(d) Median cosine similarity convergence (CSC) per layer between relevance vectors obtained by randomizing the final-layer relevance vectors. (e)-(g) Histograms of the CSC after the first layer.
  • ...and 5 more figures
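One reading of the Figure 5 caption is that CSC follows, layer by layer, the cosine similarity between two relevance vectors that start from different random final-layer relevance scores; a similarity near 1 at the input means the explanation barely depends on what was fed in at the top. The sketch below computes that per-layer cosine for the LRP$_{\alpha1\beta0}$ rule on a toy bias-free ReLU network. The network, the uniform random relevance vectors, and the names `z_plus`, `cosine`, and `cosine_per_layer` are assumptions for illustration; the paper's exact CSC definition and the median/histogram aggregation in Figure 5 are not reproduced here.

```python
import numpy as np

def z_plus(a, W, R_out, eps=1e-9):
    """LRP alpha=1, beta=0 rule for a dense layer z = a @ W with a >= 0."""
    z_pos = np.maximum(a[:, None] * W, 0.0)             # positive contributions z_ij^+
    return z_pos @ (R_out / (z_pos.sum(axis=0) + eps))  # R_i = sum_j z_ij^+ R_j / sum_i' z_i'j^+

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_per_layer(x, weights, R1, R2):
    """Cosine similarity of two relevance vectors after each backward step.

    R1, R2 are relevance vectors assigned to the final layer (the logits).
    The first entry of the result is the final layer, the last is the input.
    """
    inputs, h = [], x
    for l, W in enumerate(weights):      # forward pass of a bias-free ReLU MLP
        inputs.append(h)                 # keep the input of every layer
        h = h @ W
        if l < len(weights) - 1:         # no ReLU on the logits
            h = np.maximum(h, 0.0)
    sims = [cosine(R1, R2)]
    for a, W in zip(reversed(inputs), reversed(weights)):
        R1, R2 = z_plus(a, W, R1), z_plus(a, W, R2)
        sims.append(cosine(R1, R2))
    return sims

rng = np.random.default_rng(1)
sizes = [64] + [32] * 10 + [10]  # hypothetical deep, narrow toy network
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, k)) for m, k in zip(sizes[:-1], sizes[1:])]
x = rng.uniform(0.0, 1.0, size=sizes[0])
R1, R2 = rng.uniform(size=10), rng.uniform(size=10)  # randomized final-layer relevance
sims = cosine_per_layer(x, weights, R1, R2)
print([round(s, 4) for s in sims])
```

Both relevance vectors pass through the same non-negative redistribution matrices, which is why their similarity tends to rise toward the input, in line with the cone argument of Figure 2.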

Theorems & Definitions (1)

  • Theorem 1