Methods for Interpreting and Understanding Deep Neural Networks

Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller

TL;DR

This tutorial surveys post-hoc interpretability for deep neural networks. It covers prototype-based interpretation via Activation Maximization (AM) and the explanation of individual decisions through sensitivity analysis, Taylor decomposition, and especially Layer-wise Relevance Propagation (LRP). It unifies LRP with Deep Taylor Decomposition, offers practical rules and tricks (including the handling of pooling and normalization layers), and provides methods to assess explanation quality and to apply explanations to model validation and scientific data analysis. The work emphasizes locality, robustness, and visualization strategies that yield meaningful, continuous, and selective explanations, supported by techniques such as the translation trick and by pixel-flipping, which quantifies how useful an explanation is. Overall, it presents a practical toolkit for making complex DNNs transparent and verifiable across diverse domains without requiring changes to the underlying models.
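To make the relevance-propagation idea behind LRP concrete, here is a minimal sketch of one propagation rule, the z+ rule (also called LRP-alpha1-beta0), applied to a tiny two-layer ReLU network in pure Python. Everything specific here is an illustrative assumption, not taken from the paper: the hand-picked weights `W1` and `W2`, the absence of biases, the non-negative input, and the helper names `forward`, `lrp_zplus`, and `explain`. Applying the z+ rule at the input layer is also a simplification; the tutorial discusses other rules for the pixel layer.

```python
# Minimal LRP sketch (z+ / LRP-alpha1-beta0 rule) on a tiny two-layer ReLU network.
# Assumptions for illustration only: hand-picked weights, no biases, non-negative input.

def relu(v):
    return [max(0.0, x) for x in v]

def dense(a, W):
    # Linear layer without bias: z_k = sum_j a_j * W[j][k]
    return [sum(a[j] * W[j][k] for j in range(len(a))) for k in range(len(W[0]))]

def forward(x, layers):
    # Activations of every layer, input first; ReLU on hidden layers, linear output.
    acts = [list(x)]
    for i, W in enumerate(layers):
        z = dense(acts[-1], W)
        acts.append(relu(z) if i < len(layers) - 1 else z)
    return acts

def lrp_zplus(a, W, R_out, eps=1e-12):
    # R_j = sum_k (a_j * w_jk^+ / sum_j' a_j' * w_j'k^+) * R_k
    R_in = [0.0] * len(a)
    for k, Rk in enumerate(R_out):
        contrib = [a[j] * max(0.0, W[j][k]) for j in range(len(a))]
        denom = sum(contrib)
        if denom <= eps:
            continue  # neuron k has no positive input; its relevance is not redistributed
        for j in range(len(a)):
            R_in[j] += contrib[j] / denom * Rk
    return R_in

def explain(x, layers, target):
    # Start from the target output score and propagate relevance back layer by layer.
    acts = forward(x, layers)
    out = acts[-1]
    R = [0.0] * len(out)
    R[target] = out[target]
    for W, a in zip(reversed(layers), reversed(acts[:-1])):
        R = lrp_zplus(a, W, R)
    return out, R

W1 = [[1.0, -0.5, 0.3], [0.2, 0.8, -1.0]]   # hypothetical: 2 inputs -> 3 hidden units
W2 = [[0.5, -0.2], [1.0, 0.4], [-0.3, 0.7]]  # hypothetical: 3 hidden units -> 2 outputs
out, R = explain([1.0, 2.0], [W1, W2], target=0)
print(out[0], R, sum(R))  # sum(R) matches out[0] up to floating-point rounding
```

The final check illustrates the conservation property that motivates LRP: as long as no neuron holding relevance has a zero denominator, the input relevances sum to the output score being explained.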

Abstract

This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces several recently proposed interpretation techniques, along with theory, tricks, and recommendations, to make the most efficient use of these techniques on real data. It also discusses a number of practical applications.

Paper Structure

This paper contains 30 sections, 30 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Cartoon illustrating how the expert $p(\boldsymbol{x})$ affects the prototype $\boldsymbol{x}^\star$ found by AM. The horizontal axis represents the input space, and the vertical axis represents the probability.
  • Figure 2: Architectures supporting AM procedures, and the prototypes found. Black arrows indicate the forward path and red arrows indicate the reverse path for gradient computation.
  • Figure 3: Explanation of the DNN prediction "boat" for an image $\boldsymbol{x}$ given as input.
  • Figure 4: Diagram of the relevance neuron and its analysis. The root search domain is shown with a dashed line, and the relevance propagation resulting from decomposing $R_k$ is shown in red.
  • Figure 5: Highlighting, in a large image, the pixels that are relevant to the CIFAR-10 class "horse", using the sliding-window technique.
  • ...and 5 more figures
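Figure 1 illustrates how an expert density p(x) changes the prototype found by Activation Maximization. The sketch below shows that effect by running gradient ascent on log p(c|x) + log p(x). It assumes a hypothetical logistic classifier p(c|x) = sigmoid(w·x + b) and a hypothetical isotropic Gaussian expert p(x) = N(mu, sigma2·I), chosen only because their gradients have closed forms. The function name `activation_maximization` and all parameter values are illustrative.

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def activation_maximization(w, b, mu, sigma2, lr=0.1, steps=2000):
    # Gradient ascent on log p(c|x) + log p(x), starting from the expert's mean.
    # Hypothetical model: p(c|x) = sigmoid(w.x + b); expert p(x) = N(mu, sigma2 * I).
    x = list(mu)
    for _ in range(steps):
        s = sum(wi * xi for wi, xi in zip(w, x)) + b
        g_cls = [(1.0 - sigmoid(s)) * wi for wi in w]           # grad of log sigmoid(s)
        g_exp = [-(xi - mi) / sigma2 for xi, mi in zip(x, mu)]  # grad of log Gaussian density
        x = [xi + lr * (gc + ge) for xi, gc, ge in zip(x, g_cls, g_exp)]
    return x

w, b = [2.0, -1.0], 0.0
x_star = activation_maximization(w, b, mu=[0.0, 0.0], sigma2=1.0)
print(x_star, sigmoid(sum(wi * xi for wi, xi in zip(w, x_star)) + b))
```

Without the expert term, log sigmoid(w·x + b) increases monotonically along w, so the ascent would push x arbitrarily far from the data. The Gaussian term pulls the prototype back toward mu, producing a finite, more plausible input. With mu = 0, this expert is equivalent to an l2 penalty lambda·||x||^2 with lambda = 1/(2·sigma2).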

Theorems & Definitions (2)

  • Definition 1
  • Definition 2