Methods for Interpreting and Understanding Deep Neural Networks
Grégoire Montavon, Wojciech Samek, Klaus-Robert Müller
TL;DR
This tutorial surveys post-hoc interpretability for deep neural networks, detailing prototype-based interpretations via Activation Maximization and decision explanations through sensitivity, Taylor decompositions, and especially Layer-wise Relevance Propagation (LRP). It unifies LRP with Deep Taylor Decomposition, offers practical rules and tricks (including handling pooling and normalization), and provides methods to assess explanation quality and apply explanations to model validation and scientific data analysis. The work emphasizes locality, robustness, and visualization strategies to yield meaningful, continuous, and selective explanations, with techniques like the translation trick and pixel-flipping to quantify usefulness. Overall, it presents a practical toolkit for making complex DNNs transparent and verifiable across diverse domains, without requiring changes to the underlying models.
Abstract
This paper provides an entry point to the problem of interpreting a deep neural network model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. It introduces some recently proposed techniques of interpretation, along with theory, tricks and recommendations, to make the most efficient use of these techniques on real data. It also discusses a number of practical applications.
