
Layer-wise Relevance Propagation for Neural Networks with Local Renormalization Layers

Alexander Binder, Grégoire Montavon, Sebastian Bach, Klaus-Robert Müller, Wojciech Samek

TL;DR

This work addresses the limitation of Layer-wise Relevance Propagation (LRP) in explaining neural networks with product-type nonlinearities, specifically local renormalization layers. It introduces a first-order Taylor expansion-based weighting scheme to redistribute relevance for local renormalization, deriving neuron input weights $v_{ij}$ with $\sum_i v_{ij}=1$ and integrating with the existing $\epsilon$- and $\beta$-relevance rules. Empirical results on CIFAR-10, ImageNet, and MIT Places show that the Taylor-based approach yields more meaningful heatmaps (measured by AUC in a perturbation-based evaluation) than identity-based handling, with best performance for $\epsilon$ values around $1$ or $0.01$ and with Taylor treatment of the normalization layer. The findings extend LRP applicability to broader nonlinearities, enhancing explainability of CNNs in practical vision tasks and motivating exploration of higher-order expansions and other nonlinear layers.

Abstract

Layer-wise relevance propagation is a framework that decomposes the prediction of a deep neural network computed over a sample, e.g. an image, into relevance scores for the individual input dimensions of the sample, such as the subpixels of an image. While this approach can be applied directly to generalized linear mappings, product-type non-linearities are not covered. This paper proposes an approach to extend layer-wise relevance propagation to neural networks with local renormalization layers, which are a very common product-type non-linearity in convolutional neural networks. We evaluate the proposed method for local renormalization layers on the CIFAR-10, ImageNet and MIT Places datasets.
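
As a rough illustration of the $\epsilon$-rule that the TL;DR refers to, the sketch below redistributes relevance through a single linear layer $z_j = \sum_i x_i w_{ij} + b_j$ using weights proportional to $z_{ij} = x_i w_{ij}$. The function name `lrp_epsilon`, the list-of-lists layout `W[i][j]`, and the toy numbers are assumptions made for illustration and do not come from the paper. The Taylor-based rule for local renormalization layers is not reproduced here.

```python
def lrp_epsilon(x, W, b, relevance_out, eps=0.01):
    """Epsilon-rule sketch for one linear layer (hypothetical helper).

    x             : list of n_in input activations
    W             : n_in x n_out weights, W[i][j] connects input i to output j
    b             : list of n_out biases
    relevance_out : list of n_out relevance scores arriving at the outputs
    Returns a list of n_in relevance scores for the inputs.
    """
    n_in, n_out = len(x), len(b)
    relevance_in = [0.0] * n_in
    for j in range(n_out):
        # Per-input contributions z_ij and the pre-activation z_j.
        z_ij = [x[i] * W[i][j] for i in range(n_in)]
        z_j = sum(z_ij) + b[j]
        # Stabiliser: push the denominator away from zero in the direction
        # of its sign (a zero pre-activation is treated as positive).
        denom = z_j + (eps if z_j >= 0 else -eps)
        for i in range(n_in):
            relevance_in[i] += z_ij[i] / denom * relevance_out[j]
    return relevance_in


# Toy usage with made-up values.
x = [1.0, 2.0]
W = [[1.0, -1.0],
     [0.5, 2.0]]
b = [0.0, 0.0]
r = lrp_epsilon(x, W, b, relevance_out=[2.0, 3.0], eps=1e-9)
```

With zero biases and a very small `eps`, the per-output weights $z_{ij}/z_j$ sum to approximately one over $i$, so the total relevance entering the layer is approximately preserved. Larger `eps` values absorb part of the relevance in the stabiliser.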

Paper Structure

This paper contains 5 sections, 13 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Pixel-wise decompositions for classes wolf, frog and wolf using a neural network pretrained for the 1000 classes of the ILSVRC challenge.
  • Figure 2: Decrease of classification score as pixels are sequentially replaced by random noise on the CIFAR-10 dataset. Red curve: pixels with highest pixel-wise scores are flipped first. Blue curve: pixels are flipped in random order. Green curve: least relevant pixels are flipped first. A similar comparison for ImageNet is found in Samek et al. (2015) [DBLP:journals/corr/SamekBMBM15].
  • Figure 3: Top row shows original unwarped image. Remaining rows show heatmaps produced by various parameters of the LRP method.
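
The perturbation experiment described in the Figure 2 caption can be sketched as follows: pixels are replaced by random noise in a chosen order, the classifier score is recorded after each replacement, and the area under the resulting curve summarises how quickly the score drops. This is a minimal sketch under stated assumptions, not the authors' evaluation code. The names `perturbation_curve` and `area_under_curve`, the flat list image representation, the uniform noise in $[0, 1)$, and the caller-supplied `score_fn` are all hypothetical.

```python
import random


def perturbation_curve(image, relevance, score_fn,
                       order="most_relevant_first", seed=0):
    """Record the score as pixels are replaced by noise one at a time.

    image     : flat list of pixel values
    relevance : flat list of pixel-wise relevance scores (same length)
    score_fn  : callable mapping a flat image to a classification score
    order     : "most_relevant_first", "least_relevant_first" or "random"
    """
    rng = random.Random(seed)
    # Draw the replacement noise per pixel up front so that every ordering
    # ends at the same fully perturbed image for a given seed.
    noise = [rng.random() for _ in image]
    idx = list(range(len(image)))
    if order == "most_relevant_first":
        idx.sort(key=lambda i: relevance[i], reverse=True)
    elif order == "least_relevant_first":
        idx.sort(key=lambda i: relevance[i])
    elif order == "random":
        rng.shuffle(idx)
    else:
        raise ValueError("unknown order: " + order)

    perturbed = list(image)
    scores = [score_fn(perturbed)]  # score of the unperturbed image
    for i in idx:
        perturbed[i] = noise[i]
        scores.append(score_fn(perturbed))
    return scores


def area_under_curve(scores):
    """Trapezoidal area under the score curve with unit step width."""
    return sum((a + b) / 2.0 for a, b in zip(scores, scores[1:]))


# Toy usage: a linear "classifier" whose exact relevance is w_i * x_i.
w = [3.0, 0.0, 1.0, 0.0]
img = [1.0, 1.0, 1.0, 1.0]


def score(v):
    return sum(wi * vi for wi, vi in zip(w, v))


rel = [wi * xi for wi, xi in zip(w, img)]
auc_most = area_under_curve(perturbation_curve(img, rel, score))
auc_least = area_under_curve(
    perturbation_curve(img, rel, score, order="least_relevant_first"))
```

In this toy setting the most-relevant-first ordering gives an area no larger than the least-relevant-first ordering, which mirrors the red and green curves in Figure 2: a heatmap is considered more meaningful when removing its highest-scoring pixels makes the classification score fall faster.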