Investigating the influence of noise and distractors on the interpretation of neural networks
Pieter-Jan Kindermans, Kristof Schütt, Klaus-Robert Müller, Sven Dähne
TL;DR
The paper investigates how noise and distractors affect neural-network explanations and argues that existing gradient-based methods may be unreliable in noisy settings. It formalizes a generative model $\boldsymbol{x} = \boldsymbol{a}_{t}s_{t} + A_{n}\boldsymbol{s}_{n}^{T} + \boldsymbol{\epsilon}$ and shows that any linear explanation must satisfy $\boldsymbol{w}^{T}\boldsymbol{a}_{t}=1$ and $\boldsymbol{w}^{T}A_{n}=0$ to recover $s_{t}$, highlighting the importance of ignoring task-irrelevant directions. Through a deep Taylor decomposition lens, it analyzes root-point choices and existing rules (e.g., the $z$-rule, $w^2$-rule, $a$-rule) and proposes two new robust rules ($w^+$ and $a^+$) that align explanations with task-related variation. Empirical results on MNIST with an MLP show how different rules allocate relevance under noise, underscoring the need for principled rule selection and broader benchmarks for explanation methods.
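The two conditions $\boldsymbol{w}^{T}\boldsymbol{a}_{t}=1$ and $\boldsymbol{w}^{T}A_{n}=0$ can be illustrated with a small numerical sketch. The code below is not taken from the paper: the dimensions, the random data, the noise-free setting ($\boldsymbol{\epsilon}=0$), and the helper name `distractor_free_filter` are all assumptions chosen for illustration. It builds one filter that satisfies both conditions and compares it with a naive filter that points along $\boldsymbol{a}_{t}$ and therefore leaks distractor variation.

```python
import numpy as np

def distractor_free_filter(a_t, A_n):
    """Hypothetical helper: minimum-norm w with w^T a_t = 1 and w^T A_n = 0."""
    # Columns of M: the signal direction followed by the distractor directions.
    M = np.column_stack([a_t, A_n])            # shape (d, 1 + k)
    target = np.zeros(M.shape[1])
    target[0] = 1.0                            # w^T a_t = 1, w^T A_n = 0
    # M^T w = target is underdetermined when d > 1 + k;
    # lstsq then returns the minimum-norm exact solution.
    w, *_ = np.linalg.lstsq(M.T, target, rcond=None)
    return w

# Assumed toy setting: d input dimensions, k distractors, n samples, no noise.
rng = np.random.default_rng(0)
d, k, n = 5, 2, 1000
a_t = rng.normal(size=d)                       # signal pattern
A_n = rng.normal(size=(d, k))                  # distractor patterns
s_t = rng.normal(size=n)                       # task-relevant source
S_n = rng.normal(size=(n, k))                  # distractor sources

# Each row of X is one sample x = a_t * s_t + A_n @ s_n (epsilon = 0).
X = np.outer(s_t, a_t) + S_n @ A_n.T

w = distractor_free_filter(a_t, A_n)
s_hat = X @ w                                  # recovers s_t

# Naive filter along the signal pattern: satisfies w^T a_t = 1 only.
w_naive = a_t / (a_t @ a_t)
s_naive = X @ w_naive                          # contaminated by distractors
```

In this sketch `w` generally does not point in the direction of `a_t`: it has to tilt away from the signal pattern in order to cancel the distractors, which is why the filter weights alone are a poor indicator of where the signal lies in the input.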
Abstract
Understanding neural networks is becoming increasingly important. Over the last few years, different types of visualisation and explanation methods have been proposed. However, none of them explicitly considers their behaviour in the presence of noise and distracting elements. In this work, we show how noise and distracting dimensions can influence the result of an explanation model. This yields new theoretical insights that aid the selection of the most appropriate explanation model within the deep Taylor decomposition framework.
