Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

Pattarawat Chormai; Jan Herrmann; Klaus-Robert Müller; Grégoire Montavon

TL;DR

This paper proposes two new analyses that extend principles found in PCA and ICA to explanations. They maximize relevance instead of, e.g., variance or kurtosis, which focuses the analysis much more strongly on what the ML model actually uses for predicting.

Abstract

Explainable AI aims to overcome the black-box nature of complex ML models like neural networks by generating explanations for their predictions. Explanations often take the form of a heatmap identifying input features (e.g. pixels) that are relevant to the model's decision. These explanations, however, entangle the potentially multiple factors that enter into the overall complex decision strategy. We propose to disentangle explanations by extracting, at some intermediate layer of a neural network, subspaces that capture the multiple and distinct activation patterns (e.g. visual concepts) that are relevant to the prediction. To automatically extract these subspaces, we propose two new analyses that extend principles found in PCA and ICA to explanations. These novel analyses, which we call principal relevant component analysis (PRCA) and disentangled relevant subspace analysis (DRSA), maximize relevance instead of, e.g., variance or kurtosis. This focuses the analysis much more strongly on what the ML model actually uses for predicting, ignoring activations or concepts to which the model is invariant. Our approach is general enough to work alongside common attribution techniques such as Shapley Value, Integrated Gradients, or LRP. The proposed methods prove practically useful and compare favorably to the state of the art, as demonstrated on benchmarks and three use cases.
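
To make the contrast with PCA concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the relevance captured by a subspace with orthonormal basis $U$ is $(U^\top \boldsymbol{a})^\top (U^\top \boldsymbol{c})$ for activations $\boldsymbol{a}$ and a context vector $\boldsymbol{c}$, and it maximizes the mean of that quantity by taking the leading eigenvectors of the symmetrized cross-covariance between $\boldsymbol{a}$ and $\boldsymbol{c}$. The function names, the synthetic data, and the choice of `d` are all hypothetical.

```python
import numpy as np

def relevant_subspace(A, C, d):
    """Orthonormal basis (D x d) maximizing mean (U^T a)^T (U^T c).

    A, C: arrays of shape (n_samples, D) holding activations and context
    vectors. Assumption: relevance is bilinear in (a, c) as stated above.
    """
    cross = A.T @ C / A.shape[0]          # empirical E[a c^T]
    sym = 0.5 * (cross + cross.T)         # symmetrize so eigh applies
    eigvals, eigvecs = np.linalg.eigh(sym)
    order = np.argsort(eigvals)[::-1][:d] # d largest eigenvalues
    return eigvecs[:, order]

def variance_subspace(A, d):
    """PCA-style baseline: top-d directions of (uncentered) variance."""
    eigvals, eigvecs = np.linalg.eigh(A.T @ A / A.shape[0])
    return eigvecs[:, np.argsort(eigvals)[::-1][:d]]

def subspace_relevance(U, A, C):
    """Mean relevance retained after projecting onto the span of U."""
    return float(np.mean(np.sum((A @ U) * (C @ U), axis=1)))

# Synthetic data (hypothetical): dimensions 0-1 have high variance but no
# relevance, dimensions 2-4 carry the relevance.
rng = np.random.default_rng(0)
scales = np.array([5.0, 5.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
weights = np.array([0.0, 0.0, 1.0, 1.0, 0.5, 0.0, 0.0, 0.0])
A = rng.normal(size=(500, 8)) * scales
C = A * weights + 0.1 * rng.normal(size=(500, 8))

U_rel = relevant_subspace(A, C, d=2)
U_var = variance_subspace(A, d=2)
rel_score = subspace_relevance(U_rel, A, C)
var_score = subspace_relevance(U_var, A, C)
print(f"relevance-maximizing subspace: {rel_score:.3f}")
print(f"variance-maximizing subspace:  {var_score:.3f}")
```

On this toy data the variance-based basis is drawn toward the high-variance dimensions that carry no relevance, whereas the relevance-based basis retains at least as much relevance as any other orthonormal basis of the same size.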

Paper Structure (54 sections, 3 theorems, 63 equations, 16 figures, 5 tables, 1 algorithm)

Overview of Attribution Techniques
Shapley Value
Integrated Gradients
Layer-wise Relevance Propagation
Disentangled Explanations with Various Attribution Techniques
Deriving Two-Step Attributions
Application to Gradient $\times\,$Input
Application to Integrated Gradients
Application to Shapley Values
Verifying the Structure of $R_k$
Structure of $R_k$'s with Gradient $\times\,$Input
Structure of $R_k$'s with Integrated Gradients
Structure of $R_k$'s with Shapley Value
Analytical Calculation of Total Relevance
Proofs and Derivations
...and 39 more sections

Key Result

Proposition 1

Let $\boldsymbol{U} = (U_k)_k$ be an orthogonal matrix formed by $U_k$'s. Using the formulation of relevance $R_k = (U_k^\top \boldsymbol{a})^\top(U_k^\top \boldsymbol{c})$ with $\boldsymbol{c}$ such that $R_j = a_j^\prime c_j$, we have the conservation property $\sum_k R_k = \sum_j R_j$. Furthermor
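
The conservation property can be checked numerically. The sketch below rests on stated assumptions: it treats $\boldsymbol{a}$ and $\boldsymbol{c}$ as plain vectors with $R_j = a_j c_j$ (ignoring the prime on $a_j$ in the statement above), and it builds a random orthogonal matrix via QR decomposition that is split column-wise into blocks $U_k$. The dimension and block sizes are arbitrary illustrative choices. The identity follows from $\sum_k U_k U_k^\top = \boldsymbol{U}\boldsymbol{U}^\top = I$.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 6
sizes = [2, 3, 1]  # hypothetical subspace dimensions, summing to D

# Random orthogonal matrix U, split column-wise into blocks U_k.
Q, _ = np.linalg.qr(rng.normal(size=(D, D)))
blocks = np.split(Q, np.cumsum(sizes)[:-1], axis=1)

a = rng.normal(size=D)  # activations
c = rng.normal(size=D)  # context vector

# Relevance per subspace: R_k = (U_k^T a)^T (U_k^T c)
R_k = np.array([(U_k.T @ a) @ (U_k.T @ c) for U_k in blocks])
# Relevance per activation dimension: R_j = a_j c_j
R_j = a * c

print("sum_k R_k =", R_k.sum())
print("sum_j R_j =", R_j.sum())
```

The two sums agree up to floating-point error for any orthogonal $\boldsymbol{U}$, however its columns are grouped into blocks.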

Figures (16)

Figure D.1: Patch-flipping curves from different attribution methods and models. We average the scores from $5000$ random validation images from the ImageNet dataset for all methods, except Shapley Value Sampling, where we use only 10% of these images for computational reasons. We perform patch-flipping over patches of size $16\times16$.
Figure D.2: LRP-$\gamma$ heatmaps of different input images for NFNet-F0 and with different values of $\gamma$.
Figure E.1: Training curves of the DSA and DRSA optimization across different models and attribution methods. In each plot, each curve corresponds to one of the 50 ImageNet classes used in the main experiments (see Section 5 of the main paper).
Figure F.1: Total relevance of PRCA and three ablations when varying the subspace dimensionality $d$; higher is better. The analysis is performed on VGG16-TV with LRP (same as column 1 in Table \ref{table:maxcontribution}). Each curve is an average over the means of the 50 classes, and shaded regions represent one standard error (over classes). The horizontal solid line represents the total relevance without subspace projection.
Figure F.2: Area Under the Patch-flipping Curve (AUPC), separability and peakness scores of subspaces $\boldsymbol{U}$ extracted by DSA and DRSA at (a) different layers with 4 subspaces or (b) Conv4_3 with different numbers of subspaces.
...and 11 more figures

Theorems & Definitions (7)

Proposition 1
proof
Proposition 2
proof
Remark
Proposition
proof
