Respect the model: Fine-grained and Robust Explanation with Sharing Ratio Decomposition

Sangyu Han; Yearim Kim; Nojun Kwak

TL;DR

SRD (Sharing Ratio Decomposition) is a novel eXplainable AI (XAI) method that faithfully reflects the model's inference process, which significantly improves the robustness of its explanations.

Abstract

The truthfulness of existing explanation methods in authentically elucidating the underlying model's decision-making process has been questioned. Existing methods have deviated from faithfully representing the model and are thus susceptible to adversarial attacks. To address this, we propose a novel eXplainable AI (XAI) method called SRD (Sharing Ratio Decomposition), which faithfully reflects the model's inference process, resulting in significantly enhanced robustness in our explanations. Unlike the conventional emphasis on the neuronal level, we adopt a vector perspective to consider the intricate nonlinear interactions between filters. We also introduce an interesting observation termed Activation-Pattern-Only Prediction (APOP), which lets us emphasize the importance of inactive neurons and redefine relevance to encapsulate all relevant information from both active and inactive neurons. Our method, SRD, allows for the recursive decomposition of a Pointwise Feature Vector (PFV), providing a high-resolution Effective Receptive Field (ERF) at any layer.

Paper Structure (20 sections, 28 equations, 23 figures, 4 tables, 1 algorithm)

Introduction
Related Works
Method: Sharing Ratio Decomposition (SRD)
Forward Pass
Backward Pass (for calculating sharing ratio)
Experiment
Qualitative Results
Quantitative Results
Adversarial Robustness
Conclusion
Future works: Global Explanation with SRD
Detail description of affine function Lg
Proof of equivalence between forward and backward processes
More Result of APOP
Detail of metrics
...and 5 more sections

Figures (23)

Figure 1: Forward Pass of our method. Top: An illustration of inference process. Red box portrays the contribution of $v^{25}_i$s in forming $v^{27}_{(5,7)}$, quantified by $\mu^{25 \rightarrow 27}_{i \rightarrow (5,7)}$. Each $v^{25}_i$ is labeled with its corresponding ERF. Bottom Left: The process of building ERF for $v^{27}_{(5,7)}$. Bottom Right: The final saliency map is derived as a weighted sum of the ERFs at the encoder output layer.
Figure 2: Backward Pass of our method. $i$ and $j$ are pixels in activation layers $l$ and $k$, respectively. Left: $v^{k}_{j}$ is a pre-activation PFV at activation layer $k$, $v^l_i$ is a post-activation PFV at activation layer $l$, and $f^{l}_{i \rightarrow j}$ is the affine transformation function assigned to $(i, j)$. The sum of all $\hat{v}^l_{i \rightarrow j}$ gives $v_j^{k}$ ($\sum_{i \in RF_j^{k} } \hat{v}^l_{i \rightarrow j} = v_j^{k}$). $\mu^{l \rightarrow k}_{i\rightarrow j}$ is the sharing ratio of each $\hat{v}^l_{i\rightarrow j}$ to $v_j^{k}$. $R_{i\rightarrow{j}}^{l}$ is the relevance share of $i$ in the leading layer to $j$ in the following layer. Right: $RF^{k}_j$ is the receptive field of pixel $j$ and $R^{k}_j$ is the relevance score of $j$ to the output. The relevance $R^l_i$ in the leading layer can be calculated recursively from the next layer's relevances $R^{k}_j$ via the shares $R^l_{i\rightarrow j}$, for every $j$ whose receptive field includes pixel $i$.
Figure 3: Qualitative results on ResNet50 for the class label 'Dog (Top)' and 'Cat (Bottom)'. Methods decorated with † have the resolution of (7, 7) and methods with ‡ have the resolution of (28, 28), while the others have the input-scale resolution, (224, 224). Notably, compared to other methods, SRD for input resolution is adept at capturing the fine details of the image. Best viewed when enlarged.
Figure 4: Adversarial attack experiment. Top: Qualitative comparison between explanations. While other methods deleted the goldfish (original image) in their explanation due to the manipulation, our method successfully retained the goldfish part. For more results, see the Appendix. Bottom: Quantitative result. Higher SSIM and PCC scores indicate less susceptibility to perturbation manipulation. In both SSIM and PCC, our method demonstrates superior defense against adversarial attack.
Figure 5: Top: Nearest-neighbor PFVs encode concepts similar to that of the target PFV. By labeling each PFV with its ERF, we empirically observed that the local manifold near a given PFV encodes a concept. For example, to know what $v^{29}_{(4,9)}$ encodes, we find its top 3 nearest neighbors among other samples' PFVs. Bottom: Recursive global explanation of the model's decision-making process. Given the modified sharing ratio, $\mu_{i \rightarrow c}^{L \rightarrow O}$, we know how much a certain concept of PFV $v^L_i$ at layer $L$ contributed to the output. For example, $v^{29}_{(4,9)}$ is the PFV at (4, 9) in layer 29, which represents the "fluffy bird head" concept and contributed $\mu_{(4,9) \rightarrow c}^{L \rightarrow O}$ of the total prediction. $v^{29}_{(4,9)}$ is formed by the subconcepts $[v^{27}_{(4,9)}: \textit{"bird head"}, v^{27}_{(4,10)}: \textit{"beak"}, ..., v^{27}_{(5,10)}: \textit{"small animal head"}]$, whose contributions are $\mu^{27 \rightarrow 29}_{i \rightarrow (4,9)}$. These subconcepts can be further decomposed into minor concepts recursively, revealing the full decision-making process of the deep neural network.
...and 18 more figures
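
The backward pass described in the Figure 2 caption can be illustrated with a small sketch. The captions here do not give the formula for the sharing ratio, so the code below assumes a projection-based ratio, $\mu^{l \rightarrow k}_{i \rightarrow j} = \langle \hat{v}^l_{i \rightarrow j}, v^k_j \rangle / \lVert v^k_j \rVert^2$, which is one choice that makes the ratios over a receptive field sum to 1. The function names, the dictionary layout, and the toy vectors are all hypothetical and are not taken from the paper's code.

```python
# Illustrative sketch only. The projection-based sharing ratio and all names
# below are assumptions, not the authors' implementation.

def dot(a, b):
    """Inner product of two equal-length vectors given as lists."""
    return sum(x * y for x, y in zip(a, b))


def sharing_ratios(contributions):
    """Compute v_j and the assumed sharing ratio of each contribution.

    contributions maps a leading-layer pixel i to its transformed vector
    v_hat_{i->j}. The target vector v_j is the sum of all contributions,
    as stated in the Figure 2 caption.
    """
    dim = len(next(iter(contributions.values())))
    v_j = [sum(vec[d] for vec in contributions.values()) for d in range(dim)]
    norm_sq = dot(v_j, v_j)
    if norm_sq == 0.0:
        raise ValueError("v_j is the zero vector; the ratio is undefined")
    # Assumed definition: scalar projection of v_hat onto v_j, normalized.
    mu = {i: dot(vec, v_j) / norm_sq for i, vec in contributions.items()}
    return v_j, mu


def propagate_relevance(contributions_per_j, relevance_k):
    """One backward step: R^l_i = sum over j of mu_{i->j} * R^k_j."""
    relevance_l = {}
    for j, contributions in contributions_per_j.items():
        _, mu = sharing_ratios(contributions)
        for i, ratio in mu.items():
            share = ratio * relevance_k[j]  # relevance share R^l_{i->j}
            relevance_l[i] = relevance_l.get(i, 0.0) + share
    return relevance_l


# Toy example: pixel i1 lies in the receptive fields of both j0 and j1.
contributions_per_j = {
    "j0": {"i0": [1.0, 0.0], "i1": [1.0, 2.0]},
    "j1": {"i1": [0.5, 0.5], "i2": [1.5, -0.5]},
}
relevance_k = {"j0": 0.6, "j1": 0.4}
relevance_l = propagate_relevance(contributions_per_j, relevance_k)
```

Under this assumed definition, the total relevance is preserved from layer $k$ to layer $l$ because the ratios for each $j$ sum to 1. Individual ratios can be negative when a contribution points away from $v^k_j$.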
