Leveraging Local Structure for Improving Model Explanations: An Information Propagation Approach

Ruo Yang, Binghui Wang, Mustafa Bilgic

TL;DR

This work proposes a method called IProp, which models each pixel's individual attribution score as a source of explanatory information and explains the image prediction through the dynamic propagation of information across all pixels.

Abstract

Numerous explanation methods have recently been developed to interpret the decisions made by deep neural network (DNN) models. For image classifiers, these methods typically assign an attribution score to each pixel in the image to quantify its contribution to the prediction. However, most of these explanation methods assign attribution scores to pixels independently, even though both humans and DNNs make decisions by analyzing a set of closely related pixels simultaneously. Hence, the attribution score of a pixel should be evaluated jointly, by considering the pixel itself and its structurally similar pixels. We propose a method called IProp, which models each pixel's individual attribution score as a source of explanatory information and explains the image prediction through the dynamic propagation of information across all pixels. To formulate the information propagation, IProp adopts a Markov Reward Process, which guarantees convergence; the final state of the process gives the desired attribution scores of the pixels. Furthermore, IProp is compatible with any existing attribution-based explanation method. Extensive experiments on various explanation methods and DNN models verify that IProp significantly improves them on a variety of interpretability metrics.

Paper Structure (12 sections, 1 theorem, 3 equations, 7 figures, 6 tables, 1 algorithm)

This paper contains 12 sections, 1 theorem, 3 equations, 7 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

The value iteration in IProp (the value-iteration update equation) is guaranteed to converge to the unique solution $AM_{IProp}^{*}$ for any initial $AM_{IProp}^{0}$, i.e., $\lim_{k \rightarrow \infty} AM_{IProp}^{k} = AM_{IProp}^{*}$, where $AM_{IProp}^{*} = (I_{N} - \gamma P)^{-1} \cdot AM$.
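
As a numerical sanity check on Theorem 1, the sketch below runs a fixed-point iteration and compares the result with the closed form $(I_{N} - \gamma P)^{-1} \cdot AM$. The update rule $AM_{IProp}^{k+1} = AM + \gamma P \cdot AM_{IProp}^{k}$ is an assumption: it is the standard Markov Reward Process value update whose fixed point matches the closed form in the theorem, and it is not quoted from the paper. The function name `iprop_value_iteration`, the random row-stochastic matrix, the discount `gamma = 0.9`, and the tolerance are all illustrative choices.

```python
import numpy as np

def iprop_value_iteration(am, P, gamma=0.9, v0=None, tol=1e-10, max_iter=10_000):
    """Iterate v <- am + gamma * P @ v until successive iterates differ by < tol.

    am : (N,) initial attribution scores, treated as rewards (assumed update rule).
    P  : (N, N) row-stochastic transition matrix.
    Returns the final iterate and the number of iterations used.
    """
    v = np.zeros_like(am, dtype=float) if v0 is None else np.asarray(v0, dtype=float)
    for k in range(1, max_iter + 1):
        v_next = am + gamma * (P @ v)
        if np.max(np.abs(v_next - v)) < tol:
            return v_next, k
        v = v_next
    return v, max_iter

# Toy problem: 16 "pixels" with a random row-stochastic transition matrix.
rng = np.random.default_rng(0)
n = 16
W = rng.random((n, n))
P = W / W.sum(axis=1, keepdims=True)   # each row sums to 1
am = rng.random(n)                     # stand-in for a baseline attribution map
gamma = 0.9

v_iter, n_iter = iprop_value_iteration(am, P, gamma)
v_closed = np.linalg.solve(np.eye(n) - gamma * P, am)  # (I - gamma P)^{-1} AM

print("max abs difference:", np.max(np.abs(v_iter - v_closed)))
```

Because $P$ is row-stochastic and $0 \le \gamma < 1$, the update is a contraction in the max norm, which is why the starting point does not affect the limit. Solving the linear system directly is an alternative for small $N$; iteration is the more practical route when $P$ is large and sparse.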

Figures (7)

  • Figure 1: Attribution maps of the existing explanation methods (top row) and those (bottom row) with our information propagation on the InceptionV3 model. Information propagation ensures maps assign scores more evenly across the object in the image.
  • Figure 2: Illustration of IProp. IProp first builds a weighted graph over the image pixels, where each pixel is a node and the weight of an edge is obtained from the pixels' spatial and color information. The weighted graph is associated with a transition matrix. IProp then performs information propagation based on a Markov Reward Process, which takes the transition matrix and the pixels' initial rewards as input. The pixels' attribution scores (arranged as an attribution map), which can be generated by any baseline explanation method, serve as the pixels' initial rewards. When the propagation converges, IProp outputs the pixels' final attribution scores, which form IProp's attribution map.
  • Figure 3: Example of neighboring (blue) nodes for a given (black) node when $K=2$.
  • Figure 4: Attribution maps of baseline methods and those with IProp. IProp ensures attribution maps focus more on the object, while baseline methods assign attribution scores to many pixels not in the object.
  • Figure 5: Average value iteration convergence time for all 5K test images evaluated on the InceptionV3 model.
  • ...and 2 more figures
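
The graph construction described in the Figure 2 caption can be sketched as follows. The caption only says that edge weights come from spatial and color information, so every specific here is an assumption for illustration: the Gaussian kernel, the bandwidths `sigma_s` and `sigma_c`, the square neighborhood of radius `K` (one possible reading of Figure 3), and the function name `build_transition_matrix`. The dense matrix is used only for readability on tiny inputs.

```python
import numpy as np

def build_transition_matrix(image, K=2, sigma_s=2.0, sigma_c=0.1):
    """Build a row-stochastic matrix over pixels of an (H, W, C) image in [0, 1].

    Assumed weighting: exp(-spatial_dist^2 / (2 sigma_s^2) - color_dist^2 / (2 sigma_c^2))
    between each pixel and the pixels within K rows and K columns of it.
    """
    h, w = image.shape[:2]
    n = h * w
    feats = image.reshape(n, -1).astype(float)
    P = np.zeros((n, n))
    for r in range(h):
        for c in range(w):
            i = r * w + c
            for dr in range(-K, K + 1):
                for dc in range(-K, K + 1):
                    rr, cc = r + dr, c + dc
                    # Skip the pixel itself and positions outside the image.
                    if (dr == 0 and dc == 0) or not (0 <= rr < h and 0 <= cc < w):
                        continue
                    j = rr * w + cc
                    d_s = dr * dr + dc * dc
                    d_c = np.sum((feats[i] - feats[j]) ** 2)
                    P[i, j] = np.exp(-d_s / (2 * sigma_s**2) - d_c / (2 * sigma_c**2))
    # Normalize edge weights so each row is a probability distribution.
    P /= P.sum(axis=1, keepdims=True)
    return P

rng = np.random.default_rng(0)
image = rng.random((6, 6, 3))          # tiny random RGB image as a stand-in
P_img = build_transition_matrix(image, K=2)
print(P_img.shape)
```

Under these assumptions, pixels that are close together and similar in color exchange more information, while a sharp color edge sharply reduces the weight between neighbors. A real image would need a sparse representation, since the dense matrix grows with the square of the pixel count.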

Theorems & Definitions (1)

  • Theorem 1