Why Does It Look There? Structured Explanations for Image Classification

Jiarui Li, Zixiang Yin, Samuel J Landry, Zhengming Ding, Ramgopal R. Mettu

TL;DR

The paper proposes Interpretability to Explainability (I2X), a framework that builds structured explanations directly from unstructured interpretability. It does so by quantifying progress at selected training checkpoints, using prototypes extracted from post-hoc XAI methods.

Abstract

Deep learning models achieve remarkable predictive performance, yet their black-box nature limits transparency and trustworthiness. Although numerous explainable artificial intelligence (XAI) methods have been proposed, they primarily provide saliency maps or concepts (i.e., unstructured interpretability). Existing approaches often rely on auxiliary models (e.g., GPT, CLIP) to describe model behavior, thereby compromising faithfulness to the original models. We propose Interpretability to Explainability (I2X), a framework that builds structured explanations directly from unstructured interpretability by quantifying progress at selected checkpoints during training, using prototypes extracted from post-hoc XAI methods (e.g., GradCAM). I2X answers the question "why does it look there?" by providing a structured view of both intra- and inter-class decision making during training. Experiments on MNIST and CIFAR10 demonstrate the effectiveness of I2X in revealing the prototype-based inference process of various image classification models. Moreover, we demonstrate that I2X can be used to improve predictions across different model architectures and datasets: uncertain prototypes identified by I2X guide targeted perturbation of samples, and fine-tuning on the perturbed samples ultimately improves accuracy. Thus, I2X not only faithfully explains model behavior but also provides a practical approach to guide optimization toward desired targets.

Paper Structure

This paper contains 16 sections, 8 equations, 9 figures, and 2 tables.

Figures (9)

  • Figure 1: The difference between unstructured interpretability (e.g., with saliency maps) and our approach to structured explainability. I2X tracks model evolution across training checkpoints using a prototype-based representation.
  • Figure 2: Interpretability to Explainability (I2X). I2X is a framework that builds structured explanations from two signals: the evolution of prototype intensity in saliency maps obtained by a post-hoc method, and the evolution of model confidence. I2X can systematically explain how a model infers and learns a predictive label, and the resulting structured explanation can then be used to guide optimization toward desired targets and improve performance.
  • Figure 3: Visualization of Prototype and Confidence Evolution of ResNet-50 on MNIST (shared prototypes of digit 7 as an example). The first column shows the abstract prototypes of digit 7. Subsequent columns depict how sharing each prototype affects confidence for digit 7 (increase or decrease) and which competing class drives the change. White blanks denote no confidence change at that checkpoint, while black blanks indicate a change because 7 contains the prototype whereas another digit does not.
  • Figure 4: Annotated Training Checkpoints for ResNet-50 on MNIST (shared prototypes of digit 7 as an example). This figure shows how the raw model evolves across training checkpoints. An edge $a \to b$ denotes that the model decreases the confidence of class $a$ and increases the confidence of class $b$. For example, $6,2 \to T0$ indicates that the model increases the prediction confidence of digit 7 (training checkpoint $T0$) while decreasing the confidence of digits 6 and 2. The labels on the arrows specify the prototypes responsible for each confidence change.
  • Figure 5: Annotated training checkpoints for ResNet-50 on MNIST illustrating all prototypes involved in distinguishing digit 7 from digit 2. Prototypes P-20 and P-19 are consistently assigned as discriminative evidence for digit 7. In contrast, prototype P-26 confuses the model: its contribution alternates between digits 7 and 2 across training checkpoints, indicating uncertain evidence during training.
  • ...and 4 more figures
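
The Figure 5 caption describes an uncertain prototype as one whose contribution alternates between two classes across training checkpoints. The sketch below illustrates that idea only; it is not the paper's procedure. The function name `find_uncertain_prototypes`, the `(checkpoint, prototype, favored_class)` record layout, the switch-count criterion, and the example values are all assumptions made for illustration, and the example data is invented rather than taken from the paper's results.

```python
from collections import defaultdict


def find_uncertain_prototypes(assignments):
    """Flag prototypes whose favored class changes between checkpoints.

    `assignments` is a hypothetical list of
    (checkpoint_index, prototype_id, favored_class) tuples, where
    `favored_class` is the class whose confidence the prototype raised
    at that checkpoint. Returns {prototype_id: number_of_switches} for
    every prototype that switched at least once.
    """
    favored_by_prototype = defaultdict(list)
    # Sort so each prototype's classes are listed in checkpoint order.
    for _checkpoint, prototype, favored in sorted(assignments):
        favored_by_prototype[prototype].append(favored)

    uncertain = {}
    for prototype, favored in favored_by_prototype.items():
        # Count adjacent checkpoints where the favored class differs.
        switches = sum(1 for a, b in zip(favored, favored[1:]) if a != b)
        if switches > 0:
            uncertain[prototype] = switches
    return uncertain


# Invented example mirroring the Figure 5 description: P-20 and P-19
# always support digit 7, while P-26 alternates between 7 and 2.
example = [
    (0, "P-20", 7), (1, "P-20", 7), (2, "P-20", 7), (3, "P-20", 7),
    (0, "P-19", 7), (1, "P-19", 7), (2, "P-19", 7), (3, "P-19", 7),
    (0, "P-26", 7), (1, "P-26", 2), (2, "P-26", 7), (3, "P-26", 2),
]

uncertain = find_uncertain_prototypes(example)
print(uncertain)
```

In this invented example only P-26 is flagged. A real criterion would likely weigh the magnitude of each confidence change rather than just counting switches; the count is used here because it is the simplest reading of "alternating".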