Post-hoc Part-prototype Networks

Andong Tan, Fengtao Zhou, Hao Chen

TL;DR

This work addresses the need for global interpretability in deep vision models by proposing a post-hoc part-prototype network that decomposes a trained classification head into $k$ interpretable prototypes, satisfying $\mathbf{v} = \sum_{i=1}^k \tilde{\mathbf{p}}_i$ and producing heatmaps $\mathbf{x} \tilde{\mathbf{p}}_i^T$. Prototypes are discovered via unsupervised NMF and refined through a residual-distribution process to exactly reconstruct the head while maintaining interpretability, with optimization steps including $\min_{\mathbf{E},\mathbf{P}} \|\mathbf{F} - \mathbf{E P}\|_2^2$ and $\min_{\alpha_i} \|\mathbf{v} - \sum_{i=1}^k \alpha_i \mathbf{p}_i\|_2^2$, followed by Nelder–Mead refinement. The approach guarantees performance and yields more faithful explanations than prior methods, demonstrated by explainability axioms and quantitative metrics across multiple backbones and large-scale datasets like ImageNet. It enables scalable, post-hoc, global interpretability for complex models by linking specific prototypes to semantically meaningful object parts. The work highlights a practical path toward transparent AI systems without retraining, while acknowledging limitations tied to the pretrained head’s feature space.
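The decomposition summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that $\mathbf{F}$ is a non-negative matrix of spatial features (one row per location) and that $\mathbf{v}$ is the head weight vector for a single class. The function names `nmf` and `decompose_head` are hypothetical. The residual is split equally across prototypes as the simplest possible choice, so the interpretability-constrained, Nelder–Mead-tuned distribution described in the paper is not reproduced here.

```python
import numpy as np

def nmf(F, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate non-negative F (n x d) as E @ P with E (n x k), P (k x d) >= 0.

    Uses standard multiplicative updates for the squared-error objective
    min_{E,P} ||F - E P||^2 (an assumption; any NMF solver would do).
    """
    rng = np.random.default_rng(seed)
    n, d = F.shape
    E = rng.random((n, k)) + eps
    P = rng.random((k, d)) + eps
    for _ in range(n_iter):
        E *= (F @ P.T) / (E @ P @ P.T + eps)
        P *= (E.T @ F) / (E.T @ E @ P + eps)
    return E, P

def decompose_head(F, v, k=3):
    """Return prototypes P_tilde (k x d) whose rows sum exactly to v."""
    _, P = nmf(F, k)
    # Scaling step: min_alpha ||v - sum_i alpha_i p_i||^2 via least squares.
    alpha, *_ = np.linalg.lstsq(P.T, v, rcond=None)
    scaled = alpha[:, None] * P
    # Whatever the scaled prototypes fail to explain is the residual.
    residual = v - scaled.sum(axis=0)
    # Equal-share distribution (illustrative assumption, not the paper's
    # dynamic strategy): each prototype absorbs residual / k.
    return scaled + residual / k

rng = np.random.default_rng(0)
F = rng.random((49, 16))   # hypothetical 7x7 feature map with 16 channels
v = rng.normal(size=16)    # hypothetical head weights for one class

P_tilde = decompose_head(F, v, k=3)
heatmaps = F @ P_tilde.T   # column i is the heatmap x p_i^T at every location

# The prototypes reconstruct the head exactly, so per-location prototype
# responses add up to the original class response.
assert np.allclose(P_tilde.sum(axis=0), v)
assert np.allclose(heatmaps.sum(axis=1), F @ v)
```

Because $\mathbf{v} = \sum_{i=1}^k \tilde{\mathbf{p}}_i$ holds by construction, the class logit is unchanged regardless of how the residual is shared out; the choice of distribution only affects how interpretable each individual prototype remains.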

Abstract

Post-hoc explainability methods such as Grad-CAM are popular because they do not affect the performance of a trained model. However, they mainly reveal "where" a model looks for a given input and fail to explain "what" the model looks for (e.g., what matters when classifying a bird image as a Scott Oriole?). Existing part-prototype networks leverage part-prototypes (e.g., a Scott Oriole's characteristic wing and head) to answer both "where" and "what", but often underperform their black-box counterparts in accuracy. A natural question therefore arises: can one construct a network that answers both "where" and "what" in a post-hoc manner, so that the model's performance is guaranteed? To this end, we propose the first post-hoc part-prototype network, which decomposes the classification head of a trained model into a set of interpretable part-prototypes. Concretely, we propose an unsupervised prototype discovery and refinement strategy that yields prototypes which precisely reconstruct the classification head while remaining interpretable. Besides guaranteeing performance, we show that our network offers more faithful explanations qualitatively and yields even better part-prototypes quantitatively than prior part-prototype networks.

Paper Structure (26 sections, 9 equations, 8 figures, 6 tables)

Figures (8)

  • Figure 1: Prior post-hoc methods such as Grad-CAM (Selvaraju et al., 2017) fall short in answering "what the model looks for" (a type of global interpretability) for transparent decision making. On the other hand, prior part-prototype-based models often sacrifice performance. Our post-hoc part-prototype network is the first model to offer part-prototype-based global interpretability for decision making while guaranteeing prediction performance.
  • Figure 2: Overview of constructing a Post-hoc Part-Prototype Network. Given a trained black-box model, we aim to fully decompose the classification head into interpretable prototypes. To achieve this, we first discover a set of prototypes $\mathbf{P}$ for each class via NMF (step 1). In each image, we display the heatmaps of 3 prototypes in different colors for easier comparison. Visualizations are created by up-sampling the feature map to the original image resolution for ease of reading. We then refine these prototypes $\mathbf{P}$ via scaling and dynamic residual parameter distribution, subject to interpretability constraints. This step aims to guarantee a precise reconstruction using only part-prototypes, without sacrificing their interpretability.
  • Figure 3: After discovering initial prototypes via Non-negative Matrix Factorization (NMF), our refinement step makes the part-prototypes more discriminative for complete performance recovery, while maintaining their interpretability.
  • Figure 4: Comparison of 3 architectures and their prototypes on ImageNet (Deng et al., 2009). We visualize the presence areas of 3 prototypes in 3 example images from the class "sled dog".
  • Figure 5: Interpretability benefits of our proposed prototype refinement strategy, measured by prototype consistency and stability scores. Our dynamic refinement strategy (blue) not only outperforms the naive distribution (orange) by a large margin, but also, surprisingly, improves on the initial prototypes (purple) in most cases.
  • ...and 3 more figures