Prototypical Self-Explainable Models Without Re-training

Srishti Gautam, Ahcene Boubekki, Marina M. C. Höhne, Michael C. Kampffmeyer

TL;DR

This work tackles the scarcity of readily usable self-explainable models by introducing KMEx, a universal method that converts any pre-trained model into a prototypical self-explainable model without re-training the backbone. KMEx reuses the existing encoder, learns $L$ class-specific prototypes via $k$-means in the embedding space, and replaces the classifier with a transparent $1$-nearest-neighbor decision rule, producing global prototype explanations and local PRP-based maps. The authors propose a novel, objective evaluation framework centered on three predicates—transparency, diversity, and trustworthiness—to assess SEMs beyond predictive accuracy, and demonstrate that KMEx achieves strong transparency and faithfulness with competitive accuracy across seven datasets while enabling diversity improvements without backbone modification. The paper also shows that KMEx can improve prototype diversity and minority-subclass representation when applied to existing SEM embeddings, offering a practical, scalable path toward broader adoption of self-explainable models in safety-critical settings.
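The pipeline described above (frozen encoder, per-class $k$-means, nearest-prototype decision rule) can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, a plain Lloyd's $k$-means stands in for whichever $k$-means routine the paper uses, synthetic 2-D points stand in for the embeddings a pre-trained encoder would produce, and `per_class` assumes the same number of prototypes for every class.

```python
import numpy as np

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct points drawn from the data.
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance of every point to every centroid: (n, k).
        dists = ((points[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = points[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def fit_prototypes(embeddings, labels, per_class):
    """Run k-means separately inside each class (assumed setup)."""
    protos, proto_labels = [], []
    for c in np.unique(labels):
        protos.append(kmeans(embeddings[labels == c], per_class))
        proto_labels.extend([c] * per_class)
    return np.vstack(protos), np.array(proto_labels)

def predict(embeddings, protos, proto_labels):
    """1-nearest-neighbour rule: a sample takes the class of its closest prototype."""
    dists = ((embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    nearest = dists.argmin(axis=1)
    return proto_labels[nearest], nearest

def closest_training_samples(embeddings, protos):
    """Index of the training embedding nearest to each prototype.

    Assumption: the search runs over all training samples, not per class.
    """
    dists = ((embeddings[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=0)

# Synthetic stand-in for encoder outputs: two well-separated classes in 2-D.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(10.0, 0.5, (50, 2))])
lab = np.array([0] * 50 + [1] * 50)

protos, proto_labels = fit_prototypes(emb, lab, per_class=2)
pred, nearest = predict(emb, protos, proto_labels)
proto_images = closest_training_samples(emb, protos)
```

Here `nearest` records which prototype decided each prediction, which is what makes the decision rule transparent, and `proto_images` gives the training indices one could display to visualize each prototype in input space. In a real setting `emb` would come from the frozen backbone, which is never updated.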

Abstract

Explainable AI (XAI) has unfolded in two distinct research directions with, on the one hand, post-hoc methods that explain the predictions of a pre-trained black-box model and, on the other hand, self-explainable models (SEMs) which are trained directly to provide explanations alongside their predictions. While the latter are preferred in safety-critical scenarios, post-hoc approaches have received the majority of attention until now, owing to their simplicity and ability to explain base models without retraining. Current SEMs, in contrast, require complex architectures and heavily regularized loss functions, thus necessitating specific and costly training. To address this shortcoming and facilitate wider use of SEMs, we propose a simple yet efficient universal method called KMEx (K-Means Explainer), which can convert any existing pre-trained model into a prototypical SEM. The motivation behind KMEx is to enhance transparency in deep learning-based decision-making via class-prototype-based explanations that are diverse and trustworthy without retraining the base model. We compare models obtained from KMEx to state-of-the-art SEMs using an extensive qualitative evaluation to highlight the strengths and weaknesses of each model, further paving the way toward a more reliable and objective evaluation of SEMs. The code is available at https://github.com/SrishtiGautam/KMEx.

Paper Structure (40 sections, 4 equations, 15 figures, 14 tables)

Figures (15)

  • Figure 1: Schematic representation of KMEx. Left: The black-box classifier is removed and replaced by a nearest neighbor classifier based on prototypes learned using $k$-means in the embedding space. The UMAP representation is the projection of the learned embedding space for STL-10, along with prototypes, depicted as squares. Right: The prototypes are visualized in the input space using the closest training images.
  • Figure 2: Qualitative evaluation of KMEx: Prototypes learned by KMEx for MNIST for class '7' (left) and STL-10 for class 'bird' (right) are shown at the top, demonstrating global explainability. The "this looks like that" behavior for test images is shown at the bottom, along with PRP maps highlighting (in red) the regions of the test images activated by the closest prototypes, demonstrating local explainability.
  • Figure 3: Relevance Ordering curves computed on different datasets and with different architectures, along with the respective random baselines (dashed).
  • Figure 4: Summary of each model's strengths and weaknesses.
  • Figure 5: Analysis of the attributes captured by SEMs for different numbers of prototypes for CelebA.
  • ...and 10 more figures