Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin

TL;DR

This work introduces a post-hoc concept-based framework that unifies local (instance-level) and global (class-level) explanations via prototypical decision strategies. By modeling class-specific distributions of concept relevances as mixtures of Gaussians, it derives prototypical prediction strategies and uses log-likelihoods to assign new predictions to these prototypes, enabling objective assessment of how ordinary or extraordinary a prediction is relative to global behavior. The framework supports both explanation and validation: it provides concept relevance scores, localizations, and visualizations, while enabling outlier and data-quality detection, spurious-behavior spotting, and OOD detection, all validated across ImageNet, CUB-200, and CIFAR-10 with VGG, ResNet, and EfficientNet. Key findings show that relevance-based concept attributions are more faithful and disentangled than activations, and that Gaussian mixture prototypes improve coverage and outlier detection, with practical implications for scalable model validation in safety-critical settings. Overall, prototypical concept-based explanations offer objective, automated insights into model behavior, reducing reliance on human interpretation and aiding in robust deployment.
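The idea of modeling concept relevances as a Gaussian mixture and scoring new predictions by log-likelihood can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the relevance vectors are synthetic, the number of components and the percentile-based outlier threshold are assumptions, and scikit-learn's `GaussianMixture` stands in for whatever fitting procedure the paper uses.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Hypothetical concept relevance vectors for one class: 200 training
# predictions over 4 concepts, drawn from two synthetic sub-strategies.
strategy_a = rng.normal(loc=[0.6, 0.3, 0.1, 0.0], scale=0.03, size=(100, 4))
strategy_b = rng.normal(loc=[0.1, 0.2, 0.6, 0.1], scale=0.03, size=(100, 4))
train_relevances = np.vstack([strategy_a, strategy_b])

# Fit a Gaussian mixture; each component mean acts as one prototype.
gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0)
gmm.fit(train_relevances)
prototypes = gmm.means_

# Assumed outlier threshold: 1st percentile of the training log-likelihoods.
threshold = np.percentile(gmm.score_samples(train_relevances), 1)


def assess(relevance_vector):
    """Return (closest component index, log-likelihood, is_outlier)."""
    x = np.asarray(relevance_vector, dtype=float).reshape(1, -1)
    log_likelihood = float(gmm.score_samples(x)[0])
    # predict() picks the component with the highest posterior responsibility.
    component = int(gmm.predict(x)[0])
    return component, log_likelihood, bool(log_likelihood < threshold)


ordinary = assess([0.6, 0.3, 0.1, 0.0])  # close to one synthetic strategy
unusual = assess([0.0, 0.9, 0.0, 0.6])   # far from both synthetic strategies
```

A prediction whose log-likelihood falls below the threshold would be flagged as extraordinary, while any other prediction is associated with its closest prototype. How the threshold is chosen in practice is left open here.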

Abstract

Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only a few XAI methods are suitable for ensuring safety in practice, as they rely heavily on repeated, labor-intensive and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on long-term human assessment. Quantifying the deviation from prototypical behavior makes it possible not only to associate predictions with specific model sub-strategies but also to detect outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures. Code is available at https://github.com/maxdreyer/pcx.

Paper Structure (67 sections, 26 equations, 35 figures, 6 tables)

This paper contains 67 sections, 26 equations, 35 figures, 6 tables.

Figures (35)

  • Figure 1: Using the framework: By contrasting a prediction with the prototypical prediction strategy, the stakeholder can understand how (un-)ordinarily the model behaves. (a): A flamingo prediction is based on concepts like "feather", "red color" and "water". While recent concept-based methods provide relevance scores, localization heatmaps, and visualizations for each concept, it remains unclear whether such a composition of used concepts is expected. (b): Comparing against prototypes makes it possible to understand to what extent concepts are similar (e.g., "feather"), underused (e.g., "red color"), or overused (e.g., "water"). These differences can be measured quantitatively to assess the degree to which a prediction is an outlier. (c): The framework makes it possible to automatically identify outliers or, alternatively, the closest prototypical prediction strategy. Prototypes are discovered automatically and summarize the global model behavior in condensed form.
  • Figure 2: Intuition behind modeling prototypes: (top): In concept space, each dimension represents the relevance or activation of a concept. We assume that the concept vectors $\boldsymbol{\nu}$ of a specific class form distinct clusters that can be approximated by a mixture of Gaussian distributions. (bottom): Concept relevances ($\varepsilon$-rule) result in more disentangled UMAP embeddings than activations. Shown are eight feline ImageNet classes (color-coded) for the last convolutional layer of VGG-16.
  • Figure 3: Pre-processing pipeline of the framework: Predictions are generated over the training samples of a specific class. We then compute concept relevance scores for each prediction, which represent prediction strategies. By fitting a Gaussian mixture to the concept relevance vectors, we find prototypical prediction strategies.
  • Figure 4: Prototypes allow for a global understanding of class prediction (dis-)similarities. (a) Similarity matrix of the first 20 ImageNet class prototypes. We can identify distinct clusters for fish and bird species. (b) Unraveling the (dis-)similarities of the Brambling and Robin prototypes: Whereas both are similar in terms of a partly orange-brown color, they differ, e.g., in a "gray-white spotted" texture (an indication of Brambling).
  • Figure 5: Revealing spurious model behavior with the framework: (a) First, we examine the characteristic concepts of each prototype to find spurious concepts. As shown, a spurious Chinese watermark concept is most relevant for the prototype of the "carton" class. (b) Second, clusters of training predictions that deviate strongly from prototypes can be studied for spurious behavior. For the "carton" class, we reveal a cluster of Tiger Cats in cartons, which leads the model to use cat features to predict the carton class.
  • ...and 30 more figures
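
The comparisons described in the captions of Figure 1(b) and Figure 4(a) can be illustrated with a small sketch. The concept names come from the Figure 1 caption, but all relevance values, the tolerance of 0.1, and the use of cosine similarity as the similarity measure are assumptions made for illustration only. The helper names are hypothetical and not part of the published code.

```python
import numpy as np

# Concept names taken from the Figure 1 caption.
concepts = ["feather", "red color", "water"]

# Hypothetical relevance scores (fractions of total relevance), not from the paper.
prototype = np.array([0.40, 0.35, 0.25])
prediction = np.array([0.42, 0.10, 0.48])


def compare_to_prototype(prediction, prototype, tolerance=0.1):
    """Label each concept by its relevance difference to the prototype."""
    labels = []
    for delta in prediction - prototype:
        if delta > tolerance:
            labels.append("overused")
        elif delta < -tolerance:
            labels.append("underused")
        else:
            labels.append("similar")
    return labels


def cosine_similarity(a, b):
    """Cosine similarity between two concept vectors (an assumed measure)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


usage = dict(zip(concepts, compare_to_prototype(prediction, prototype)))
similarity = cosine_similarity(prediction, prototype)
```

With these made-up numbers, "feather" is labeled similar, "red color" underused, and "water" overused, mirroring the flamingo example in the caption. Applying `cosine_similarity` to every pair of class prototypes would give a matrix of the kind Figure 4(a) describes.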