Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations
Maximilian Dreyer, Reduan Achtibat, Wojciech Samek, Sebastian Lapuschkin
TL;DR
This work introduces a post-hoc concept-based framework that unifies local (instance-level) and global (class-level) explanations via prototypical decision strategies. By modeling class-specific distributions of concept relevances as mixtures of Gaussians, it derives prototypical prediction strategies and uses log-likelihoods to assign new predictions to these prototypes, enabling an objective assessment of how ordinary or extraordinary a prediction is relative to global behavior. The framework supports both explanation and validation: it provides concept relevance scores, localizations, and visualizations, and it enables detection of outliers, data-quality issues, spurious behavior, and out-of-distribution (OOD) samples, all validated across ImageNet, CUB-200, and CIFAR-10 with VGG, ResNet, and EfficientNet. Key findings show that relevance-based concept attributions are more faithful and disentangled than activations, and that Gaussian mixture prototypes improve coverage and outlier detection, with practical implications for scalable model validation in safety-critical settings. Overall, prototypical concept-based explanations offer objective, automated insights into model behavior, reducing reliance on human interpretation and aiding robust deployment.
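The mixture-of-Gaussians idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation (see the linked repository for that). It assumes that each prediction of a class is summarized by a vector of concept relevance scores, and it uses synthetic 3-dimensional vectors in place of real relevances. The function names (`fit_gmm`, `log_joint`, `log_likelihood`), the diagonal covariances, the farthest-point initialization, and the 1st-percentile outlier threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def log_joint(X, weights, means, variances):
    """log[pi_k * N(x | mu_k, diag(var_k))] for every sample and component, shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return np.log(weights)[None, :] - 0.5 * (mahal + log_det[None, :])

def log_likelihood(X, weights, means, variances):
    """Log-likelihood of each sample under the full mixture (stable log-sum-exp)."""
    lj = log_joint(X, weights, means, variances)
    m = lj.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(lj - m).sum(axis=1, keepdims=True)))[:, 0]

def fit_gmm(X, k, n_iter=100, eps=1e-6):
    """Fit a diagonal-covariance Gaussian mixture with plain EM (illustrative only)."""
    # Deterministic farthest-point initialization of the k component means.
    means = [X[0]]
    for _ in range(k - 1):
        dist = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[np.argmax(dist)])
    means = np.array(means)
    variances = np.tile(X.var(axis=0) + eps, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        lj = log_joint(X, weights, means, variances)
        resp = np.exp(lj - lj.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update mixture weights, means, and per-dimension variances.
        nk = resp.sum(axis=0) + eps
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ X ** 2) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, eps)
    return weights, means, variances

# Synthetic stand-in for the concept relevance vectors of one class,
# generated from two made-up prediction strategies (assumption, not real data).
rng = np.random.default_rng(0)
strategy_a = rng.normal([3.0, 0.0, 1.0], 0.2, size=(200, 3))
strategy_b = rng.normal([0.0, 3.0, -1.0], 0.2, size=(200, 3))
relevances = np.vstack([strategy_a, strategy_b])

# Each fitted component mean plays the role of one prototype.
weights, means, variances = fit_gmm(relevances, k=2)

# Hypothetical outlier rule: flag anything below the 1st percentile
# of the log-likelihoods observed on the reference data.
threshold = np.percentile(log_likelihood(relevances, weights, means, variances), 1)

new_samples = np.array([[3.0, 0.0, 1.0],     # resembles strategy A
                        [-4.0, -4.0, 4.0]])  # resembles neither strategy
new_ll = log_likelihood(new_samples, weights, means, variances)
assigned = log_joint(new_samples, weights, means, variances).argmax(axis=1)
is_outlier = new_ll < threshold
```

Here `assigned` gives the prototype that best explains each new sample, and `is_outlier` marks samples whose likelihood under the class mixture is unusually low. In a real setting, the number of components, the covariance structure, and the threshold would all need to be chosen and validated for the model and dataset at hand.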
Abstract
Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, only a few XAI methods are suitable for ensuring safety in practice, as most rely heavily on repeated, labor-intensive, and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on long-term human assessment. Quantifying the deviation from prototypical behavior allows us not only to associate predictions with specific model sub-strategies but also to detect outlier behavior. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior, and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) utilizing VGG, ResNet, and EfficientNet architectures. Code is available at https://github.com/maxdreyer/pcx.
