Uncertainty-Aware Concept Bottleneck Models with Enhanced Interpretability
Haifei Zhang, Patrick Barry, Eduardo Brandao
TL;DR
The paper tackles the interpretability-uncertainty trade-off in Concept Bottleneck Models (CBMs) by introducing a Class-Level Prototype Classifier (CLPC) that learns one binary prototype per class in the concept space and classifies via proximity to predicted concepts. Uncertainty is quantified through a conformal prediction framework that yields set-valued predictions and the option to abstain, while interpretability is enhanced by rule-like prototypes and per-concept contribution analyses. The approach includes a two-stage training procedure (a concept predictor followed by prototype learning with sparsity and binarization regularizers) and a targeted concept-intervention mechanism that provides counterfactual explanations for misclassifications. Empirical results on CUB-200-2011, Derm7pt, and RIVAL10 show competitive label accuracy with improved uncertainty calibration, robustness to concept noise, and effective, gain-driven concept interventions. This work advances CBMs by delivering reliable, interpretable, and uncertainty-aware predictions suitable for high-stakes applications.
Abstract
In the context of image classification, Concept Bottleneck Models (CBMs) first embed images into a set of human-understandable concepts and then apply an intrinsically interpretable classifier that predicts labels from these intermediate representations. While CBMs offer a semantically meaningful and interpretable classification pipeline, they often sacrifice predictive performance compared to end-to-end convolutional neural networks. Moreover, the propagation of uncertainty from concept predictions to final label decisions remains underexplored. In this paper, we propose a novel uncertainty-aware and interpretable classifier for the second stage of CBMs. Our method learns a set of binary class-level concept prototypes and uses the distances between predicted concept vectors and each class prototype as both a classification score and a measure of uncertainty. These prototypes also serve as interpretable classification rules, indicating which concepts should be present in an image to justify a specific class prediction. The proposed framework enhances both interpretability and robustness by enabling conformal prediction for uncertain or outlier inputs based on their deviation from the learned binary class-level concept prototypes.
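To make the prototype-distance idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes prototypes are already learned and binary, uses mean absolute difference as the distance, and takes the distance to the true-class prototype as the nonconformity score in standard split conformal prediction. The function names, the choice of distance, and the choice of score are all illustrative assumptions.

```python
import numpy as np


def prototype_distances(concept_probs, prototypes):
    """Mean absolute distance from each concept vector to each class prototype.

    concept_probs: (n_samples, n_concepts) predicted concept values in [0, 1].
    prototypes:    (n_classes, n_concepts) binary class-level prototypes.
    Returns an (n_samples, n_classes) array; smaller means closer.
    The distance used here is an assumption, not taken from the paper.
    """
    diff = concept_probs[:, None, :] - prototypes[None, :, :]
    return np.abs(diff).mean(axis=2)


def calibrate_threshold(cal_dists, cal_labels, alpha=0.1):
    """Split conformal calibration on a held-out set.

    Nonconformity score (assumed): distance to the true-class prototype.
    Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score,
    or infinity when the calibration set is too small for that rank.
    """
    n = len(cal_labels)
    scores = np.sort(cal_dists[np.arange(n), cal_labels])
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return np.inf
    return scores[k - 1]


def predict_sets(dists, threshold):
    """Return, per sample, every class whose prototype lies within the threshold."""
    return [np.flatnonzero(row <= threshold).tolist() for row in dists]


# Toy example: 3 classes described by 4 binary concepts (hypothetical values).
prototypes = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
])
concept_probs = np.array([[0.9, 0.8, 0.1, 0.2]])

dists = prototype_distances(concept_probs, prototypes)
point_prediction = int(dists.argmin(axis=1)[0])  # nearest prototype
```

In this sketch a single-label set is a confident prediction and a multi-label set signals ambiguity between classes. An empty set means the input is far from every prototype, which one could treat as a reason to abstain; that reading is an assumption about how abstention might be wired up, not a description of the authors' rule.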
