
Towards Modeling Uncertainties of Self-explaining Neural Networks via Conformal Prediction

Wei Qian, Chenxu Zhao, Yangyi Li, Fenglong Ma, Chao Zhang, Mengdi Huai

TL;DR

This work tackles the lack of distribution-free uncertainty quantification in self-explaining neural networks by introducing unSENN, a conformal-prediction-based framework that jointly quantifies uncertainty in the interpretation (concept) space and the final prediction space. It defines principled non-conformity measures for the interpretation layer, constructs concept prediction sets with guaranteed coverage $1-\varepsilon$ via the quantile $Q_{1-\varepsilon}$, and links these to final label prediction sets through transfer functions and an adversarial optimization formulation. The approach supports both ground-truth concepts (concept bottlenecks) and prototype-based explanations, with theoretical guarantees under exchangeability and extensive empirical validation on MNIST and CIFAR-100 Super-class showing tighter, reliable uncertainty sets compared to baselines. Overall, unSENN enables reliable, distribution-free uncertainty quantification for both explanations and predictions, enhancing trust and applicability of self-explaining models in risk-sensitive domains.

Abstract

Despite the recent progress in deep neural networks (DNNs), it remains challenging to explain the predictions made by DNNs. Existing explanation methods for DNNs mainly focus on post-hoc explanations where another explanatory model is employed to provide explanations. The fact that post-hoc methods can fail to reveal the actual original reasoning process of DNNs raises the need to build DNNs with built-in interpretability. Motivated by this, many self-explaining neural networks have been proposed to generate not only accurate predictions but also clear and intuitive insights into why a particular decision was made. However, existing self-explaining networks are limited in providing distribution-free uncertainty quantification for the two simultaneously generated prediction outcomes (i.e., a sample's final prediction and its corresponding explanations for interpreting that prediction). Importantly, they also fail to establish a connection between the confidence values assigned to the generated explanations in the interpretation layer and those allocated to the final predictions in the ultimate prediction layer. To tackle the aforementioned challenges, in this paper, we design a novel uncertainty modeling framework for self-explaining networks, which not only demonstrates strong distribution-free uncertainty modeling performance for the generated explanations in the interpretation layer but also excels in producing efficient and effective prediction sets for the final predictions based on the informative high-level basis explanations. We perform the theoretical analysis for the proposed framework. Extensive experimental evaluation demonstrates the effectiveness of the proposed uncertainty framework.

Paper Structure

This paper contains 15 sections, 3 theorems, 23 equations, 7 figures, 6 tables, 2 algorithms.

Key Result

Theorem 1

Suppose that the calibration samples $D^{cal}=\{(x_i,c_{i}, y_i)\}_{i=1}^{N^{cal}}$ and the given test sample $x^{test}$ are exchangeable. Then, if we calculate the quantile value $Q_{1-\varepsilon}$ and construct $\Gamma^{\varepsilon}_{cpt}(x^{test})$ as indicated above, for the above non-conformity measure the concept prediction set $\Gamma^{\varepsilon}_{cpt}(x^{test})$ attains coverage of at least $1-\varepsilon$ with respect to $\mathcal{C}_{cpt}(x^{test})$, where $\mathcal{C}_{cpt}(x^{test})$ is the true relevant concept set for the given test sample $x^{test}$.
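The calibrate-then-construct recipe that Theorem 1 refers to can be illustrated with a minimal split-conformal sketch. The paper's actual non-conformity measure is not reproduced in this summary, so the score used here (the largest $1 - p_k$ over a sample's true concepts) is a hypothetical stand-in, and the function names `conformal_quantile`, `nonconformity`, `concept_prediction_set`, and `demo` are illustrative only. The quantile uses the standard finite-sample rank $\lceil (n+1)(1-\varepsilon) \rceil$.

```python
import math
import random


def conformal_quantile(scores, eps):
    """Return the ceil((n + 1) * (1 - eps))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - eps))
    if rank > n:
        # Too few calibration samples for this eps: only the full set is valid.
        return math.inf
    return sorted(scores)[rank - 1]


def nonconformity(probs, true_concepts):
    """Hypothetical score (an assumption, not the paper's measure):
    the largest 1 - p_k over the sample's true concepts."""
    return max((1.0 - probs[k] for k in true_concepts), default=-math.inf)


def concept_prediction_set(probs, q):
    """Keep every concept whose score 1 - p_k does not exceed the quantile q."""
    return {k for k, p in enumerate(probs) if 1.0 - p <= q}


def demo(n_cal=500, n_test=2000, n_concepts=5, eps=0.1, seed=0):
    """Estimate coverage on synthetic data drawn i.i.d. (hence exchangeable)."""
    rng = random.Random(seed)

    def sample():
        # Synthetic stand-in for a concept predictor: true concepts tend to
        # receive high probabilities, irrelevant ones low probabilities.
        true = {k for k in range(n_concepts) if rng.random() < 0.4}
        probs = [
            rng.betavariate(5, 2) if k in true else rng.betavariate(2, 5)
            for k in range(n_concepts)
        ]
        return probs, true

    calibration = [sample() for _ in range(n_cal)]
    q = conformal_quantile([nonconformity(p, t) for p, t in calibration], eps)

    covered = 0
    for _ in range(n_test):
        probs, true = sample()
        # Covered when every true concept lies in the prediction set.
        covered += true <= concept_prediction_set(probs, q)
    return covered / n_test


if __name__ == "__main__":
    print(f"empirical coverage: {demo():.3f}")
```

With this score, a test sample is covered exactly when its own score is at most the calibrated quantile, which is why exchangeability of calibration and test samples is the only distributional assumption needed. A different non-conformity measure would change the size of the sets but not this argument.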

Figures (7)

  • Figure 1: Data efficiency for concept bottleneck models with different learning strategies.
  • Figure 2: Concept conformal sets on MNIST and CIFAR-100 Super-class. Bold, underlined phrases denote true concepts.
  • Figure 3: Boxplot of prediction sets. We display the sizes of the conformal sets, over all the test data, that are not equal to 1.
  • Figure 4: Prediction conformal sets on CIFAR-100 Super-class. Bold, underlined phrases denote true labels.
  • Figure 5: Distribution of error rate with different calibration sizes (splitting percentage of the original available data).
  • ...and 2 more figures

Theorems & Definitions (7)

  • Definition 1: Self-explaining Models
  • Definition 2: Data Exchangeability
  • Theorem 1
  • proof
  • Lemma 2: Tibshirani et al. (2019)
  • Theorem 3
  • proof