CONFINE: Conformal Prediction for Interpretable Neural Networks

Linhui Huang; Sayeri Lala; Niraj K. Jha

TL;DR

CONFINE is a conformal-prediction framework that wraps any pre-trained neural classifier to produce prediction sets with $p$-values, credibility, and confidence, alongside example-based explanations. It applies a top-$k$ cosine-distance nonconformity measure to the features of a selected hidden layer and uses calibration data to turn the resulting scores into class-wise uncertainty estimates and interpretable neighbor-based explanations, all without retraining. Empirically, CONFINE improves accuracy (by up to 3.57% on PathMNIST), raises correct efficiency, and attains valid marginal and class-conditional coverage across medical, vision, and NLP tasks. The framework is broadly applicable, but its nearest-neighbor calculations add computation and storage. This trade-off between interpretability and computational cost is acceptable in high-stakes settings that require trustworthy predictions.
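The pipeline summarized above can be illustrated with a minimal sketch. It rests on assumptions that are not taken from the paper: the nonconformity score is taken to be the mean cosine distance to the $k$ nearest training embeddings of the candidate class (the exact CONFINE measure may differ), embeddings are plain lists of floats standing in for hidden-layer features, and all function names are hypothetical. The $p$-value, credibility, and confidence follow the standard conformal prediction definitions.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def nonconformity(x, label, train, k):
    """Assumed score: mean cosine distance from x to its k nearest
    training embeddings that carry `label` (illustrative only)."""
    dists = sorted(cosine_distance(x, e) for e, y in train if y == label)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

def p_values(x, labels, train, calib, k):
    """Standard conformal p-value for each candidate label."""
    calib_scores = [nonconformity(e, y, train, k) for e, y in calib]
    n = len(calib_scores)
    result = {}
    for label in labels:
        alpha = nonconformity(x, label, train, k)
        # Count calibration samples that look at least as strange as x.
        at_least = sum(1 for s in calib_scores if s >= alpha)
        result[label] = (at_least + 1) / (n + 1)
    return result

def prediction_set(pvals, epsilon):
    """Keep every class whose p-value exceeds the significance level."""
    return {label for label, p in pvals.items() if p > epsilon}

def credibility_and_confidence(pvals):
    """Credibility: largest p-value. Confidence: 1 - second-largest p-value."""
    ranked = sorted(pvals.values(), reverse=True)
    return ranked[0], 1.0 - ranked[1]

# Toy 2-D "embeddings" (made-up data, two classes).
train = [([1.0, 0.0], "a"), ([0.9, 0.1], "a"),
         ([0.0, 1.0], "b"), ([0.1, 0.9], "b")]
calib = [([0.95, 0.05], "a"), ([0.05, 0.95], "b"), ([0.8, 0.2], "a")]

pvals = p_values([1.0, 0.1], ["a", "b"], train, calib, k=2)
print(pvals, prediction_set(pvals, epsilon=0.3),
      credibility_and_confidence(pvals))
```

In this sketch, the training samples that produce the $k$ smallest distances could also be returned to the user as the example-based explanation for the prediction.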

Abstract

Deep neural networks exhibit remarkable performance, yet their black-box nature limits their utility in fields like healthcare where interpretability is crucial. Existing explainability approaches often sacrifice accuracy and lack quantifiable measures of prediction uncertainty. In this study, we introduce Conformal Prediction for Interpretable Neural Networks (CONFINE), a versatile framework that generates prediction sets with statistically robust uncertainty estimates instead of point predictions to enhance model transparency and reliability. CONFINE not only provides example-based explanations and confidence estimates for individual predictions but also boosts accuracy by up to 3.6%. We define a new metric, correct efficiency, to evaluate the fraction of prediction sets that contain precisely the correct label, and show that CONFINE achieves a correct efficiency up to 3.3% higher than the original accuracy, matching or exceeding prior methods. CONFINE's marginal and class-conditional coverages attest to its validity across tasks ranging from medical image classification to language understanding. Being adaptable to any pre-trained classifier, CONFINE marks a significant advance towards transparent and trustworthy deep learning applications in critical domains.
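The evaluation metrics named in the abstract can be sketched as follows. This is an illustration under one assumption: "contain precisely the correct label" is read as the prediction set being exactly the singleton holding the true label. The function names and toy data are hypothetical and not taken from the paper.

```python
def marginal_coverage(pred_sets, labels):
    """Fraction of samples whose prediction set contains the true label."""
    return sum(y in s for s, y in zip(pred_sets, labels)) / len(labels)

def class_conditional_coverage(pred_sets, labels):
    """Coverage computed separately for each true class."""
    hits, totals = {}, {}
    for s, y in zip(pred_sets, labels):
        totals[y] = totals.get(y, 0) + 1
        hits[y] = hits.get(y, 0) + (y in s)
    return {y: hits[y] / totals[y] for y in totals}

def correct_efficiency(pred_sets, labels):
    """Fraction of prediction sets that equal {true label} (assumed reading)."""
    return sum(s == {y} for s, y in zip(pred_sets, labels)) / len(labels)

# Toy prediction sets for four samples (made-up data).
pred_sets = [{"a"}, {"a", "b"}, {"b"}, set()]
labels = ["a", "b", "a", "b"]

print(marginal_coverage(pred_sets, labels))
print(class_conditional_coverage(pred_sets, labels))
print(correct_efficiency(pred_sets, labels))
```

Under this reading, a set that covers the true label but also contains other classes counts toward coverage and not toward correct efficiency, so correct efficiency can never exceed marginal coverage.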

Paper Structure (31 sections, 13 equations, 12 figures, 3 tables, 1 algorithm)

Introduction
Related Works
Explainable AI (XAI)
Uncertainty Estimation
Conformal Prediction
The CONFINE Framework
Review of Conformal Prediction
The CONFINE Algorithm
Experimental Setup
Tasks and Baselines
Evaluation Metrics
Experimental Results
CONFINE Provides Interpretability
CONFINE Boosts Accuracy
CONFINE Boosts Correct Efficiency
...and 16 more sections

Figures (12)

Figure 1: CONFINE applies conformal prediction to enhance the interpretability of any neural network classifier by providing prediction sets with confidence estimates and example-based explanations.
Figure 2: Two test samples from PathMNIST: (A) correctly classified by CONFINE and (B) incorrectly classified by CONFINE. (C) $p$-values for the test sample in (B). Classes with $p$-values above the significance level $\varepsilon$ are included in the prediction set. Hyperparameters used: $k=20, l=50$. cas: cancer-associated stroma, sm: smooth muscle, cae: colorectal adenocarcinoma epithelium, ncm: normal colon mucosa.
Figure 3: CONFINE's coverage and correct efficiency curves as the allowed error rate $\varepsilon$ varies. A coverage curve that lies above the diagonal line indicates that the observed error rate stays within the allowed error rate, i.e., that conformal prediction is valid. Hyperparameters used: (A) $T = 0.01, k = 5$, (B) $k = 5$, (C) $T = 0.01, k = 5$, (D) $k = 20$.
Figure 4: Coverage and correct efficiency curves as the significance level $\varepsilon$ varies for three prior conformal prediction methods. Hyperparameter used for (B) and (E): $\gamma=1$.
Figure 5: Coverage curve of PathMNIST after random shuffling and re-splitting suggests exchangeability.
...and 7 more figures
