MedBayes-Lite: Bayesian Uncertainty Quantification for Safe Clinical Decision Support

Elias Hossain, Md Mehedi Hasan Nipu, Maleeha Sheikh, Rajib Rana, Subash Neupane, Niloofar Yousefi

TL;DR

MedBayes-Lite targets safe clinical decision support by giving transformer-based LLMs calibrated, end-to-end uncertainty propagation without retraining. It introduces a lightweight Bayesian framework with three components: Bayesian Embedding Calibration, Uncertainty-Weighted Attention, and Confidence-Guided Decision Shaping. Together they downweight unreliable cues and abstain when confidence is insufficient. The key theoretical result is layer-wise uncertainty propagation (Theorem 5), which allows epistemic and aleatoric uncertainty to be traced through embeddings, attention, and decisions. Across MIMIC-III, MedQA, and PubMedQA, the approach reduces overconfidence by 32–48% and can prevent up to 41% of diagnostic errors in simulated settings, improving the reliability and interpretability of clinical AI systems.

Abstract

We propose MedBayes-Lite, a lightweight Bayesian enhancement for transformer-based clinical language models designed to produce reliable, uncertainty-aware predictions. Although transformers show strong potential for clinical decision support, they remain prone to overconfidence, especially in ambiguous medical cases where calibrated uncertainty is critical. MedBayes-Lite embeds uncertainty quantification directly into existing transformer pipelines without any retraining or architectural rewiring, adding no new trainable layers and keeping parameter overhead under 3 percent. The framework integrates three components: (i) Bayesian Embedding Calibration using Monte Carlo dropout for epistemic uncertainty, (ii) Uncertainty-Weighted Attention that marginalizes over token reliability, and (iii) Confidence-Guided Decision Shaping inspired by clinical risk minimization. Across biomedical QA and clinical prediction benchmarks (MedQA, PubMedQA, MIMIC-III), MedBayes-Lite consistently improves calibration and trustworthiness, reducing overconfidence by 32 to 48 percent. In simulated clinical settings, it can prevent up to 41 percent of diagnostic errors by flagging uncertain predictions for human review. These results demonstrate its effectiveness in enabling reliable uncertainty propagation and improving interpretability in medical AI systems.

Paper Structure

This paper contains 20 sections, 5 theorems, 10 equations, 1 figure, 2 tables, and 1 algorithm.

Key Result

Theorem 1

The posterior distribution of embeddings $p(h \mid x)$ can be estimated using MC dropout, where each dropout mask $m$ represents a variational draw from the posterior distribution over parameters.
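
To make the idea behind Theorem 1 concrete, the following is a minimal sketch of MC dropout applied to an embedding, not the authors' implementation. It assumes a single hypothetical layer h = tanh((x * m) W), where a fresh Bernoulli dropout mask m is drawn on each forward pass; the names `mc_dropout_embeddings`, `weight`, `n_samples`, and `drop_prob` are illustrative and do not come from the paper. The empirical mean and variance of the sampled embeddings serve as a Monte Carlo summary of $p(h \mid x)$.

```python
import numpy as np


def mc_dropout_embeddings(x, weight, n_samples=50, drop_prob=0.1, seed=0):
    """Summarize p(h | x) with stochastic forward passes (illustrative sketch).

    Assumes a toy one-layer embedding h = tanh((x * m) @ weight); the layer
    and all parameter names are hypothetical, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    keep = 1.0 - drop_prob
    samples = []
    for _ in range(n_samples):
        # Draw one dropout mask m and rescale so the expected input is unchanged.
        mask = rng.binomial(1, keep, size=x.shape) / keep
        samples.append(np.tanh((x * mask) @ weight))
    samples = np.stack(samples)  # shape: (n_samples, d_out)
    # The mean approximates the posterior mean embedding; the per-dimension
    # variance is a simple proxy for epistemic uncertainty.
    return samples.mean(axis=0), samples.var(axis=0)


# Toy usage with random inputs (values are arbitrary, for illustration only).
x = np.ones(8)
weight = np.random.default_rng(1).normal(size=(8, 4))
mean_h, var_h = mc_dropout_embeddings(x, weight, n_samples=100, drop_prob=0.2)
print("mean embedding:", mean_h)
print("per-dimension variance:", var_h)
```

Under these assumptions, a larger variance in a given embedding dimension indicates that the sampled masks disagree more about that dimension, which is the kind of signal a downstream component could use to downweight unreliable tokens.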

Figures (1)

  • Figure 1: Overview of the proposed MedBayes-Lite framework. The model integrates Bayesian reasoning across embedding, attention, and decision layers to estimate both epistemic and aleatoric uncertainty. It combines uncertainty-weighted attention, adaptive evidence scoring, and confidence-guided decision shaping to enable efficient and risk-aware clinical inference.

Theorems & Definitions (10)

  • Theorem 1: Posterior Approximation
  • proof
  • Theorem 2: Uncertainty-Weighted Attention
  • proof
  • Theorem 3: Confidence-Guided Decision Shaping
  • proof
  • Theorem 4: Uncertainty Decomposition
  • proof
  • Theorem 5: Layer-wise Uncertainty Propagation in MedBayes-Lite
  • proof