
LATA: Laplacian-Assisted Transductive Adaptation for Conformal Uncertainty in Medical VLMs

Behzad Bozorgtabar, Dwarikanath Mahapatra, Sudipta Roy, Muzammal Naseer, Imran Razzak, Zongyuan Ge

TL;DR

The paper tackles unreliable uncertainty in medical vision–language models under domain shift by preserving the finite-sample guarantees of split conformal prediction (SCP) while improving efficiency and class balance. It introduces LATA, a training- and label-free transductive refinement that smooths zero-shot posteriors over a joint calibration/test $k$NN graph using a CCCP mean-field update, and augments conformal scoring with a failure-aware head from ViLU to produce tighter, more balanced prediction sets at fixed coverage. A label-prior option (LATA-LI) further tunes coverage with minimal cost, while maintaining exchangeability. Across three medical VLMs and nine tasks, LATA consistently reduces set size and the class-conditioned coverage gap (CCV), outperforming prior transductive baselines and approaching label-using performance with far lower compute, thereby enabling more reliable deployment of medical VLMs. The work demonstrates that deterministic, black-box refinements can meaningfully improve uncertainty quantification without retraining, broadening the practical impact of conformal uncertainty in clinical imaging settings.

Abstract

Medical vision-language models (VLMs) are strong zero-shot recognizers for medical imaging, but their reliability under domain shift hinges on calibrated uncertainty with guarantees. Split conformal prediction (SCP) offers finite-sample coverage, yet prediction sets often become large (low efficiency) and class-wise coverage becomes unbalanced (a high class-conditioned coverage gap, CCV), especially in few-shot, imbalanced regimes; moreover, naively adapting to calibration labels breaks exchangeability and voids guarantees. We propose LATA (Laplacian-Assisted Transductive Adaptation), a training- and label-free refinement that operates on the joint calibration and test pool by smoothing zero-shot probabilities over an image-image $k$-NN graph using a small number of CCCP mean-field updates, preserving SCP validity via a deterministic transform. We further introduce a failure-aware conformal score that plugs into the vision-language uncertainty (ViLU) framework, providing instance-level difficulty and label plausibility to improve prediction set efficiency and class-wise balance at fixed coverage. LATA is black-box (no VLM updates), compute-light (windowed transduction, no backprop), and includes an optional prior knob that can run strictly label-free or, if desired, in a label-informed variant using calibration marginals once. Across three medical VLMs and nine downstream tasks, LATA consistently reduces set size and CCV while matching or tightening target coverage, outperforming prior transductive baselines and narrowing the gap to label-using methods, while using far less compute. Comprehensive ablations and qualitative analyses show that LATA sharpens zero-shot predictions without compromising exchangeability.
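
To make the label-free refinement step more concrete, the following is a minimal NumPy sketch of smoothing zero-shot probabilities over an image-image $k$-NN graph with a few fixed-point updates. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact objective, so the update used here (a row-wise softmax of the log zero-shot probabilities plus a graph-weighted neighbor term) is a generic Laplacian-regularized mean-field form. The function names `knn_affinity` and `smooth_posteriors`, the weight `lam`, and the cosine-similarity affinity are all hypothetical choices. No labels enter the computation, and the same deterministic transform is applied to calibration and test points stacked together.

```python
import numpy as np

def knn_affinity(feats, k):
    """Symmetric k-NN affinity from cosine similarity (assumed affinity choice)."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T
    np.fill_diagonal(sim, -np.inf)                      # exclude self-matches
    idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]   # k most similar per row
    rows = np.arange(len(feats))[:, None]
    W = np.zeros_like(sim)
    W[rows, idx] = np.maximum(sim[rows, idx], 0.0)      # keep non-negative weights
    return 0.5 * (W + W.T)                              # symmetrize

def smooth_posteriors(q, W, lam=1.0, n_iter=8, eps=1e-12):
    """Generic mean-field style smoothing of zero-shot probabilities q over W.

    Each iteration sets z <- softmax(log q + lam * W z), row by row.
    This is an assumed update form, not the paper's exact rule.
    """
    log_q = np.log(q + eps)
    z = q.copy()
    for _ in range(n_iter):
        logits = log_q + lam * (W @ z)
        logits -= logits.max(axis=1, keepdims=True)     # numerical stability
        z = np.exp(logits)
        z /= z.sum(axis=1, keepdims=True)
    return z

# Toy usage with synthetic stand-ins for frozen embeddings and zero-shot outputs.
rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 4))             # calibration + test stacked
q = rng.dirichlet(np.ones(3), size=12)       # zero-shot class probabilities
z = smooth_posteriors(q, knn_affinity(feats, k=3))
z_cal, z_test = z[:4], z[4:]                 # split back after the joint transform
```

Because the transform treats every point in the joint pool symmetrically and never reads calibration labels, the refined scores for calibration and test points remain exchangeable, which is the property the abstract relies on to keep SCP valid.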

Paper Structure (34 sections, 16 equations, 9 figures, 12 tables)

This paper contains 34 sections, 16 equations, 9 figures, 12 tables.

Figures (9)

  • Figure 1: LATA pipeline and coverage–efficiency trade-off. (a) LATA pipeline. Frozen vision/text encoders yield zero-shot scores $q(x)$, optionally adjusted via calibration-informed priors. LATA then refines predictions on the joint unlabeled pool $\mathcal{U}$ using a sparse $k$NN graph and CCCP updates, producing $\tilde{z}(x)$. A frozen ViLU module estimates difficulty $u(x)$ and attention $\alpha(x)$, forming a failure-aware score $S^\star$, which is conformalized into calibrated prediction sets. (b) Coverage–efficiency frontier ($\alpha{=}0.10$, APS). LATA-LF ($\beta{=}0$) achieves SCP-level coverage with lower set size and CCV. LATA-LI ($\beta{=}0.2$) improves coverage further with minimal cost, outperforming SCA-T in both efficiency and balance.
  • Figure 2: SICAPv2: coverage, efficiency, and set structure (LAC, $\alpha{=}0.10$). (a) Left: Coverage–efficiency trade-off as calibration shots increase ($K\!\in\!\{4,8,16\}$; dot size encodes $K$). LATA (ours) defines the best label-free frontier, achieving smaller sets with equal or better coverage and approaching FCA without using labels at transfer. Right: Test-time coverage on $K{=}16$ splits (same seeds). Adapt+SCP under-covers; SCP/SCA-T reduce violations but remain dispersed; LATA concentrates near/above nominal with the lowest CCV. (b) Qualitative results at $K{=}16$: per-class set-size distributions (left) and label co-occurrence (right). LATA focuses uncertainty on adjacent grades (G3–G4), reducing CCV while preserving coverage.
  • Figure 3: Ablations on shots and window size (LAC, $\alpha{=}0.10$; averages across tasks). (a) Effect of calibration shots $K$. (b) Effect of query/window size $W$ (dashed line marks the full-batch limit). Across both sweeps, LATA achieves smaller prediction sets and lower CCV than baselines while preserving nominal coverage.
  • Figure 4: Exchangeability and per-dataset $\Delta$accuracy–$\Delta$set-size (APS, $\alpha{=}0.10$). (a) Across datasets, LATA-LI yields $\Delta\text{Accuracy}{>}0$ and $\Delta\text{Set Size}{<}0$ vs. SCP, with a weak linear fit (small $R^2$), indicating efficiency gains are not merely due to accuracy increases. (b) Exchangeability sanity check: the invalid Probe@cal + SCP@same under-covers, while LATA-LF (shared, label-free transform) stays near the nominal $(1{-}\alpha)$ across random seed trials.
  • Figure S1: Compute–accuracy trade-off for LATA at $\alpha{=}0.10$. Time per image (x-axis) vs. average set size (y-axis) for $T_{\text{iter}}\!\in\!\{4,8,12\}$. Colors denote LAC/APS/RAPS; marker size encodes CCV. Annotations show Cov., CCV, and GPU memory. Default $T_{\text{iter}}{=}8$ balances speed and reliability; $T_{\text{iter}}{=}4$ is faster with mild trade-offs; $T_{\text{iter}}{=}12$ yields limited gains.
  • ...and 4 more figures
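
The captions above report coverage, average set size, and CCV for conformal sets built at $\alpha{=}0.10$. As a rough illustration of how such quantities are typically obtained, here is a minimal NumPy sketch of split conformal prediction with the LAC score (one minus the probability of the true class). It is a sketch under assumptions rather than the paper's evaluation code: all function names are hypothetical, the data are synthetic, and `class_coverage_gap` uses an assumed definition of CCV (mean absolute deviation of per-class coverage from $1-\alpha$), since the text here does not spell the metric out.

```python
import numpy as np

def lac_scores(probs, labels):
    """LAC non-conformity score: 1 - probability assigned to the true class."""
    return 1.0 - probs[np.arange(len(labels)), labels]

def conformal_threshold(cal_scores, alpha):
    """Finite-sample corrected (1 - alpha) quantile of calibration scores."""
    n = len(cal_scores)
    rank = int(np.ceil((n + 1) * (1.0 - alpha)))   # 1-indexed order statistic
    if rank > n:
        return np.inf                              # too few points: keep every class
    return np.sort(cal_scores)[rank - 1]

def prediction_sets(test_probs, threshold):
    """Boolean mask (n_test, n_classes); a class is kept if its score <= threshold."""
    return (1.0 - test_probs) <= threshold

def coverage_and_size(sets, labels):
    """Empirical coverage and average prediction-set size."""
    covered = sets[np.arange(len(labels)), labels]
    return covered.mean(), sets.sum(axis=1).mean()

def class_coverage_gap(sets, labels, alpha):
    """Assumed CCV definition: mean |per-class coverage - (1 - alpha)|."""
    classes = np.unique(labels)
    per_class = np.array([sets[labels == c, c].mean() for c in classes])
    return np.abs(per_class - (1.0 - alpha)).mean()

# Toy usage on synthetic probabilities (labels drawn from the probabilities).
rng = np.random.default_rng(0)
n_cal, n_test, n_classes, alpha = 200, 500, 5, 0.10
probs = rng.dirichlet(np.ones(n_classes), size=n_cal + n_test)
labels = np.array([rng.choice(n_classes, p=p) for p in probs])

tau = conformal_threshold(lac_scores(probs[:n_cal], labels[:n_cal]), alpha)
sets = prediction_sets(probs[n_cal:], tau)
cov, size = coverage_and_size(sets, labels[n_cal:])
ccv = class_coverage_gap(sets, labels[n_cal:], alpha)
```

In a pipeline like the one in Figure 1(a), the probabilities passed to this step would be the refined outputs rather than the raw zero-shot ones; the conformal step itself is unchanged, which is why any shared, label-free transform applied beforehand leaves the coverage argument intact.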