Multimodal Posterior Sampling-based Uncertainty in PD-L1 Segmentation from H&E Images

Roman Kinakh, Gonzalo R. Ríos-Muñoz, Arrate Muñoz-Barrutia

TL;DR

The paper addresses the challenge of PD-L1 biomarker assessment by inferring PD-L1-expressing regions directly from H&E histology, bypassing resource-intensive IHC. It introduces nnUNet-B, a Bayesian segmentation framework that employs Multimodal Posterior Sampling (MPS) to sample diverse checkpoints along a cyclic training trajectory of nnUNet-v2, approximating the posterior and yielding pixel-wise uncertainty maps. Inference uses an ensemble of $N$ models to produce averaged segmentation probabilities and uncertainty measures via entropy $H$ and standard deviation $\sigma$, enabling uncertainty-aware predictions. On 1,088 paired H&E–IHC images from lung squamous cell carcinoma, nnUNet-B achieves competitive metrics ($\mathrm{mDice}=0.805$, $\mathrm{mIoU}=0.709$, $\mathrm{mHD95}=97$, $\mathrm{mPA}=0.860$). Its uncertainty estimates correlate with segmentation error but are not perfectly calibrated, suggesting that uncertainty-aware, scalable biomarker inference is feasible for clinical workflows.

Abstract

Accurate assessment of PD-L1 expression is critical for guiding immunotherapy, yet current immunohistochemistry (IHC) based methods are resource-intensive. We present nnUNet-B: a Bayesian segmentation framework that infers PD-L1 expression directly from H&E-stained histology images using Multimodal Posterior Sampling (MPS). Built upon nnUNet-v2, our method samples diverse model checkpoints during cyclic training to approximate the posterior, enabling both accurate segmentation and epistemic uncertainty estimation via entropy and standard deviation. Evaluated on a dataset of lung squamous cell carcinoma, our approach achieves competitive performance against established baselines with mean Dice Score and mean IoU of 0.805 and 0.709, respectively, while providing pixel-wise uncertainty maps. Uncertainty estimates show strong correlation with segmentation error, though calibration remains imperfect. These results suggest that uncertainty-aware H&E-based PD-L1 prediction is a promising step toward scalable, interpretable biomarker assessment in clinical workflows.
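
The ensemble step described in the abstract can be illustrated with a minimal NumPy sketch. It assumes the softmax outputs of the sampled checkpoints are already stacked into one array; the function name `ensemble_uncertainty`, the array layout, and the choice to report the standard deviation of the predicted class are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def ensemble_uncertainty(probs, eps=1e-12):
    """Combine softmax maps from N checkpoints (hypothetical helper).

    probs: array of shape (N, C, H, W), one softmax map per checkpoint.
    Returns the predicted label map, an entropy map, and a std map,
    each of shape (H, W).
    """
    probs = np.asarray(probs, dtype=float)

    # Average the class probabilities over the N ensemble members.
    mean_probs = probs.mean(axis=0)                       # (C, H, W)

    # Final segmentation: most probable class per pixel.
    prediction = mean_probs.argmax(axis=0)                # (H, W)

    # Entropy of the averaged distribution (natural log).
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=0)

    # Spread of the members' probabilities. Assumption: we report the
    # std of the predicted class; other reductions are equally plausible.
    std_per_class = probs.std(axis=0)                     # (C, H, W)
    std_map = np.take_along_axis(std_per_class, prediction[None], axis=0)[0]

    return prediction, entropy, std_map


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(5, 3, 4, 4))                # 5 members, 3 classes
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    softmax = exp / exp.sum(axis=1, keepdims=True)
    pred, ent, std = ensemble_uncertainty(softmax)
    print(pred.shape, ent.shape, std.shape)
```

With three classes the entropy map is bounded above by $\log 3$, reached where the averaged probabilities are uniform; both maps are zero wherever all members agree with full confidence.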

Paper Structure

This paper contains 9 sections, 3 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of dataset annotation and segmentation workflow: (a) Pathologists annotate PD-L1-positive (green) and PD-L1-negative (red) tumor regions on IHC slides. (b) Annotations are converted into 3-class segmentation masks. (c) Masks are aligned with corresponding H&E images. (d) H&E images are used as model input, with predictions supervised by the aligned PD-L1 masks. [deng2025mcranet, wang2024prediction]
  • Figure 2: Bayesian nnU-Net framework with Multimodal Posterior Sampling (MPS). During training, checkpoints are sampled from the last $n/3$ epochs of each learning cycle. At inference, an H&E image is passed through $n$ sampled models to generate probability maps, which are averaged; the prediction is the per-pixel $\arg\max$ of the averaged map. Pixel-wise uncertainty is computed using entropy or standard deviation. [zhao2022efficient]
  • Figure 3: Training process of the Bayesian nnU-Net framework with Multimodal Posterior Sampling (MPS). The model undergoes three full training cycles using a cyclic learning rate schedule. During the final 20 epochs of each cycle, model checkpoints are sampled and stored to later be used as an ensemble for uncertainty estimation.
  • Figure 4: Visual summary of nnUNet-B predictions, error maps, and uncertainty estimates for two test images (top two rows: image 1; bottom two rows: image 2). For each image: Column 1 shows the H&E and corresponding IHC reference; Column 2 displays the model prediction and ground truth; Column 3 presents class-specific error maps for PD-L1-negative (NEG) and -positive (POS) regions; Columns 4 and 5 show standard deviation and entropy-based uncertainty maps, each overlaid on the H&E image.
  • Figure 5: Test dataset-level uncertainty calibration curves using (a) STD and (b) entropy. Each plot displays the relationship between predicted uncertainty and actual prediction error, computed across binned uncertainty intervals. The dashed diagonal line denotes perfect calibration, where predicted uncertainty would match the observed error.
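
A calibration curve of the kind described for Figure 5 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the helper name `calibration_curve`, the min-max normalisation of uncertainty to $[0, 1]$, and the use of equal-width bins are all choices made here for the example.

```python
import numpy as np


def calibration_curve(uncertainty, error, n_bins=10):
    """Bin pixels by uncertainty and report the error rate per bin.

    uncertainty: per-pixel uncertainty values (entropy or std), any shape.
    error: per-pixel 0/1 indicator of a wrong prediction, same shape.
    Returns (mean uncertainty per bin, error rate per bin), skipping
    empty bins.
    """
    u = np.asarray(uncertainty, dtype=float).ravel()
    e = np.asarray(error, dtype=float).ravel()

    # Assumption: rescale uncertainty to [0, 1] so it can be compared
    # with an error rate on the same axis range.
    span = u.max() - u.min()
    u = (u - u.min()) / span if span > 0 else np.zeros_like(u)

    # Equal-width bins; digitize against the interior edges gives
    # bin indices in 0 .. n_bins - 1.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.digitize(u, edges[1:-1])

    mean_unc, err_rate = [], []
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            mean_unc.append(u[mask].mean())
            err_rate.append(e[mask].mean())
    return np.array(mean_unc), np.array(err_rate)


if __name__ == "__main__":
    unc = np.array([0.0, 0.1, 0.9, 1.0])
    err = np.array([0, 0, 1, 1])
    print(calibration_curve(unc, err, n_bins=2))
```

Points on the diagonal mean that the normalised uncertainty in a bin equals the observed error rate in that bin; points below it indicate the model is more uncertain than its errors justify, and points above it indicate the opposite.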