
COMPASS: Robust Feature Conformal Prediction for Medical Segmentation Metrics

Matt Y. Cheung, Ashok Veeraraghavan, Guha Balakrishnan

TL;DR

This work introduces COMPASS, a practical framework that generates efficient, metric-based conformal prediction (CP) intervals for image segmentation models by leveraging the inductive biases of their underlying deep neural networks, paving the way for practical, metric-based uncertainty quantification in medical image segmentation.

Abstract

In clinical applications, the utility of segmentation models is often based on the accuracy of derived downstream metrics such as organ size, rather than on the pixel-level accuracy of the segmentation masks themselves. Thus, uncertainty quantification for such metrics is crucial for decision-making. Conformal prediction (CP) is a popular framework for deriving such principled uncertainty guarantees, but applying CP naively to the final scalar metric is inefficient because it treats the complex, non-linear segmentation-to-metric pipeline as a black box. We introduce COMPASS, a practical framework that generates efficient, metric-based CP intervals for image segmentation models by leveraging the inductive biases of their underlying deep neural networks. COMPASS performs calibration directly in the model's representation space by perturbing intermediate features along low-dimensional subspaces maximally sensitive to the target metric. We prove that COMPASS achieves valid marginal coverage under the assumption of exchangeability. Empirically, we demonstrate that COMPASS produces significantly tighter intervals than traditional CP baselines on four medical image segmentation tasks for area estimation of skin lesions and anatomical structures. Furthermore, we show that leveraging learned internal features to estimate importance weights allows COMPASS to also recover target coverage under covariate shifts. COMPASS paves the way for practical, metric-based uncertainty quantification for medical image segmentation.
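
For reference, the naive baseline the abstract argues against, split CP applied directly to the scalar metric, can be sketched as follows. This is a minimal illustration under the assumption that the baseline uses absolute-residual scores on the predicted metric. The function name `scp_interval` is hypothetical and is not taken from the paper's code.

```python
import numpy as np


def scp_interval(y_hat_cal, y_cal, y_hat_test, alpha=0.1):
    """Naive split CP on a scalar metric: y_hat +/- q with absolute-residual scores."""
    y_hat_cal = np.asarray(y_hat_cal, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)
    y_hat_test = np.asarray(y_hat_test, dtype=float)
    n = len(y_cal)
    # Nonconformity score: absolute error of the predicted metric (e.g. area).
    scores = np.sort(np.abs(y_cal - y_hat_cal))
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]
    # The same half-width q is applied to every test image.
    return y_hat_test - q, y_hat_test + q
```

Because `q` is a single constant computed in output space, a few calibration images with large metric errors widen the interval for every test image. The pipeline from image to metric is never inspected.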

Paper Structure

This paper contains 37 sections, 8 theorems, 31 equations, 20 figures, 8 tables, 6 algorithms.

Key Result

Theorem 1

Let $(X_i,Y_i)_{i\ge 1}$ be exchangeable random pairs with $X_i\in\mathcal{X}$ and $Y_i\in\mathbb{R}$, and split the data into a training set $D_{\mathrm{tr}}$ and a calibration set $D_{\mathrm{cal}} = \{(X_i,Y_i)\}_{i=1}^n$. Using $D_{\mathrm{tr}}$, fit a segmentation model with decoder $g$ … We define the prediction set $S_\beta(x)$ as the range of the metric function over the perturbation …
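
The statement above is truncated. The following math block restates the generic split-conformal construction it appears to describe. This is our reading and not the paper's exact wording. We assume latent features $\hat z = f(x)$, a sample-specific direction $\Delta(x)$ computed from the fitted model only, and symmetric perturbations $t \in [-\beta, \beta]$. Nestedness ($S_\beta \subseteq S_{\beta'}$ for $\beta \le \beta'$) holds automatically because the perturbation set grows with $\beta$. The score is the smallest magnitude whose set covers the label, and exchangeability of the scores yields the usual guarantee.

```latex
\begin{aligned}
S_\beta(x) &= \bigl\{\, h\bigl(g(f(x) + t\,\Delta(x))\bigr) : -\beta \le t \le \beta \,\bigr\}, \\
R_i &= \inf\{\beta \ge 0 : Y_i \in S_\beta(X_i)\}, \qquad i = 1,\dots,n, \\
\hat\beta &= R_{(k)}, \qquad k = \lceil (n+1)(1-\alpha) \rceil, \\
\Pr\bigl\{Y_{n+1} \in S_{\hat\beta}(X_{n+1})\bigr\} &\ge 1 - \alpha .
\end{aligned}
```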

Figures (20)

  • Figure 1: Overview of COMPASS. (Left) A medical image segmentation network predicts a segmentation map from an input image. We conceptually decompose this network into a function $f$ that maps the image to latent features $\hat{z}$, and a function $g$ that maps $\hat{z}$ to the output map. The map may then be used to compute a (differentiable) downstream metric $\hat{y}$ via the function $h$. (Center) We linearly perturb calibration features $\hat{z}_i$ in a sample-specific direction $\Delta_i$ to find the scores $R_i$. The scores are used to find the conformal quantile $\hat{\beta}$. (Right) At test time for subject $n+1$, we perturb the features $\hat{z}_{n+1}$ in the direction $\Delta_{n+1}$ with magnitude $\hat{\beta}$. By Theorem 1, our interval construction is guaranteed to be nested. Therefore, under the assumption of exchangeability, the resulting prediction interval achieves marginal coverage (bottom).
  • Figure 2: Visual verification of monotonicity to justify Endpoint-COMPASS. As latent features are shifted along the COMPASS-J direction $\Delta$, the induced segmentations (red contours) expand monotonically. This is the key justification for using the efficient Endpoint-COMPASS in our experiments. When the metric is monotone along $\Delta$, its range over the perturbation is attained at the two endpoints, so Endpoint-COMPASS coincides with the rigorous Envelope-COMPASS. We show a sample from each dataset with perturbation magnitudes $\beta$ targeted at -20%, -10%, 0% (original prediction), +10%, and +20% change in area ($\delta A$). A separate figure plots all metric responses on the test datasets, and the appendix provides more visual examples.
  • Figure 3: COMPASS achieves the most efficient interval sizes under covariate shift. We show results for two datasets and compare weighting methods over 100 adversarial splits that maintain the same covariate shift, with $\alpha=0.1$. For H&E, we increased the proportion of "hard" samples in the test set. For Skin Lesion, we decreased it. Within each weighting method, COMPASS methods achieve valid coverage and the most efficient intervals. We show 95% confidence intervals.
  • Figure 4: The statistical efficiency of COMPASS is driven by a compressive power-law relationship between latent-space scores $R_{\mathrm{COMPASS}}$ and output-space scores $R_{\mathrm{SCP}}$. As $R_{\mathrm{SCP}}$ increases, the required latent-space perturbation magnitude also increases. However, it grows progressively more slowly because the scaling exponent (slope) is below 1 (top panel). This concave, sub-linear scaling directly compresses the tail of the score distribution (bottom panel). Thus, long-tail errors in output space are systematically mapped to much smaller feature-space scores.
  • Figure 5: Explained variance is a good indicator of monotonicity. For our four datasets, we plot the explained variance of the first principal component against the feature layer used for COMPASS-J. Monotonic and non-monotonic cases are indicated by $\bullet$ and $\times$, respectively.
  • ...and 15 more figures
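
The pipeline in Figure 1 can be sketched as a toy together with the monotonicity argument of Figure 2. This is not the authors' implementation, and several parts are assumptions. The hypothetical `ToyDecoder` stands in for $h \circ g$ as the soft area of a sigmoid mask. The direction $\Delta$ is assumed to be the normalized gradient of the metric with respect to the features, whereas the paper's COMPASS-J direction may be computed differently. Scores are found by bisection, which is valid only when the metric is monotone along $\Delta$. The names `endpoint_interval`, `compass_score`, and `conformal_quantile` are illustrative.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


class ToyDecoder:
    """Hypothetical stand-in for h(g(z)): soft area of a sigmoid mask."""

    def __init__(self, W, b):
        self.W = np.asarray(W, dtype=float)  # (num_pixels, feature_dim)
        self.b = np.asarray(b, dtype=float)  # (num_pixels,)

    def metric(self, z):
        # Sum of per-pixel foreground probabilities, i.e. a differentiable area.
        return float(sigmoid(self.W @ z + self.b).sum())

    def direction(self, z):
        # Assumed metric-sensitive direction: normalized gradient of the area.
        s = sigmoid(self.W @ z + self.b)
        grad = self.W.T @ (s * (1.0 - s))
        return grad / np.linalg.norm(grad)


def endpoint_interval(model, z, beta):
    """[m(-beta), m(+beta)] with m(t) = h(g(z + t * Delta)).

    This equals the full range of m over [-beta, beta] only if m is increasing.
    """
    if not np.isfinite(beta):
        return -np.inf, np.inf
    d = model.direction(z)
    return model.metric(z - beta * d), model.metric(z + beta * d)


def compass_score(model, z, y, beta_max=50.0, iters=60):
    """Smallest beta whose endpoint interval contains y, found by bisection."""
    lo_end, hi_end = endpoint_interval(model, z, beta_max)
    if not (lo_end <= y <= hi_end):
        return np.inf  # y is not reachable within the search range
    lo, hi = 0.0, beta_max  # invariant: the interval at `hi` contains y
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        a, b = endpoint_interval(model, z, mid)
        if a <= y <= b:
            hi = mid
        else:
            lo = mid
    return hi


def conformal_quantile(scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else scores[k - 1]


# Synthetic demo: "true" areas come from noisy features (i.i.d., hence exchangeable).
rng = np.random.default_rng(0)
model = ToyDecoder(W=rng.uniform(0.1, 1.0, size=(64, 8)), b=np.zeros(64))


def sample(m):
    z = rng.normal(size=(m, 8))
    y = np.array([model.metric(zi + rng.normal(scale=0.3, size=8)) for zi in z])
    return z, y


z_cal, y_cal = sample(200)
scores = [compass_score(model, z, y) for z, y in zip(z_cal, y_cal)]
beta_hat = conformal_quantile(scores, alpha=0.1)

z_test, y_test = sample(500)
intervals = [endpoint_interval(model, z, beta_hat) for z in z_test]
coverage = np.mean([lo <= y <= hi for (lo, hi), y in zip(intervals, y_test)])
print(f"beta_hat = {beta_hat:.3f}, empirical coverage = {coverage:.3f}")
```

In this toy, all entries of `W` are positive, so the area is strictly increasing along the gradient direction. The endpoint shortcut is therefore exact here. For a real network, this is the property that Figure 2 checks empirically.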

Theorems & Definitions (16)

  • Definition 1: Nestedness
  • Theorem 1: Split-Conformal Coverage under Linear Latent Perturbations
  • Proposition 1: Validity of Weighted COMPASS under Covariate Shift
  • Proof (sketch)
  • Lemma 1: Guaranteed Nestedness and Valid Scores
  • Proof
  • Lemma 2: Exchangeability yields uniform ranks
  • Proof
  • Theorem 2: Split-conformal coverage
  • Proof
  • ...and 6 more
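
Proposition 1 concerns weighted COMPASS under covariate shift (Figure 3). The sketch below shows the standard weighted split-conformal quantile of Tibshirani et al. (2019). We assume this is the construction being adapted, with the threshold then replacing $\hat\beta$ in the interval. The paper estimates importance weights from learned internal features. As one common way to obtain such weights, the hypothetical helper `classifier_weights` converts a domain classifier's probability $P(\text{test} \mid x)$ into density-ratio weights. The paper's exact estimator is not shown here.

```python
import numpy as np


def weighted_conformal_quantile(scores, w_cal, w_test, alpha):
    """Weighted (1 - alpha) quantile of calibration scores plus a point mass at +inf.

    w_cal[i] is the importance weight w(X_i); w_test is w(X_{n+1}).
    """
    scores = np.asarray(scores, dtype=float)
    w_cal = np.asarray(w_cal, dtype=float)
    order = np.argsort(scores)
    s_sorted, w_sorted = scores[order], w_cal[order]
    # Normalize by all weights, including the test point's mass placed at +inf.
    cdf = np.cumsum(w_sorted) / (w_sorted.sum() + w_test)
    # First sorted score at which the weighted CDF reaches 1 - alpha.
    idx = np.searchsorted(cdf, 1 - alpha, side="left")
    return np.inf if idx >= len(s_sorted) else s_sorted[idx]


def classifier_weights(p_test_given_x, n_cal, n_test):
    """Density-ratio weights w(x) = P(test|x) / P(cal|x) * n_cal / n_test."""
    p = np.clip(np.asarray(p_test_given_x, dtype=float), 1e-6, 1 - 1e-6)
    return (p / (1 - p)) * (n_cal / n_test)
```

With all weights equal to 1, this reduces to the unweighted rank $\lceil (n+1)(1-\alpha) \rceil$. Up-weighting calibration samples that resemble the shifted test distribution moves the threshold toward their scores.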