ConfIC-RCA: Statistically Grounded Efficient Estimation of Segmentation Quality

Matias Cosarinsky, Ramiro Billot, Lucas Mansilla, Gabriel Jimenez, Nicolas Gaggión, Guanghui Fu, Tom Tirer, Enzo Ferrante

Abstract

Assessing the quality of automatic image segmentation is crucial in clinical practice, but it is often very challenging because ground-truth annotations are scarce. Reverse Classification Accuracy (RCA) estimates the quality of new predictions on unseen samples by training a segmenter on those predictions and then evaluating it against existing annotated images. In this work we introduce ConfIC-RCA (Conformal In-Context RCA), a novel method for automatically estimating segmentation quality with statistical guarantees in the absence of ground-truth annotations. It consists of two main innovations. The first is In-Context RCA, which leverages recent in-context learning models for image segmentation and incorporates retrieval-augmentation techniques to select the most relevant reference images. This approach enables efficient quality estimation with minimal reference data while avoiding the need to train additional models. The second is Conformal RCA, which extends both the original RCA framework and In-Context RCA beyond point estimation. Using tools from split conformal prediction, Conformal RCA produces prediction intervals for segmentation quality, with the statistical guarantee that the true score lies within the estimated interval with a user-specified probability. Validated across 10 different medical imaging tasks covering various organs and modalities, our methods demonstrate robust performance and computational efficiency, offering a promising solution for automated quality control in clinical workflows, where fast and reliable segmentation assessment is essential. The code is available at https://github.com/mcosarinsky/Conformal-In-Context-RCA

Paper Structure

This paper contains 14 sections, 1 theorem, 7 equations, 9 figures, 3 tables.

Key Result

Theorem 1

Suppose that $\{(X_i, Y_i)\}_{i=1}^n$ and $(X_{n+1}, Y_{n+1})$ are i.i.d. Let $\hat{q}$ and $C_\alpha(X_{n+1})$ be as defined above. Then the following holds:
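
The theorem statement is cut off in this excerpt, and the definitions of $\hat{q}$ and $C_\alpha$ it refers to are not shown. The abstract describes the guarantee as the true score lying within the interval with a user-specified probability, which matches the usual split conformal coverage property, $\mathbb{P}\big(Y_{n+1} \in C_\alpha(X_{n+1})\big) \ge 1 - \alpha$. The following is a minimal sketch of how such an interval can be built, assuming an absolute-error nonconformity score and a symmetric interval around the point estimate. The function names `conformal_quantile` and `conformal_interval` and the synthetic calibration data are hypothetical illustrations, not the paper's implementation, and the paper's own construction may differ (Figure 5 refers to quantile pairs $(p_{\ell}, p_h)$).

```python
import math
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    scores = np.sort(np.asarray(scores, dtype=float))
    n = scores.size
    # Split conformal uses the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration samples for this alpha: the interval is unbounded.
        return float("inf")
    return float(scores[rank - 1])


def conformal_interval(point_estimate, q_hat, lo=0.0, hi=1.0):
    """Symmetric interval around the estimate, clipped to the metric's range."""
    return max(lo, point_estimate - q_hat), min(hi, point_estimate + q_hat)


# Synthetic calibration set (assumption, for illustration only): true DSC
# values and noisy quality estimates standing in for RCA outputs.
rng = np.random.default_rng(0)
true_dsc = rng.uniform(0.5, 1.0, size=200)
estimated_dsc = np.clip(true_dsc + rng.normal(0.0, 0.05, size=200), 0.0, 1.0)

# Nonconformity score (assumption): absolute error of the estimate.
scores = np.abs(true_dsc - estimated_dsc)
q_hat = conformal_quantile(scores, alpha=0.1)

# Interval for a new sample whose estimated DSC is 0.85.
interval = conformal_interval(0.85, q_hat)
```

Under the i.i.d. assumption of the theorem, an interval built this way covers the true score with probability at least $1 - \alpha$, marginally over the calibration and test samples.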

Figures (9)

  • Figure 1: General outline of the In-Context RCA framework. The quality of a predicted segmentation $S_I$ on an image $I$ is estimated by using $(I, S_I)$ as the support set of an in-context classifier, or segmenter. This classifier is then applied to segment a reference dataset selected through retrieval augmentation. Finally, the best segmentation result under an evaluation metric $\rho$ is used to predict the quality of $S_I$, following the RCA estimate equation.
  • Figure 2: Datasets overview including modality, number of images and structure.
  • Figure 3: Scatter plots comparing predicted vs. real DSC on 3D-IRCAdB (Left) and JSRT (Right) for different reference datasets of size $k$ using UniverSeg as the reverse classifier. Top: results using randomly selected reference samples. Bottom: results following retrieval augmentation, where the $k$ most similar images are selected based on cosine similarity in the DINOv2 embedding space. On both datasets, retrieval augmentation achieves better performance with fewer samples, as shown by higher correlation and lower MAE (bold indicates the best value).
  • Figure 4: Scatter plots comparing the predicted DSC across all datasets for In-Context RCA using UniverSeg and SAM 2 as reverse classifiers, followed by the retrieval-augmented and traditional Atlas RCA versions. All evaluations were conducted with a reference dataset of size 8. Bold values indicate the best-performing methods, which differ significantly from the non-bold values ($p < 0.01$, paired test) but not from one another. In all cases, 95% confidence intervals were obtained via bootstrapping.
  • Figure 5: Sensitivity analysis of conformal quantile selection for Conformal-RCA using SAM 2 as the reverse segmenter. Each point corresponds to a $(p_{\ell}, p_h)$ quantile pair, evaluated by its average empirical coverage and average relative prediction interval width across all datasets. Interval widths are normalized per dataset to ensure equal weighting across tasks. The Pareto frontier highlights quantile pairs offering optimal coverage–efficiency trade-offs.
  • ...and 4 more figures
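
The retrieval and scoring steps described in the captions of Figures 1 and 3 can be sketched as follows: select the $k$ reference images closest to the query in an embedding space by cosine similarity, then take the best metric value that the reverse segmenter reaches on those references as the quality estimate. This is a minimal sketch under stated assumptions. The embeddings are plain arrays (in the paper they come from DINOv2), the reverse segmenter's outputs are passed in as precomputed masks (in the paper they come from UniverSeg or SAM 2), and the function names `top_k_by_cosine`, `dice`, and `rca_estimate` are hypothetical.

```python
import numpy as np


def top_k_by_cosine(query_emb, ref_embs, k):
    """Indices of the k reference embeddings most similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    r = ref_embs / np.linalg.norm(ref_embs, axis=1, keepdims=True)
    sims = r @ q  # cosine similarity of each reference to the query
    return np.argsort(-sims)[:k]


def dice(a, b):
    """Dice similarity coefficient (DSC) between two binary masks."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # convention (assumption): two empty masks agree perfectly
    return 2.0 * np.logical_and(a, b).sum() / denom


def rca_estimate(reverse_predictions, reference_masks, metric=dice):
    """Best score of the reverse segmenter over the selected references."""
    return max(metric(p, m) for p, m in zip(reverse_predictions, reference_masks))


# Toy embeddings (illustrative only): pick the 2 references nearest the query.
query = np.array([1.0, 0.0])
refs = np.array([[0.0, 1.0], [1.0, 0.1], [-1.0, 0.0], [2.0, 0.0]])
selected = top_k_by_cosine(query, refs, k=2)

# Toy masks standing in for the reverse segmenter's outputs on two references.
predictions = [[1, 1, 0, 0], [1, 0, 0, 0]]
ground_truth = [[1, 0, 0, 0], [1, 0, 0, 0]]
estimate = rca_estimate(predictions, ground_truth)
```

Taking the maximum follows the Figure 1 caption, which uses the best segmentation result under the metric $\rho$ as the predicted quality of $S_I$.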

Theorems & Definitions (1)

  • Theorem 1