Statistical Management of the False Discovery Rate in Medical Instance Segmentation Based on Conformal Risk Control
Mengxia Dai, Wenqian Luo, Tianyang Li
TL;DR
This work tackles the challenge of confidence calibration in medical instance segmentation by integrating conformal prediction with a calibration-aware loss to dynamically adjust segmentation thresholds at a user-defined risk level $α$. The proposed method is model-agnostic and compatible with common architectures like Mask R-CNN and BlendMask, requiring only a small exchangeable calibration set to guarantee that the expected loss on new data does not exceed $α$. The authors derive theoretical guarantees for FDR and FNR control, and validate robustness across varying calibration-to-test data ratios on a brain tumor MRI dataset, demonstrating controllable risk with practical computational efficiency. The approach enhances trustworthiness in high-stakes medical imaging, enabling reliable deployment of segmentation models in clinical workflows. Potential extensions include multi-instance segmentation and application to other medical imaging modalities.
Abstract
Instance segmentation plays a pivotal role in medical image analysis by enabling precise localization and delineation of lesions, tumors, and anatomical structures. Although deep learning models such as Mask R-CNN and BlendMask have achieved remarkable progress, their application in high-risk medical scenarios remains constrained by poorly calibrated confidence estimates, which may lead to misdiagnosis. To address this challenge, we propose a robust quality control framework based on conformal prediction theory. The framework constructs a risk-aware dynamic threshold mechanism that adaptively adjusts segmentation decision boundaries according to clinical requirements. Specifically, we design a calibration-aware loss function that dynamically tunes the segmentation threshold based on a user-defined risk level $α$. Utilizing exchangeable calibration data, this method ensures that the expected false negative rate (FNR) or false discovery rate (FDR) on test data remains below $α$ with high probability. The framework is compatible with mainstream segmentation models (e.g., Mask R-CNN, BlendMask+ResNet-50-FPN) and with datasets in PASCAL VOC format, and it requires no architectural modifications. Empirical results demonstrate that the calibration framework rigorously bounds the FDR marginally over the test set.
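The threshold selection described in the abstract can be illustrated with a minimal sketch of conformal risk control. It is not the authors' implementation: the function names `fdr_loss` and `calibrate_threshold`, the pixel-level definition of the false discovery proportion, the threshold grid, and the top-down scan are all assumptions made for illustration. The sketch picks the smallest threshold $\hat{λ}$ on the grid such that the finite-sample-adjusted calibration risk $(n \hat{R}_n(λ) + B)/(n+1)$ stays at or below $α$ for every grid threshold at or above $\hat{λ}$, where $B = 1$ bounds the loss. The usual guarantee additionally assumes that the loss is non-increasing in the threshold, which holds only approximately for FDR.

```python
import numpy as np


def fdr_loss(prob, gt, lam):
    """Pixel-level false discovery proportion of the mask {prob >= lam}.

    Assumption: FDR is measured per image over predicted foreground pixels.
    """
    gt = np.asarray(gt, dtype=bool)
    pred = np.asarray(prob) >= lam
    n_pred = pred.sum()
    if n_pred == 0:
        return 0.0  # an empty prediction makes no false discoveries
    return float((pred & ~gt).sum() / n_pred)


def calibrate_threshold(probs, gts, alpha, grid=None, bound=1.0):
    """Return the smallest grid threshold whose adjusted risk is <= alpha.

    probs, gts: lists of per-image probability maps and boolean masks
    (the exchangeable calibration set). Returns None if no threshold
    qualifies, e.g. when the calibration set is too small for alpha.
    """
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)  # hypothetical grid resolution
    n = len(probs)
    lam_hat = None
    # Scan from the strictest threshold downward and stop at the first
    # violation, so every threshold >= lam_hat satisfies the bound.
    for lam in sorted(grid, reverse=True):
        risk = np.mean([fdr_loss(p, g, lam) for p, g in zip(probs, gts)])
        if (n * risk + bound) / (n + 1) <= alpha:
            lam_hat = float(lam)
        else:
            break
    return lam_hat
```

At test time, a new probability map would be binarized with `prob >= lam_hat`. The model that produces the probability maps is left untouched, which matches the model-agnostic use described above.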
