Conformal Risk Control for Pulmonary Nodule Detection

Roel Hulsman; Valentin Comte; Lorenzo Bertolini; Tobias Wiesenthal; Antonio Puertas Gallardo; Mario Ceresa

TL;DR

The paper tackles the challenge of trustworthy uncertainty quantification in AI-assisted pulmonary nodule detection for lung cancer screening. It wraps conformal risk control (CRC) around a RetinaNet-based detector to generate prediction sets $C_\lambda(X)$ with formal guarantees on the false-negative rate, under an exchangeability assumption. Empirically, CRC achieves about 0.90 sensitivity at the cost of additional false positives per scan, which demonstrates a practical trade-off between guaranteed sensitivity and efficiency. The results also show that off-the-shelf models require calibration to handle ontological uncertainty arising from radiologist disagreement. Overall, CRC provides statistically grounded, interpretable uncertainty guarantees that can support clinical decision-making, and it underscores the importance of calibration and consensus when deploying AI in healthcare.

Abstract

Quantitative tools are increasingly appealing for decision support in healthcare, driven by the growing capabilities of advanced AI systems. However, understanding the predictive uncertainties surrounding a tool's output is crucial for decision-makers to ensure reliable and transparent decisions. In this paper, we present a case study on pulmonary nodule detection for lung cancer screening, enhancing an advanced detection model with an uncertainty quantification technique called conformal risk control (CRC). We demonstrate that prediction sets with conformal guarantees are attractive measures of predictive uncertainty in the safety-critical healthcare domain, allowing end-users to achieve arbitrary validity by trading off false positives and providing formal statistical guarantees on model performance. Among ground-truth nodules annotated by at least three radiologists, our model achieves a sensitivity that is competitive with that generally achieved by individual radiologists, with a slight increase in false positives. Furthermore, we illustrate the risks of using off-the-shelf prediction models when faced with ontological uncertainty, such as when radiologists disagree on what constitutes the ground truth on pulmonary nodules.
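
To make the calibration idea concrete, the following is a minimal sketch of how a confidence threshold could be chosen with conformal risk control for a per-scan false-negative rate. It is not the authors' implementation. The function names, the grid of candidate thresholds, and the input format (for each calibration scan, the confidence score of the detection paired with each ground-truth nodule, with `-inf` for nodules the detector never proposed) are assumptions made for illustration. The sketch uses the general CRC condition for a loss bounded by 1: with $n$ calibration scans and empirical risk $\widehat{R}_n$, a threshold is accepted if $\frac{n}{n+1}\widehat{R}_n + \frac{1}{n+1} \le \alpha$.

```python
import numpy as np


def per_scan_fnr(matched_scores, threshold):
    """Fraction of ground-truth nodules in one scan that are missed.

    `matched_scores` holds, for each ground-truth nodule, the confidence of
    its paired detection (hypothetical input format). Nodules without any
    paired detection carry -inf, so they count as missed at every threshold.
    """
    matched_scores = np.asarray(matched_scores, dtype=float)
    if matched_scores.size == 0:
        return 0.0  # assumption: a scan without nodules incurs no loss
    return float(np.mean(matched_scores < threshold))


def calibrate_threshold(calibration_scans, alpha, grid=None):
    """Largest grid threshold whose adjusted empirical FNR is at most alpha.

    Returns None if even the smallest threshold fails the CRC condition,
    for example when the calibration set is too small for the chosen alpha.
    """
    n = len(calibration_scans)
    if n == 0:
        raise ValueError("need at least one calibration scan")
    if grid is None:
        grid = np.linspace(0.0, 1.0, 1001)  # illustrative grid, ascending
    best = None
    for t in grid:
        risk = np.mean([per_scan_fnr(s, t) for s in calibration_scans])
        # CRC condition for a loss bounded by 1.
        if (n / (n + 1)) * risk + 1.0 / (n + 1) <= alpha:
            best = float(t)
        else:
            break  # the FNR is non-decreasing in the threshold
    return best
```

Lowering the threshold enlarges the prediction set, which lowers the false-negative rate but admits more false positives. Because the empirical risk is monotone in the threshold, the search can stop at the first grid point that violates the condition. The guarantee holds only if calibration and test scans are exchangeable.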

Paper Structure (13 sections, 6 equations, 3 figures, 3 tables)

Introduction
Summary & Outline
Related Work
Methods
FROC Analysis
Conformal Risk Control
Pairing Procedure
LIDC-IDRI Dataset
Prediction Model
Experiments
Evaluation
Discussion
Additional Experiments

Figures (3)

Figure 1: A plot of the average sensitivity per scan against the average number of false positives per scan, measured on the test data of a random split of Set $r$ ($r\in\{1,2,3,4\}$). The colored dots highlight the estimated confidence thresholds.
Figure 2: Empirical histograms of the performance metrics evaluating the strategies to estimate $\widehat{\lambda}$, measured on the test set over $R=10,000$ random splits of Set $r$ ($r\in\{1,2,3,4\}$) into calibration and test set. Bar heights sum to one.
Figure 3: Empirical histograms of performance metrics of strategies to estimate $\widehat{\lambda}$, evaluated on the test set over $R=1,000$ random splits of Set $r$ ($r\in\{1,2,3,4\}$) into calibration and test set. Bar heights sum to one.
