Distribution-Free Selection of Low-Risk Oncology Patients for Survival Beyond a Time Horizon

Matteo Sesia, Vladimir Svetnik

TL;DR

The results reveal a trade-off between efficiency and strength of guarantees: FDR-based screening is typically more powerful, while LTT-based calibration is more conservative but offers stronger guarantees.

Abstract

We study the problem of selecting a subset of patients who are unlikely to experience an adverse event within a fixed time horizon by calibrating a screening rule based on a black-box survival model. We consider two complementary, distribution-free frameworks for this task. The first extends classical calibration ideas -- estimating the event rate among selected patients using a hold-out dataset -- by integrating them with the Learn-Then-Test (LTT) framework, yielding high-probability guarantees for data-adaptively tuned screening rules. The second takes a different perspective by reformulating screening as a hypothesis testing problem on future patient outcomes, enabling false discovery rate (FDR) control via the Benjamini-Hochberg procedure applied to selective conformal p-values, and providing guarantees in expectation. We clarify the theoretical relationship between these approaches, explain how both can be adapted to right-censored time-to-event data via inverse probability of censoring weighting, and compare them empirically using simulations and oncology data from the Flatiron Health Research Database. Our results reveal a trade-off between efficiency and strength of guarantees: FDR-based screening is typically more powerful, while LTT-based calibration is more conservative but offers stronger guarantees. We also provide practical guidance on implementation and tuning.
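The high-probability calibration idea described above can be illustrated with a minimal sketch. It makes several simplifying assumptions that are not from the paper: outcomes are fully observed (no censoring, so no inverse probability of censoring weighting), candidate thresholds are tested in a fixed order from most to least conservative, and each test uses an exact binomial tail p-value. The function names `binom_cdf` and `ltt_threshold` are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k), computed exactly."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def ltt_threshold(scores, events, thresholds, alpha, delta):
    """Fixed-sequence testing over risk-score thresholds (illustrative only).

    scores:     predicted risk scores on the calibration set (lower = safer)
    events:     1 if the patient had the event before the horizon, else 0
                (ASSUMPTION: no censoring, unlike the setting in the paper)
    thresholds: candidate cutoffs; patients with score <= cutoff are selected
    alpha:      tolerated event rate among selected patients
    delta:      allowed probability that the returned cutoff is invalid

    Returns the largest cutoff certified so far, or None if none is certified.
    """
    chosen = None
    for lam in sorted(thresholds):
        selected = [e for s, e in zip(scores, events) if s <= lam]
        n_sel = len(selected)
        if n_sel == 0:
            break  # no data to certify this cutoff; stop testing
        k = sum(selected)
        # p-value for the null "event rate among selected >= alpha"
        p_value = binom_cdf(k, n_sel, alpha)
        if p_value > delta:
            break  # first non-rejection ends the fixed sequence
        chosen = lam
    return chosen


# Toy calibration set: 50 event-free low-score patients, then 50
# higher-score patients of whom half experience the event.
scores = [0.1] * 50 + [0.5] * 50
events = [0] * 50 + [1, 0] * 25
print(ltt_threshold(scores, events, [0.1, 0.5], alpha=0.1, delta=0.1))
```

Stopping at the first non-rejection is what keeps the overall error probability at most `delta` without a Bonferroni correction, at the cost of requiring the testing order to be fixed before looking at the calibration data.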

Paper Structure

This paper contains 39 sections, 2 theorems, 62 equations, 13 figures, 8 tables, 5 algorithms.

Key Result

Theorem 1

Define $H^{0}_{n+1}, \ldots, H^{0}_{n+m}$ as in eq:null-hyp and let $\hat{p}(X_{n+1}),\ldots,\hat{p}(X_{n+m})$ be selective conformal $p$-values satisfying eq:sel-pvals-superunif. Apply the BH procedure at level $\alpha \in (0,1)$ to these statistics, and let $\widehat{\mathcal{S}} \subseteq [m]$ de
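The BH step referred to in the theorem statement can be sketched as follows. This is a generic implementation of the Benjamini-Hochberg step-up rule, not the authors' code; it takes the p-values as given and does not show how the selective conformal $p$-values are computed. The function name `benjamini_hochberg` is hypothetical.

```python
def benjamini_hochberg(p_values, alpha):
    """Return the sorted indices rejected by the BH procedure at level alpha.

    p_values: one p-value per test patient (assumed already computed)
    alpha:    target false discovery rate, in (0, 1)
    """
    m = len(p_values)
    # Indices of the p-values in increasing order.
    order = sorted(range(m), key=lambda i: p_values[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: keep the largest rank whose p-value clears its cutoff.
        if p_values[i] <= alpha * rank / m:
            k_star = rank
    # Reject the k_star smallest p-values.
    return sorted(order[:k_star])


# Toy example with four test patients.
selected = benjamini_hochberg([0.01, 0.04, 0.03, 0.5], alpha=0.1)
print(selected)
```

In the screening setting, a rejected hypothesis corresponds to a patient flagged as low-risk, so the returned index set plays the role of the selected subset of $[m]$.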

Figures (13)

  • Figure 1: Summary of low-risk screening results obtained with different calibration methods on semi-synthetic data, at different screening horizons. Top row: yield. Second row: survival rate among selected patients. Third row: survival rate among selected patients, conditional on at least one patient being selected. Conditional results are not evaluated if selections occur in fewer than 10% of experiments. The dashed horizontal line denotes the target survival rate (e.g., 90%). Fourth row: proportion of experiments in which at least one patient is selected. All methods use the same survival and censoring models based on random forests. High-probability (HP) methods are applied at confidence level $\delta=0.1$.
  • Figure 2: Summary of low-risk screening results on semi-synthetic data, as in Figure 1. Here, HP-LTT is applied at different confidence levels $\delta$, ranging from $\delta = 0.05$ (more conservative) to $\delta=0.5$ (more liberal).
  • Figure 3: Summary of low-risk screening results obtained by applying different calibration methods to semi-synthetic data simulated using a random forest generative model, at different screening horizons. All methods use the same mis-specified gradient boosting survival model, which leads to lower-than-expected survival rates at long horizons if applied to select low-risk patients without calibration. Other details as in Figure 1.
  • Figure 4: Average performance of low-risk screening methods on semi-synthetic data, as in Figure 1, at horizon $t_0=2$ months. The results are shown as a function of the calibration sample size, for different survival models. Error bars represent two standard errors.
  • Figure A1: Summary of low-risk screening results obtained by applying different calibration methods to semi-synthetic oncology data simulated using a random forest generative model, at different screening horizons. All methods use the same mis-specified Cox survival model. Other details as in Figure 1.
  • ...and 8 more figures

Theorems & Definitions (3)

  • Theorem 1: jin2023selection
  • Theorem 2
  • Proof: Proof of Theorem \ref{thm:fdr-risk}