Toward Conditional Distribution Calibration in Survival Prediction

Shi-ang Qi, Yakun Yu, Russell Greiner

TL;DR

This work addresses calibration in survival analysis under censoring by proposing CiPOT, a post-processing framework that uses the predicted survival probability at the observed time (iPOT) as a conformity score to produce conformalized ISDs. CiPOT achieves asymptotic marginal and conditional distribution calibration while preserving the ISD’s monotonicity and, under certain conditions, time-dependent discrimination measures such as AUROC and $C^{td}$. The method extends to censored data via a principled sampling scheme that respects the censoring mechanism and heteroskedasticity of the ISD. Empirical evaluation across 15 real-world datasets shows substantial improvements in both marginal and conditional calibration with competitive discrimination, and ablation studies illuminate the effects of repetition, percentile choice, and computational trade-offs. Overall, CiPOT provides a practical, scalable approach to reliable survival predictions with calibrated uncertainties suitable for individual decision-making and resource allocation.

Abstract

Survival prediction often involves estimating the time-to-event distribution from censored datasets. Previous approaches have focused on enhancing discrimination and marginal calibration. In this paper, we highlight the significance of conditional calibration for real-world applications -- especially its role in individual decision-making. We propose a method based on conformal prediction that uses the model's predicted individual survival probability at that instance's observed time. This method effectively improves the model's marginal and conditional calibration, without compromising discrimination. We provide asymptotic theoretical guarantees for both marginal and conditional calibration and test it extensively across 15 diverse real-world datasets, demonstrating the method's practical effectiveness and versatility in various settings.
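
To make the distinction between marginal and conditional calibration concrete, the following is a minimal sketch, not the authors' implementation. It assumes uncensored data, a binary feature `x`, and exponential event times; the names `ipot_marginal_model` and `bin_proportions` are hypothetical. The toy model predicts the same population-level survival curve for everyone, so its predicted survival probabilities at the observed times are roughly uniform over the whole population but clearly non-uniform within each group.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6000

# Toy population (an assumption for illustration): two groups with different event rates.
x = rng.integers(0, 2, size=n)
rate = np.where(x == 0, 0.5, 2.0)
event_time = rng.exponential(1.0 / rate)


def ipot_marginal_model(t):
    """Survival probability at time t under a model that ignores x.

    The model predicts the population mixture curve, so it is correct
    on average but wrong for each individual group.
    """
    return 0.5 * np.exp(-0.5 * t) + 0.5 * np.exp(-2.0 * t)


def bin_proportions(scores, n_bins=3):
    """Fraction of scores falling in each of n_bins equal-width bins on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    counts, _ = np.histogram(scores, bins=edges)
    return counts / counts.sum()


# Predicted survival probability at each instance's observed event time.
scores = ipot_marginal_model(event_time)

marginal = bin_proportions(scores)            # close to [1/3, 1/3, 1/3]
group0 = bin_proportions(scores[x == 0])      # far from uniform
group1 = bin_proportions(scores[x == 1])      # far from uniform

print("marginal:", marginal.round(3))
print("x = 0   :", group0.round(3))
print("x = 1   :", group1.round(3))
```

A model that passes the marginal check can therefore still give systematically misleading curves to every individual in a subgroup, which is the gap conditional calibration is meant to close.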

Paper Structure

This paper contains 60 sections, 8 theorems, 35 equations, 23 figures, 3 tables, 1 algorithm.

Key Result

Theorem 3.1

If the instances in $\mathcal{D}$ are exchangeable, and follow the conditional independent censoring assumption, then for a new instance $n+1$, $\forall \ \rho_1 < \rho_2 \in [0, 1]$,

Figures (23)

  • Figure 1: Two notions of distribution calibration: marginal and conditional, illustrated using 3 bins separated at $\frac{1}{3}$ and $\frac{2}{3}$. The curves in (a, d) represent the predicted ISDs. The colors of the stars distinguish the six subjects, with horizontal coordinates indicating the true event time (consistent across all panels) and vertical coordinates representing predicted survival probability at event time. Note the two groups (orange for $x=0$ and blue for $x=1$) correspond to the colors of the curves and histograms in (a, b, d, e). Note that all three P-P lines in the conditional case (f) coincide.
  • Figure 2: A visual example of using CiPOT to make the predictions (conditionally) calibrated. (a) Initialize ISD predictions from an arbitrary survival algorithm, with associated (b) histograms and (c) P-P plots. (d) Calculate $\text{Percentile} (\rho; \, \Gamma_\mathcal{M})$ (grey lines) for all $\rho$s, and find the intersections (hollow points) of the ISD curves and the $\text{Percentile} (\rho; \, \Gamma_\mathcal{M})$ lines. (e) Generate new ISDs by vertically shifting the hollow points to the $\rho$'s level, with associated (f) histograms and (g) P-P plots. Figure \ref{fig:csd_compare} provides a side-by-side visual comparison between CSD and our method.
  • Figure 3: Violin plots of C-index and $\text{Cal}_{\text{margin}}$ performance of our method (CiPOT) and benchmarks. The shape of each violin plot represents the probability density of the performance scores, with the black bar inside the violin indicating the mean performance. The red dashed lines in the lower panels represent the mean calibration performance for KM, serving as an empirical lower limit.
  • Figure 4: Violin plots of $\text{Cal}_{\text{ws}}$ performance, where the shape and black bars represent the density and mean. Smaller values represent better performance. Note CQRNN did not converge on MIMIC-IV.
  • Figure 5: Individual distribution calibration, illustrated using 3 bins separated at $\frac{1}{3}$ and $\frac{2}{3}$. The curve is an oracle's true ISD $S(t\mid \boldsymbol{x}_{i})$. The stars represent 15 realizations of $t\mid \boldsymbol{x}_{i}$. The vertical coordinates of the star represent the time and the horizontal coordinates of the star are the survival probability at observed time $S(e_i^{m} \mid \boldsymbol{x}_{i})$.
  • ...and 18 more figures
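
The adjustment described in the Figure 2 caption can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's algorithm: it handles only uncensored calibration data, uses the plain empirical quantile from `np.quantile` as a stand-in for $\text{Percentile}(\rho; \, \Gamma_\mathcal{M})$, and the function name `conformalize_isd` and the exponential toy model are hypothetical.

```python
import numpy as np


def conformalize_isd(time_grid, surv, cal_scores, rho_grid):
    """Return, for each rho, the time at which the adjusted ISD equals rho.

    time_grid  : increasing times at which the predicted ISD is evaluated
    surv       : predicted survival probabilities on time_grid (decreasing)
    cal_scores : predicted survival probabilities at the observed event
                 times of the calibration instances (conformity scores)
    rho_grid   : increasing target probability levels in (0, 1)
    """
    # Empirical percentile of the conformity scores at each level rho
    # (the grey horizontal lines in the figure).
    levels = np.quantile(cal_scores, rho_grid)
    # Intersection of the ISD with each line. np.interp needs increasing
    # x-coordinates, so the decreasing curve is reversed.
    return np.interp(levels, surv[::-1], time_grid[::-1])


rng = np.random.default_rng(0)

# Toy setting (an assumption): the true event rate is 1, the model predicts rate 2.
cal_times = rng.exponential(1.0, size=20000)
cal_scores = np.exp(-2.0 * cal_times)

time_grid = np.linspace(0.0, 10.0, 2001)
predicted_isd = np.exp(-2.0 * time_grid)
rho_grid = np.linspace(0.1, 0.9, 9)

# The adjusted ISD takes the value rho_grid[k] at time new_times[k].
new_times = conformalize_isd(time_grid, predicted_isd, cal_scores, rho_grid)

for rho, t_new in zip(rho_grid, new_times):
    print(f"rho={rho:.1f}  adjusted time={t_new:.3f}  true quantile time={-np.log(rho):.3f}")
```

In this toy case the adjusted times land close to the true quantile times $-\ln \rho$, and they stay ordered, so the adjusted curve remains monotone. The censored case requires the sampling scheme mentioned in the TL;DR, which this sketch does not attempt.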

Theorems & Definitions (15)

  • Definition 2.1
  • Theorem 3.1: Asymptotic marginal calibration
  • Theorem 3.2: Asymptotic conditional calibration
  • Theorem 3.3
  • Definition A.1: Marginal calibration
  • Theorem C.1: Asymptotic marginal calibration
  • proof
  • Theorem C.2: Asymptotic conditional calibration
  • proof
  • Theorem C.3
  • ...and 5 more