Overcoming Dependent Censoring in the Evaluation of Survival Models

Christian Marius Lillelund, Shi-ang Qi, Russell Greiner

TL;DR

The paper tackles bias in survival-model evaluation caused by dependent censoring, where the event time $T$ and the censoring time $C$ are not independent. It introduces three copula-based metrics, CI-Dep, IBS-Dep, and MAE-Dep, that use Archimedean copulas within the Copula-Graphic (CG) framework to account for the dependence between $T$ and $C$, replacing Kaplan-Meier (KM) components with CG-based estimates. A semi-synthetic data-generation framework is also developed to simulate dependent censoring realistically and enable benchmarking. Empirical results on synthetic and semi-synthetic data show that the proposed metrics reduce bias and give more reliable model error estimates under dependent censoring, with performance depending on the strength of dependence and the censoring rate.

Abstract

Conventional survival metrics, such as Harrell's concordance index (CI) and the Brier Score, rely on the independent censoring assumption for valid inference with right-censored data. However, in the presence of so-called dependent censoring, where the probability of censoring is related to the event of interest, these metrics can give biased estimates of the underlying model error. In this paper, we introduce three new evaluation metrics for survival analysis based on Archimedean copulas that can account for dependent censoring. We also develop a framework to generate realistic, semi-synthetic datasets with dependent censoring to facilitate the evaluation of the metrics. Our experiments in synthetic and semi-synthetic data demonstrate that the proposed metrics can provide more accurate estimates of the model error than conventional metrics under dependent censoring.

Paper Structure

This paper contains 32 sections, 1 theorem, 47 equations, 14 figures, 8 tables, 1 algorithm.

Key Result

Lemma 1

Under Sklar's (Survival) Theorem, and given $C(u_1, u_2)$ is an Archimedean copula, then if then we have

Figures (14)

  • Figure 1: (a) Residual dependence between event time $T$ (cancer relapse) and censoring time $C$ (cancer death), denoted by the diamond "Dep", due to an unobserved confounding covariate, here "Tumor grade". (b) Evaluation under dependent censoring, where Patient A is censored immediately before the event for reasons related to the event. This leads to a biased estimate of the marginal survival distribution using the Kaplan-Meier (KM) estimator (Kaplan and Meier, 1958). Note that the two curves coincide over the range $[0, c_{i}]$. (c) Scatter plot generated from a standard exponential distribution under an Archimedean Clayton copula (Clayton, 1978).
  • Figure 2: Plot of mean biases ($\pm$ 2 SD.) for baseline and proposed metrics as a function of the dependency under a known copula, averaged over 100 experiments ($N=10,000$, $d=10$).
  • Figure 3: Plot of mean biases ($\pm$ 2 SD.) for baseline and proposed metrics as a function of the censoring rate under a known copula, averaged over 100 experiments (Kendall's $\tau$ = 0.5, $N=10,000$, $d=10$) and two copula models (Clayton and Frank).
  • Figure 4: Evaluation pipeline. (1) Starting from a raw survival dataset $\mathcal{D}$, (2) we form $\tilde{\mathcal{D}}$ by flipping the event bit of $\mathcal{D}$ and then estimate the feature-dependent censoring distribution, $G_{\text{CoxPH}(\tilde{\mathcal{D}})}$. (3) We then select a subset of features from $\mathcal{D}$ using some strategy (e.g., Top-K features), (4) and generate a ground-truth dataset, $\mathcal{D}'$, by removing the censored instances from $\mathcal{D}$. (5) Putting all this together, we have a semi-synthetic dataset $\mathcal{D}''$, which is based on the ground-truth dataset, the learned censoring distribution, and the selected features. (6) We then train two Archimedean copula models to estimate $\hat{C}_\theta$ and (7) five survival learners to make predictions based on $\mathcal{D}''$. Finally, (8) we compare the baseline and proposed censored metrics (using $\hat{C}_\theta$) with the true evaluation metric.
  • Figure 5: Plot of mean biases ($\pm$ 2 SD.) as a function of the dependency, where only the copula family is misspecified, averaged over 100 experiments ($N=10,000$, $d=10$).
  • ...and 9 more figures
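
To make the setting of Figure 1(c) concrete, the following is a minimal sketch of how dependent event and censoring times could be drawn from a Clayton copula with standard exponential margins. It is not the authors' code. The function names (`clayton_theta_from_tau`, `sample_dependent_censoring`, `kendall_tau`) are hypothetical, and the sketch assumes that the copula couples the survival probabilities $S_T(T)$ and $S_C(C)$. It uses the standard Clayton relation $\theta = 2\tau / (1 - \tau)$ between the copula parameter and Kendall's $\tau$, together with conditional inversion for sampling.

```python
import math
import random


def clayton_theta_from_tau(tau):
    """Clayton parameter theta for a target Kendall's tau in (0, 1)."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    return 2.0 * tau / (1.0 - tau)


def sample_dependent_censoring(n, tau, seed=0):
    """Draw n (T, C, observed time, event indicator) tuples.

    Assumption: the Clayton copula couples the survival probabilities
    u = S_T(T) and v = S_C(C), and both margins are standard exponential.
    """
    rng = random.Random(seed)
    theta = clayton_theta_from_tau(tau)
    rows = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        w = 1.0 - rng.random()
        # Conditional inversion: solve dC(u, v)/du = w for v.
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        t = -math.log(u)  # invert S(t) = exp(-t)
        c = -math.log(v)
        rows.append((t, c, min(t, c), int(t <= c)))
    return rows


def kendall_tau(xs, ys):
    """Naive O(n^2) Kendall's tau, adequate for a small sanity check."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)


rows = sample_dependent_censoring(400, tau=0.5, seed=1)
print(kendall_tau([r[0] for r in rows], [r[1] for r in rows]))
```

In a real dataset only the last two entries of each tuple (the observed time and the event indicator) would be available. The latent pair $(T, C)$ is kept here so that the empirical Kendall's $\tau$ can be compared with the target value.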

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Lemma 1