Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees

Weijia Zhang, Chun Kai Ling, Xuanhui Zhang

TL;DR

This work tackles dependent censoring in survival analysis by learning the full copula between event and censoring times without prespecifying a ground-truth copula. It introduces DCSurvival, a deep, end-to-end framework that jointly models Archimedean copulas via a neural generator $\varphi_{nn}$ and survivor/censor marginals (parametric or monotone neural density estimators), enabling gradient-based optimization of the dependent-likelihood. The authors establish identifiability under mild conditions (notably for copulas represented by $\varphi_{nn}$ and common marginal families) and demonstrate empirically that the method recovers dependency structure and significantly reduces survival-estimation bias across synthetic, semi-synthetic, and real-world datasets. The approach yields better calibration and robustness to the choice of copula, offering a practical, data-driven solution for survival analysis under dependent censoring with potential broad impact in healthcare analytics.
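To make the role of the generator concrete, here is a minimal sketch of how a bivariate Archimedean copula is assembled from a generator. It assumes the convention $C(u, v) = \varphi(\varphi^{-1}(u) + \varphi^{-1}(v))$ and uses the closed-form Clayton generator as a stand-in for the learned $\varphi_{nn}$; the function names are illustrative and are not taken from the DCSurvival code.

```python
import math

def clayton_generator(t, theta):
    """Clayton generator phi(t) = (1 + t)^(-1/theta), a stand-in for a learned generator."""
    return (1.0 + t) ** (-1.0 / theta)

def clayton_generator_inverse(u, theta):
    """Inverse generator phi^{-1}(u) = u^(-theta) - 1."""
    return u ** (-theta) - 1.0

def archimedean_copula(u, v, phi, phi_inv):
    """Bivariate Archimedean copula C(u, v) = phi(phi_inv(u) + phi_inv(v))."""
    return phi(phi_inv(u) + phi_inv(v))

theta = 2.0  # assumed dependence parameter, chosen only for illustration
phi = lambda t: clayton_generator(t, theta)
phi_inv = lambda u: clayton_generator_inverse(u, theta)

# Copula value built from the generator ...
c_uv = archimedean_copula(0.7, 0.4, phi, phi_inv)
# ... and the textbook Clayton formula it should reproduce.
closed_form = (0.7 ** -theta + 0.4 ** -theta - 1.0) ** (-1.0 / theta)

# A copula has uniform margins, so C(u, 1) should equal u.
c_u1 = archimedean_copula(0.7, 1.0, phi, phi_inv)
```

Swapping `phi` and `phi_inv` for a different generator pair changes the dependence structure without touching `archimedean_copula`, which is the property that lets a learned generator replace a hand-picked copula family.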

Abstract

Censoring is the central problem in survival analysis where either the time-to-event (for instance, death) or the time-to-censoring (such as loss of follow-up) is observed for each sample. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates, an assumption that cannot be verified since only marginal distributions are available from the data. The existence of dependent censoring, along with the inherent bias in current estimators, has been demonstrated in a variety of applications, accentuating the need for a more nuanced approach. However, existing methods that adjust for dependent censoring require practitioners to specify the ground truth copula. This requirement poses a significant challenge for practical applications, as model misspecification can lead to substantial bias. In this work, we propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement for specifying the ground truth copula. We theoretically prove the identifiability of our model under a broad family of copulas and survival distributions. Experimental results from a wide range of datasets demonstrate that our approach successfully discerns the underlying dependency structure and significantly reduces survival estimation bias when compared to existing methods.
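The bias mentioned above can be reproduced in a small simulation. The sketch below is not the paper's experimental setup: it assumes unit-rate exponential event and censoring times, couples them with a Clayton copula (parameter `theta`, chosen arbitrarily), and evaluates the Kaplan-Meier estimator, which assumes independent censoring. All function names are illustrative.

```python
import math
import random

def sample_times(n, theta, rng):
    """Draw (observed time, event flag) pairs with unit-rate exponential margins.

    theta=None gives independent censoring; theta > 0 couples the two
    survival probabilities with a Clayton copula via conditional inversion.
    """
    times, events = [], []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
        w = 1.0 - rng.random()
        if theta is None:
            v = w
        else:
            v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** (-theta) + 1.0) ** (-1.0 / theta)
        t_event = -math.log(u)   # event time with survival function exp(-t)
        t_censor = -math.log(v)  # censoring time with survival function exp(-t)
        times.append(min(t_event, t_censor))
        events.append(t_event <= t_censor)
    return times, events

def kaplan_meier_at(times, events, t_eval):
    """Kaplan-Meier survival estimate at t_eval (assumes no tied times)."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    for i in order:
        if times[i] > t_eval:
            break
        if events[i]:
            surv *= 1.0 - 1.0 / at_risk
        at_risk -= 1
    return surv

rng = random.Random(0)
true_surv = math.exp(-1.0)  # true event-time survival probability at t = 1

dep_times, dep_events = sample_times(20000, 5.0, rng)
ind_times, ind_events = sample_times(20000, None, rng)

km_dependent = kaplan_meier_at(dep_times, dep_events, 1.0)
km_independent = kaplan_meier_at(ind_times, ind_events, 1.0)
```

In this setup `km_independent` should land close to `true_surv`, while `km_dependent` sits well above it. The marginal distributions are identical in both runs, so the gap comes only from the dependence between event and censoring times.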

Paper Structure (28 sections, 8 theorems, 19 equations, 7 figures, 2 tables, 3 algorithms)

Key Result

Theorem 1

(Sklar, 1959). Let $F$ be a distribution function with margins $F_1,\cdots,F_d$. There exists a $d$-dimensional copula $\mathcal{C}$ such that for any $(x_1,\cdots, x_d) \in \mathbb{R}^d$ we have $F(x_1,\cdots,x_d)=\mathcal{C}(F_1(x_1),\cdots,F_d(x_d))$. Furthermore, if the marginals $F_1,\cdots,F_d$ are con

Figures (7)

  • Figure 1: Illustrations of (a) dependent censoring in survival analysis and (b) unobserved confounding in causal inference. Solid/dashed nodes denote observed/hidden variables, respectively. The dashed lines between survival/censoring time and treatment/outcome indicate that estimation and evaluation in survival analysis with dependent censoring face similar challenges as in causal inference.
  • Figure 2: Ground Truth
  • Figure 3: Learned Copula
  • Figure 5: Top to bottom (row): survival prediction biases of compared algorithms on varying censoring dependency governed by Frank, Clayton, and Gumbel copulas. Left to right: Linear-Risk and Nonlinear-Risk results. The lines represent the Survival-$l_1$ means and the shaded areas are the standard deviations. Best viewed in colour.
  • Figure 6: Calibration plots on test samples of the SEER (left) and GBSG2 (right) datasets. The plots of better-calibrated algorithms are closer to perfect calibration (dashed black line). Best viewed in colour.
  • ...and 2 more figures

Theorems & Definitions (14)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proof Sketch
  • Theorem 4
  • Lemma 5
  • Theorem 6
  • Definition 1
  • Definition 2
  • Definition 3
  • ...and 4 more