Deep Copula-Based Survival Analysis for Dependent Censoring with Identifiability Guarantees

Weijia Zhang; Chun Kai Ling; Xuanhui Zhang

TL;DR

This work tackles dependent censoring in survival analysis by learning the full copula between event and censoring times without prespecifying a ground-truth copula. It introduces DCSurvival, a deep, end-to-end framework that jointly models Archimedean copulas via a neural generator $\varphi_{nn}$ and survivor/censor marginals (parametric or monotone neural density estimators), enabling gradient-based optimization of the dependent-likelihood. The authors establish identifiability under mild conditions (notably for copulas represented by $\varphi_{nn}$ and common marginal families) and demonstrate empirically that the method recovers dependency structure and significantly reduces survival-estimation bias across synthetic, semi-synthetic, and real-world datasets. The approach yields better calibration and robustness to the choice of copula, offering a practical, data-driven solution for survival analysis under dependent censoring with potential broad impact in healthcare analytics.
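To make the "dependent likelihood" concrete, here is a minimal sketch of the log-likelihood for right-censored data when event and censoring times are linked by a copula on their survival functions. It is not the DCSurvival implementation. It assumes a fixed Clayton copula with parameter `theta` in place of the learned generator $\varphi_{nn}$, and exponential marginals with hypothetical rates `rate_t` and `rate_c`. All function names are illustrative.

```python
import math

def clayton(u, v, theta):
    """Clayton copula C(u, v) for theta > 0 (illustrative stand-in for a learned copula)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_du(u, v, theta):
    """Partial derivative dC/du of the Clayton copula."""
    a = u ** -theta + v ** -theta - 1.0
    return u ** (-theta - 1.0) * a ** (-1.0 / theta - 1.0)

def log_likelihood(times, events, rate_t, rate_c, theta):
    """Log-likelihood under dependent censoring.

    An observed event at time t contributes f_T(t) * dC/du at (S_T(t), S_C(t));
    a censored observation contributes f_C(t) * dC/dv at the same point.
    Marginals are assumed exponential purely for illustration.
    """
    total = 0.0
    for t, d in zip(times, events):
        s_t = math.exp(-rate_t * t)  # survival function of the event time
        s_c = math.exp(-rate_c * t)  # survival function of the censoring time
        if d == 1:
            total += math.log(rate_t * s_t) + math.log(clayton_du(s_t, s_c, theta))
        else:
            # Clayton is symmetric, so dC/dv(u, v) equals dC/du evaluated at (v, u).
            total += math.log(rate_c * s_c) + math.log(clayton_du(s_c, s_t, theta))
    return total
```

As `theta` approaches 0 the Clayton copula approaches independence, so this expression should reduce to the usual independent-censoring likelihood. In a gradient-based setting, the same quantity would be written in an autodiff framework so that copula and marginal parameters can be optimized jointly.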

Abstract

Censoring is the central problem in survival analysis, where either the time-to-event (for instance, death) or the time-to-censoring (such as loss of follow-up) is observed for each sample. The majority of existing machine learning-based survival analysis methods assume that survival is conditionally independent of censoring given a set of covariates, an assumption that cannot be verified since only marginal distributions are available from the data. The existence of dependent censoring, along with the inherent bias in current estimators, has been demonstrated in a variety of applications, accentuating the need for a more nuanced approach. However, existing methods that adjust for dependent censoring require practitioners to specify the ground truth copula. This requirement poses a significant challenge for practical applications, as model misspecification can lead to substantial bias. In this work, we propose a flexible deep learning-based survival analysis method that simultaneously accommodates dependent censoring and eliminates the requirement for specifying the ground truth copula. We theoretically prove the identifiability of our model under a broad family of copulas and survival distributions. Experimental results from a wide range of datasets demonstrate that our approach successfully discerns the underlying dependency structure and significantly reduces survival estimation bias when compared to existing methods.
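The bias mentioned above can be seen in a small simulation. The sketch below is an illustration only, not an experiment from the paper. It draws event and censoring times whose survival probabilities follow a Clayton copula (sampled by conditional inversion), assumes exponential marginals with rate 1, and then applies the estimator one would use under independent censoring (number of events divided by total observed time). The function names, the choice of `theta = 5`, and the sample size are all assumptions made for the example.

```python
import math
import random

def sample_pair(theta, rng):
    """Draw (u, v) from a Clayton copula by conditional inversion; theta == 0 means independence."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    w = 1.0 - rng.random()
    if theta == 0.0:
        v = w
    else:
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
    return u, v

def naive_event_rate(theta, rate_t=1.0, rate_c=1.0, n=20000, seed=0):
    """Exponential-rate MLE that (wrongly, if theta > 0) assumes independent censoring."""
    rng = random.Random(seed)
    n_events = 0
    total_time = 0.0
    for _ in range(n):
        u, v = sample_pair(theta, rng)
        t = -math.log(u) / rate_t  # event time, with u playing the role of S_T(t)
        c = -math.log(v) / rate_c  # censoring time, with v playing the role of S_C(c)
        total_time += min(t, c)    # only the earlier of the two is observed
        n_events += t <= c
    return n_events / total_time

independent_estimate = naive_event_rate(theta=0.0)
dependent_estimate = naive_event_rate(theta=5.0)
```

With independent censoring the estimate should sit close to the true rate of 1, while under strong dependence it lands far from it even though the event-time marginal is unchanged.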

Paper Structure (28 sections, 8 theorems, 19 equations, 7 figures, 2 tables, 3 algorithms)

Introduction
Survival, Censoring and Copula
Independent Censoring.
Copulas.
Dependent Censoring via Copula.
End-to-end Survival Analysis via Copula
Archimedean Copula for Event and Censoring Times.
Survival and Censoring Marginals.
Model Identifiability
Experiments
Synthetic Datasets
Semi-Synthetic Datasets
Real-World Datasets
Identifying Censoring Dependency Structure
Reducing Survival Estimation Bias
...and 13 more sections

Key Result

Theorem 1

Sklar (1959). Let $F$ be a distribution function with margins $F_1,\cdots,F_d$. There exists a $d$-dimensional copula $\mathcal{C}$ such that for any $(x_1,\cdots, x_d) \in \mathbb{R}^d$ we have $F(x_1,\cdots,x_d)=\mathcal{C}(F_1(x_1),\cdots,F_d(x_d))$. Furthermore, if the marginals $F_1,\cdots,F_d$ are con

Figures (7)

Figure 1: Illustrations of (a) dependent censoring in survival analysis and (b) unobserved confounding in causal inference. Solid/dashed nodes denote observed/hidden variables, respectively. The dashed lines between survival/censoring time and treatment/outcome indicate that estimation and evaluation in survival analysis with dependent censoring face similar challenges as in causal inference.
Figure 2: Ground Truth
Figure 3: Learned Copula
Figure 5: Top to bottom (row): survival prediction biases of compared algorithms on varying censoring dependency governed by Frank, Clayton, and Gumbel copulas. Left to right: Linear-Risk and Nonlinear-Risk results. The lines represent the Survival-$l_1$ means and the shaded areas are the standard deviations. Best viewed in colour.
Figure 6: Calibration plots on test samples of the SEER (left) and GBSG2 (right) datasets. The plots of better-calibrated algorithms are closer to perfect calibration (dashed black line). Best viewed in colour.
...and 2 more figures

Theorems & Definitions (14)

Theorem 1
Theorem 2
Theorem 3
Proof Sketch
Theorem 4
Lemma 5
Theorem 6
Definition 1
Definition 2
Definition 3
...and 4 more
