
Non-parametric estimation of net survival under dependence between death causes

Oskar Laverny, Nathalie Grafféo, Roch Giorgi

TL;DR

This paper tackles relative survival analysis when the standard independence between excess deaths $E$ and competing mortality $P$ is questionable. It introduces a generalized non-parametric Pohar Perme estimator under a copula-based dependence assumption $(\mathcal{H}_{\mathcal{C}})$, derives counting-process-based asymptotics, and provides variance estimation and a log-rank-type test for group differences. Through simulations, it demonstrates that misspecifying the dependence structure can bias excess survival estimates and corrupt inference, while a correctly specified copula yields reliable results; a colorectal cancer registry application illustrates the substantial impact of the dependence assumption on both estimates and uncertainty. The work highlights a practical pathway to assess and account for dependence in relative survival, while noting that plug-in estimators and copula specification remain key areas for further theoretical and empirical refinement.

Abstract

Relative survival methodology deals with a competing risks survival model where the cause of death is unknown. This lack of information occurs regularly in population-based cancer studies. Non-parametric estimation of the net survival is possible through the Pohar Perme estimator. Although it is derived similarly to the Kaplan-Meier estimator, it relies on an untestable independence assumption. We propose here to relax this assumption and provide a generalized non-parametric estimator that works for other dependence structures, by leveraging the underlying stochastic processes and martingales. We formally derive asymptotics of this estimator, providing variance estimation and log-rank-type tests. Our approach provides a new perspective on the Pohar Perme estimator and the acceptability of the underlying independence assumption. We highlight the impact of this dependence structure assumption through simulation studies, and illustrate it through an application to colorectal cancer registry data, before discussing potential extensions of our methodology.
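For reference, the classical Pohar Perme estimator that the paper generalizes can be sketched under independence of $E$ and $P$. It accumulates

$$\widehat\Lambda_E(t) = \int_0^t \frac{\sum_i dN_i(s)/S_{P_i}(s) - \sum_i Y_i(s)\, d\Lambda_{P_i}(s)/S_{P_i}(s)}{\sum_i Y_i(s)/S_{P_i}(s)}, \qquad \widehat S_E(t) = \exp\big(-\widehat\Lambda_E(t)\big).$$

The snippet below is a minimal grid-based approximation of this independence-based estimator, not the authors' generalized estimator under $(\mathcal{H}_{\mathcal{C}})$. It assumes each individual's population mortality is a known constant hazard (`pop_rate`, a hypothetical simplification; real analyses read it from life tables), and the function name `pohar_perme` and all simulation settings are illustrative.

```python
import numpy as np


def pohar_perme(times, status, pop_rate, grid):
    """Grid approximation of the classical Pohar Perme net survival estimator.

    times    : observed follow-up times T_i
    status   : True if a death (any cause) is observed at T_i
    pop_rate : hypothetical constant population hazard of individual i,
               so that S_Pi(u) = exp(-pop_rate_i * u)
    grid     : increasing time grid starting at 0
    """
    times = np.asarray(times, dtype=float)
    status = np.asarray(status, dtype=bool)
    pop_rate = np.asarray(pop_rate, dtype=float)
    w_death = np.exp(pop_rate * times)          # 1 / S_Pi(T_i), weight of a death
    cum_hazard = np.zeros(len(grid))
    for k in range(1, len(grid)):
        s, t = grid[k - 1], grid[k]
        at_risk = times > s                     # Y_i(s)
        w_risk = np.exp(pop_rate[at_risk] * s)  # 1 / S_Pi(s) for those at risk
        denom = w_risk.sum()
        if denom == 0.0:                        # nobody left: keep the last value
            cum_hazard[k:] = cum_hazard[k - 1]
            break
        died = at_risk & status & (times <= t)  # weighted dN_i over (s, t]
        observed = w_death[died].sum()
        expected = (w_risk * pop_rate[at_risk]).sum() * (t - s)
        cum_hazard[k] = cum_hazard[k - 1] + (observed - expected) / denom
    return np.exp(-cum_hazard)


# Hypothetical simulation: independent E (mean 10 years) and P (mean 20 years),
# with administrative censoring at 8 years.
rng = np.random.default_rng(0)
n = 20_000
e = rng.exponential(scale=10.0, size=n)
p = rng.exponential(scale=20.0, size=n)
first_death = np.minimum(e, p)
T = np.minimum(first_death, 8.0)
delta = first_death <= 8.0
grid = np.linspace(0.0, 8.0, 801)               # step of 0.01 year
S_E = pohar_perme(T, delta, np.full(n, 0.05), grid)
print(S_E[500], np.exp(-0.5))                   # grid[500] is t = 5 years
```

With independent $E$ and $P$ the estimate at $t = 5$ should be close to the true $S_E(5) = \exp(-0.5) \approx 0.61$. According to the summary above, it is precisely when the dependence structure between $E$ and $P$ is misspecified that such estimates become biased.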

Paper Structure

This paper contains 17 sections, 1 theorem, 68 equations, 9 figures, 11 tables.

Key Result

Theorem 1

For every $d$-variate random vector $\bm X$ with joint distribution function $F$ and marginal distribution functions $F_1, \dots, F_d$, there exists a copula $C$ such that

$$F(x_1, \dots, x_d) = C\big(F_1(x_1), \dots, F_d(x_d)\big) \quad \text{for all } (x_1, \dots, x_d) \in \mathbb{R}^d.$$

The copula $C$ is uniquely determined on $\mathrm{Ran}(F_{1}) \times \dots \times \mathrm{Ran}(F_{d})$, where $\mathrm{Ran}(F_i)$ denotes the range of the function $F_i$. In particular, if all marginals are absolutely continuous, $C$ is unique.
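Sklar's theorem applies equally to survival functions: for $(E,P)$ there is a survival copula $\mathcal{C}$ with $\mathbb{P}(E>t, P>s) = \mathcal{C}\big(S_E(t), S_P(s)\big)$. Setting $s = t$ shows why the dependence assumption matters: the all-cause survival satisfies $S_O(t) = \mathcal{C}\big(S_E(t), S_P(t)\big)$, so the same observable $S_O$ and known $S_P$ correspond to different net survivals under different $\mathcal{C}$. The sketch below inverts this relation pointwise for a single homogeneous group, once under a Clayton copula and once under independence. The function names, the Clayton choice and the numbers are hypothetical illustrations, not the paper's non-parametric estimator.

```python
import numpy as np


def clayton(u, v, theta):
    """Bivariate Clayton copula, theta > 0 (positive dependence)."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def net_survival_clayton(s_obs, s_pop, theta):
    """Solve clayton(s_E, s_pop, theta) = s_obs for s_E."""
    return (s_obs ** -theta - s_pop ** -theta + 1.0) ** (-1.0 / theta)


def net_survival_indep(s_obs, s_pop):
    """Same inversion under independence, where C(u, v) = u * v."""
    return s_obs / s_pop


# Hypothetical values at a fixed time t: true S_E(t) = 0.6 and S_P(t) = 0.8.
theta, s_e_true, s_pop = 2.0, 0.6, 0.8
s_obs = clayton(s_e_true, s_pop, theta)          # all-cause survival S_O(t)
s_e_clayton = net_survival_clayton(s_obs, s_pop, theta)
s_e_indep = net_survival_indep(s_obs, s_pop)
print(round(s_obs, 3), round(s_e_clayton, 3), round(s_e_indep, 3))
```

Under the correct Clayton assumption the inversion recovers $S_E(t) = 0.6$, while the independence inversion returns about $0.68$. Since a Clayton copula with $\theta > 0$ satisfies $\mathcal{C}(u,v) \ge uv$, wrongly assuming independence overestimates net survival in this positively dependent case.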

Figures (9)

  • Figure 1: $5000$ pairs of ranks sampled (in the unit square) from different Archimedean copulas. We use these copulas as survival copulas of the vector $(E,P)$: the lower-left corner of each plot therefore shows the density of very large times $E$ and $P$, while the upper-right corner represents the density of very small times $E$ and $P$. The lower-right and upper-left corners respectively represent the density of cases where $E$ is small while $P$ is large, and vice versa.
  • Figure 2: Each graph shows (in blue) the $N=1000$ survival curves $t \mapsto \widehat{S}_E^{(1)}(t),\dots,t \mapsto \widehat{S}_E^{(N)}(t)$ estimated under $\left(\mathcal{H}_{\mathcal{C}}\right)$ on resamples simulated using $\mathcal{C}_0$. The abscissa represents time $t$ in years. The true survival curve $S_E(t)$ of $E \sim\texttt{Exponential}(\mu=10)$ is depicted in orange as a target reference. The true dependence structure $\mathcal{C}_0$ varies across rows, while the dependence structure $\mathcal{C}$ assumed by the estimators varies across columns: plots on the diagonal thus correspond to well-specified cases $\mathcal{C} = \mathcal{C}_0$.
  • Figure 3: Each graph shows the $N=1000$ ratios $t \mapsto \frac{\widehat{S}_E^{(k)}(t)}{S_E(t)},\; k \in \{1,\dots,N\}$, between the estimated survival curves and the true one. The abscissa represents time $t$ in years. Rows represent the true copula $\mathcal{C}_0$ and columns the hypothesized one $\mathcal{C}$: diagonal plots thus correspond to well-specified cases $\mathcal{C} = \mathcal{C}_0$. A flat average line close to 1 denotes a well-performing estimator.
  • Figure 4: (Scenario 1) Each histogram represents the distribution of the $N=1000$ p-values under $H_0$. Rows represent the true copula $\mathcal{C}_0$ and columns the hypothesized one $\mathcal{C}$. A good result is a uniform distribution.
  • Figure 5: (Scenario 2) Each histogram represents the distribution of the $N=1000$ p-values under $H_0$. Rows represent the true copula $\mathcal{C}_0$ and columns the hypothesized one $\mathcal{C}$. A good result is a uniform distribution.
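Samples like those of Figure 1 can be drawn with the Marshall-Olkin frailty algorithm for Archimedean copulas. Which families and parameters the paper used is not stated here, so the Clayton copula with $\theta = 2$, the sample size and the exponential margin for $E$ (mean 10 years) below are hypothetical choices. Because the ranks serve as a survival copula, times are obtained as $E = S_E^{-1}(U_1)$, so small $U_1$ maps to large $E$, which matches the caption's reading of the lower-left corner. As a sanity check, the sketch compares the empirical Kendall's tau with the Clayton value $\theta/(\theta+2)$.

```python
import numpy as np


def sample_clayton(n, theta, rng):
    """Draw n pairs from a bivariate Clayton copula (theta > 0).

    Marshall-Olkin: with V ~ Gamma(1/theta, 1) and independent X_j ~ Exp(1),
    U_j = psi(X_j / V), where psi(t) = (1 + t)^(-1/theta) is the Clayton generator.
    """
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=n)
    x = rng.exponential(scale=1.0, size=(n, 2))
    return (1.0 + x / v[:, None]) ** (-1.0 / theta)


def kendall_tau(a, b):
    """O(n^2) Kendall's tau (tau-a); adequate for a few thousand continuous draws."""
    da = np.sign(a[:, None] - a[None, :])
    db = np.sign(b[:, None] - b[None, :])
    n = len(a)
    return (da * db).sum() / (n * (n - 1))  # each pair is counted twice


rng = np.random.default_rng(1)
theta = 2.0
u = sample_clayton(1500, theta, rng)
tau = kendall_tau(u[:, 0], u[:, 1])
print(tau, theta / (theta + 2.0))

# Survival-copula use: E = S_E^{-1}(U_1) with a hypothetical E ~ Exp(mean 10),
# since S_E(t) = exp(-t / 10) gives S_E^{-1}(u) = -10 log(u).
e_times = -10.0 * np.log(u[:, 0])
```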

Theorems & Definitions (3)

  • Definition 1: Generalized Pohar Perme estimator
  • Definition 2: Observable test statistic
  • Theorem 1: Sklar's Theorem (sklar1959)