Non-parametric cure models through extreme-value tail estimation

Jan Beirlant, Martin Bladt, Ingrid Van Keilegom

TL;DR

This paper tackles the challenge of estimating the cure rate in censored survival data when follow-up may be insufficient to distinguish immunity from late events. It develops a non-parametric cure model that fuses extreme-value theory with probability plotting to jointly estimate the cure probability $p$ and the tail index of susceptible times, using all top order statistics and a Peaks-over-Threshold framework. The methodology covers Gumbel and Fréchet domains and extends to Pareto-type, log-normal, and Weibull tails, with a regularization mechanism to stabilize estimation. The authors provide asymptotic theory under insufficient follow-up, comprehensive finite-sample simulations, and a real-data application to the Norwegian birth registry, showing competitive performance and practical utility for tail characterization and cure-rate inference.

Abstract

In survival analysis, the estimation of the proportion of subjects who will never experience the event of interest, termed the cure rate, has received considerable attention recently. Its estimation can be particularly difficult when follow-up is not sufficient, that is, when the censoring mechanism has a smaller support than the distribution of the target data. In the latter case, non-parametric estimators were recently proposed using extreme value methodology, assuming that the distribution of the susceptible population is in the Fréchet or Gumbel max-domains of attraction. In this paper, we take the extreme value techniques one step further, to jointly estimate the cure rate and the extreme value index, using probability plotting methodology, and in particular using the full information contained in the top order statistics. In other words, under sufficient or insufficient follow-up, we reconstruct the immune proportion. To this end, a Peaks-over-Threshold approach is proposed under the Gumbel max-domain assumption. Next, the approach is also transferred to more specific models such as Pareto, log-normal and Weibull tail models, allowing us to recognize the most important tail characteristics of the susceptible population. We establish the asymptotic behavior of our estimators under regularization. Through simulation studies, our estimators are shown to rival and often outperform established models, even when purely considering cure rate estimation. Finally, we provide an application of our method to Norwegian birth registry data.
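
To make the insufficient follow-up problem concrete, the following is a minimal sketch, not the estimator proposed in the paper. It assumes the classical baseline of reading the cure rate off the plateau of the Kaplan-Meier survival curve at the largest observed time, and it simulates susceptible times from a unit exponential with uniform censoring. The function names, distributions, and parameter values are all illustrative assumptions.

```python
import numpy as np


def kaplan_meier_plateau(times, events):
    """Kaplan-Meier survival estimate at the largest observed time.

    times  : observed follow-up times, min(event time, censoring time)
    events : 1 if the event was observed, 0 if censored
    The returned plateau height is a naive cure rate estimate (assumed baseline).
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    # Sort by time; among tied times, place events before censored observations.
    order = np.lexsort((1 - events, times))
    events = events[order]
    n = len(events)
    at_risk = n - np.arange(n)  # number at risk just before each ordered time
    # Product-limit factors: (1 - 1/at_risk) at events, 1 at censored times.
    factors = np.where(events == 1, 1.0 - 1.0 / at_risk, 1.0)
    return float(np.prod(factors))


def simulate_cure_data(n, cure_prob, censor_max, rng):
    """Mixture cure data: cured subjects never fail (hypothetical setup)."""
    cured = rng.random(n) < cure_prob
    event_time = np.where(cured, np.inf, rng.exponential(1.0, size=n))
    censor_time = rng.uniform(0.0, censor_max, size=n)
    observed = np.minimum(event_time, censor_time)
    events = (event_time <= censor_time).astype(int)
    return observed, events


rng = np.random.default_rng(0)
true_cure_prob = 0.3

# Long follow-up: censoring support extends far beyond typical event times.
t_long, d_long = simulate_cure_data(5000, true_cure_prob, 20.0, rng)
# Short follow-up: censoring support ends while many susceptibles are still at risk.
t_short, d_short = simulate_cure_data(5000, true_cure_prob, 1.5, rng)

print("true cure rate:         ", true_cure_prob)
print("plateau, long follow-up:", kaplan_meier_plateau(t_long, d_long))
print("plateau, short follow-up:", kaplan_meier_plateau(t_short, d_short))
```

Under short follow-up the plateau mixes truly cured subjects with susceptibles whose events fall beyond the censoring support, so the naive estimate tends to sit above the true cure rate. That gap is the quantity the extreme-value tail extrapolation described in the abstract aims to recover.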

Paper Structure

This paper contains 13 sections, 1 theorem, 65 equations, 8 figures.

Key Result

Theorem 1

Assume that $\lambda=\lambda_{k,n}= C_{\lambda}\left( \frac{k}{n}\right)^{-2\gamma_c}$ for some $C_{\lambda} >0$, and $k \,n^{\gamma_c/(1-\gamma_c)} \to \infty$. Then we have the following asymptotic distributional identity ... Furthermore, $(n/k)^{-\gamma_c} T_{k,n} \stackrel{d}{=} (n/k)^{-\gamma_c/2}k^{-1/2}(1+o(1)) N(0,B_v (1-p_0(\tau_c))^{2} \sigma^2_k)$ with ... and $n^{-1/2}\mathbf{Z}(U_H(n))$ ...

Figures (8)

  • Figure 1: Simulation results for scenarios 1--5 (from top to bottom).
  • Figure 2: Simulation results for scenarios 6--10 (from top to bottom).
  • Figure 3: Norwegian second borns. The $p_n$, $\hat{p}_k^G$ and $\hat{p}_k^F$ estimates, next to the Gumbel and Fréchet goodness-of-fit plots.
  • Figure 4: Norwegian second borns. The $p_n$, $\hat{p}_k^P$, $\hat{p}_k^W$ and $\hat{p}_k^L$ estimates, next to the Pareto, Weibull and lognormal goodness-of-fit plots.
  • Figure 5: Simulated data following the Pareto, Weibull and lognormal models that best fit the Norwegian data. Here $k/n=0.9$.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1: Asymptotic representation
  • Remark 2