Non-parametric cure models through extreme-value tail estimation
Jan Beirlant, Martin Bladt, Ingrid Van Keilegom
TL;DR
This paper tackles the challenge of estimating the cure rate in censored survival data when follow-up may be insufficient to distinguish immunity from late events. It develops a non-parametric cure model that fuses extreme-value theory with probability plotting to jointly estimate the cure probability $p$ and the tail index of susceptible times, using all top order statistics and a Peaks-over-Threshold framework. The methodology covers Gumbel and Fréchet domains and extends to Pareto-type, log-normal, and Weibull tails, with a regularization mechanism to stabilize estimation. The authors provide asymptotic theory under insufficient follow-up, comprehensive finite-sample simulations, and a real-data application to the Norwegian birth registry, showing competitive performance and practical utility for tail characterization and cure-rate inference.
Abstract
In survival analysis, the estimation of the proportion of subjects who will never experience the event of interest, termed the cure rate, has received considerable attention recently. Its estimation can be particularly difficult when follow-up is not sufficient, that is, when the censoring mechanism has a smaller support than the distribution of the target data. For this setting, non-parametric estimators were recently proposed using extreme value methodology, assuming that the distribution of the susceptible population is in the Fréchet or Gumbel max-domain of attraction. In this paper, we take the extreme value techniques one step further to jointly estimate the cure rate and the extreme value index, using probability plotting methodology and, in particular, the full information contained in the top order statistics. In other words, under sufficient or insufficient follow-up, we reconstruct the immune proportion. To this end, a Peaks-over-Threshold approach is proposed under the Gumbel max-domain assumption. Next, the approach is transferred to more specific models such as Pareto, log-normal and Weibull tail models, which allows one to recognize the most important tail characteristics of the susceptible population. We establish the asymptotic behavior of our estimators under regularization. Through simulation studies, our estimators are shown to rival and often outperform established models, even when only cure rate estimation is considered. Finally, we provide an application of our method to Norwegian birth registry data.
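To illustrate why insufficient follow-up is a problem, the following minimal sketch simulates a mixture cure population and applies the classical Kaplan-Meier plateau estimator, which takes the value of the Kaplan-Meier curve at the last observed event as the cure rate. This is not the extreme-value estimator proposed in the paper. The function names, the Exp(1) susceptible distribution, the uniform censoring and the parameter values (`cure_prob`, `follow_up`) are all assumptions chosen for illustration.

```python
import numpy as np


def kaplan_meier(times, events):
    """Return the distinct event times and the Kaplan-Meier estimate at each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])  # sorted distinct event times
    surv = np.empty(event_times.size)
    s = 1.0
    for i, t in enumerate(event_times):
        at_risk = np.sum(times >= t)              # subjects still under observation
        deaths = np.sum((times == t) & events)    # events occurring exactly at t
        s *= 1.0 - deaths / at_risk
        surv[i] = s
    return event_times, surv


def naive_cure_rate(times, events):
    """Kaplan-Meier plateau: survival estimate at the last observed event."""
    _, surv = kaplan_meier(times, events)
    return surv[-1] if surv.size else 1.0


def simulate(n, cure_prob, follow_up, rng):
    """Mixture cure data (illustrative assumption): Exp(1) event times for
    susceptibles, no event for cured subjects, Uniform(0, follow_up) censoring."""
    cured = rng.random(n) < cure_prob
    latent = rng.exponential(1.0, n)
    latent[cured] = np.inf                 # cured subjects never experience the event
    censor = rng.uniform(0.0, follow_up, n)
    times = np.minimum(latent, censor)
    events = latent <= censor
    return times, events


rng = np.random.default_rng(0)
true_cure = 0.3  # hypothetical cure rate
long_fu = naive_cure_rate(*simulate(5000, true_cure, 20.0, rng))
short_fu = naive_cure_rate(*simulate(5000, true_cure, 1.0, rng))
print(f"true cure rate:                 {true_cure:.3f}")
print(f"plateau estimate, long follow-up:  {long_fu:.3f}")
print(f"plateau estimate, short follow-up: {short_fu:.3f}")
```

In this setup, censoring on (0, 1) stops well before the Exp(1) tail is exhausted, so the plateau estimator targets the survival function at the end of follow-up rather than the cure rate and overestimates the immune proportion. Extrapolating the tail of the susceptible distribution beyond the censoring support is the gap that extreme-value approaches aim to close.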
