On the unification of zero-adjusted cure survival models
Francisco Louzada, Pedro Luiz Ramos, Hayala C. C. Souza, Lawal Oyeneyin, Gleici da Silva Castro Perdona
TL;DR
The paper addresses survival data that contain both zero lifetimes and a cured fraction by introducing a unified zero-adjusted cure (ZAC) survival model in which the number of latent competing causes, $N$, may follow flexible distributions (e.g., Negative Binomial). The lifetime is defined as $Y=\min(W_1,\dots,W_N)$, with $W_i=0$ when the event time is zero, and the population survival function is derived as $S_{zp}(y)=A(S(y))-\sum_{n=1}^{\infty} b_n (S(y))^n$, so that zero-adjustment and cure are handled simultaneously. The paper also develops NB-based formulations and maximum likelihood estimation under censoring, with inference based on asymptotic normality. It introduces new models (ZAC-NB and ZAC-Geo), assesses their finite-sample performance through extensive simulations, and applies the framework to Sub-Saharan obstetric data, where log-normal baselines and ZAC-Geo often provide the best fit by AIC and yield interpretable estimates of the zero-adjustment and cure proportions. The result is a versatile, interpretable toolkit for survival analyses in settings where zeros and long-term survivors occur, with potential extensions to covariate inclusion and other domains.
Abstract
This paper proposes a unified family of survival models that accounts for both zero-adjustment and cure proportions under various distributions for the number of latent competing causes. The models are useful for data in which survival times may be zero or a cured fraction is present, and they are particularly relevant in scenarios such as childbirth duration in sub-Saharan Africa. Several competing-cause distributions were considered, including Binomial, Geometric, Poisson, and Negative Binomial. The maximum likelihood point estimators and asymptotic confidence intervals were evaluated through simulation, and their accuracy improved with larger sample sizes. The model fits real obstetric data best when the number of competing causes is assumed to follow a geometric distribution. Because it can accommodate different distributions for both the lifetime of susceptible individuals and the competing causes, this flexible model is an effective tool for adjusting survival data and has broad application potential.
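The latent competing-causes construction described above can be illustrated with a minimal Python sketch. It is not the paper's parameterization: it assumes a simple mixture in which a lifetime is exactly zero with probability p0, and otherwise the number of causes N is geometric on {0, 1, 2, ...} with parameter theta, so that N = 0 means cured. The baseline is log-normal, censoring is ignored, and all function and parameter names (geometric_pgf, zac_geo_survival, simulate_zac_geo, p0, theta, mu, sigma) are hypothetical. Under these assumptions the population survival for y >= 0 is (1 - p0) * A(S(y)), where A is the probability generating function of N.

```python
import math
import numpy as np


def geometric_pgf(s, theta):
    """PGF A(s) of N with P(N = n) = (1 - theta) * theta**n, n = 0, 1, 2, ..."""
    return (1.0 - theta) / (1.0 - theta * s)


def lognormal_survival(y, mu, sigma):
    """Baseline survival S(y) = P(W > y) for a log-normal cause-specific time W."""
    if y <= 0:
        return 1.0
    return 0.5 * math.erfc((math.log(y) - mu) / (sigma * math.sqrt(2.0)))


def zac_geo_survival(y, p0, theta, mu, sigma):
    """P(Y > y) for y >= 0 under the assumed mixture (not the paper's formula)."""
    return (1.0 - p0) * geometric_pgf(lognormal_survival(y, mu, sigma), theta)


def simulate_zac_geo(n, p0, theta, mu, sigma, seed=0):
    """Draw n lifetimes; 0.0 marks a zero lifetime and inf marks a cured unit."""
    rng = np.random.default_rng(seed)
    y = np.full(n, np.inf)
    is_zero = rng.random(n) < p0
    y[is_zero] = 0.0
    # NumPy's geometric has support {1, 2, ...}; shift it to {0, 1, ...}.
    n_causes = rng.geometric(1.0 - theta, size=n) - 1
    for i in np.flatnonzero(~is_zero & (n_causes > 0)):
        # The observed lifetime is the minimum over the latent causes.
        y[i] = rng.lognormal(mu, sigma, size=n_causes[i]).min()
    return y


if __name__ == "__main__":
    y = simulate_zac_geo(20000, p0=0.1, theta=0.6, mu=1.0, sigma=0.5, seed=1)
    print("share of zeros:", np.mean(y == 0.0))
    print("empirical P(Y > 3):", np.mean(y > 3.0))
    print("model P(Y > 3):", zac_geo_survival(3.0, 0.1, 0.6, 1.0, 0.5))
```

In this sketch the survival curve starts at 1 - p0 just after time zero and levels off at (1 - p0) * (1 - theta), which plays the role of the cured proportion. Comparing the empirical and model survival values is a quick sanity check on the simulation.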
