Neural interval-censored survival regression with feature selection
Carlos García Meixide, Marcos Matabuena, Louis Abraham, Michael R. Kosorok
TL;DR
The paper tackles interval-censored survival regression in high-dimensional settings by proposing a neural-network-based Accelerated Failure Time (AFT) framework. It combines a LassoNet-inspired sparse neural network for variable selection with a residual-network predictor for the interval-censored log-time, modeled as $\log T = r_{\theta,W}(Z) + \varepsilon$, where $\varepsilon \sim N(0,\sigma^2)$. Key contributions include stability-selection-based variable selection under interval censoring, a three-stage sample-splitting protocol for honest evaluation, real-data demonstrations on Type 1 Diabetes Exchange and NHANES, and open-source software. The results show improved predictive accuracy over traditional LogNormal AFT models in nonlinear settings, together with interpretable variable selections and practical guidance for researchers working with interval-censored, high-dimensional biomedical data.
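To make the model concrete, the following is a minimal sketch of the interval-censored likelihood implied by a log-normal AFT model: if the event time is only known to lie in $(L, R]$, its contribution is $\Phi\big((\log R - \mu)/\sigma\big) - \Phi\big((\log L - \mu)/\sigma\big)$ with $\mu = r_{\theta,W}(Z)$. The function names `normal_cdf` and `interval_censored_nll` are hypothetical, `mu` is a plain list standing in for the network output, and this is not the authors' released implementation.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def interval_censored_nll(left, right, mu, sigma):
    """Negative log-likelihood for intervals (left, right] under
    log T = mu + eps, eps ~ N(0, sigma^2).

    left[i] == 0 encodes left censoring; right[i] == math.inf encodes
    right censoring. mu[i] stands in for the predictor r(Z_i) (assumption).
    """
    nll = 0.0
    for lo, hi, m in zip(left, right, mu):
        # P(T <= hi); equals 1 when the upper bound is infinite.
        upper = 1.0 if math.isinf(hi) else normal_cdf((math.log(hi) - m) / sigma)
        # P(T <= lo); equals 0 when the lower bound is zero.
        lower = 0.0 if lo <= 0.0 else normal_cdf((math.log(lo) - m) / sigma)
        # Floor the probability to avoid log(0) for very unlikely intervals.
        nll -= math.log(max(upper - lower, 1e-300))
    return nll


# Example: one subject whose event time lies between exp(1.5) and exp(2.5).
left, right = [math.exp(1.5)], [math.exp(2.5)]
nll_good = interval_censored_nll(left, right, mu=[2.0], sigma=1.0)
nll_bad = interval_censored_nll(left, right, mu=[0.0], sigma=1.0)
```

In this example a prediction centered inside the interval (`mu = 2.0`) yields a smaller negative log-likelihood than one far from it (`mu = 0.0`), which is the signal a gradient-based fit would exploit. A practical implementation would likely compute the difference of CDFs in log space for numerical stability.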
Abstract
Survival analysis is a fundamental area of focus in biomedical research, particularly in the context of personalized medicine. This prominence is due to the increasing prevalence of large and high-dimensional datasets, such as omics and medical imaging data. However, the literature on non-linear regression algorithms and variable selection techniques for interval censoring is either limited or non-existent, particularly in the context of neural networks. Our objective is to introduce a novel predictive framework tailored for interval-censored regression tasks, rooted in Accelerated Failure Time (AFT) models. Our strategy comprises two key components: i) a variable selection phase leveraging recent advances in sparse neural network architectures, and ii) a regression model targeting prediction of the interval-censored response. To assess the performance of our algorithm, we conducted a comprehensive evaluation through both numerical experiments and real-world applications involving diabetes and physical activity data. Our method outperforms traditional AFT algorithms, particularly in scenarios featuring non-linear relationships.
