Interpretable Deep Regression Models with Interval-Censored Failure Time Data
Changhui Yuan, Shishun Zhao, Shuwei Li, Xinyuan Song, Zhao Chen
TL;DR
The paper develops a general, interpretable deep regression framework for interval-censored failure time data within a class of partially linear transformation models. The framework combines parametric effects of the key covariates $\boldsymbol{X}$ with a DNN approximation of the nonparametric component $\phi(\boldsymbol{W})$ and a monotone-spline sieve for the cumulative baseline hazard function $\Lambda(t)$. Estimation uses an EM algorithm with SGD, which keeps the computation tractable while preserving monotonicity and sparsity for stability. Asymptotic theory establishes consistency, convergence rates, and semiparametric efficiency for the parametric component, and shows that the DNN estimator attains the minimax-optimal convergence rate. Simulations show accurate estimation and superior predictive performance with complex, high-dimensional covariates and nonlinear effects, and the ADNI application yields clinically meaningful insights into risk factors with competitive predictive accuracy. The work advances interval-censored survival analysis by integrating flexible deep learning with an interpretable, partially linear structure, paving the way for extensions to multivariate data and more intricate censoring settings.
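For orientation, partially linear transformation models of this kind are commonly written as below. This is a sketch of the standard form rather than a quotation from the paper: the coefficient vector $\boldsymbol{\beta}$ and the transformation $G$ are notation assumed here, and the paper's exact parameterization may differ.

$$
\Lambda(t \mid \boldsymbol{X}, \boldsymbol{W}) = G\left\{ \Lambda(t) \exp\left( \boldsymbol{\beta}^{\top} \boldsymbol{X} + \phi(\boldsymbol{W}) \right) \right\}
$$

Here $G$ is a known, strictly increasing function with $G(0) = 0$. Taking $G(x) = x$ gives a partially linear proportional hazards model, and $G(x) = \log(1 + x)$ gives a partially linear proportional odds model.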
Abstract
Deep neural networks (DNNs) have become powerful tools for modeling complex data structures by sequentially composing simple functions across hidden layers. In survival analysis, recent advances in DNNs have focused primarily on enhancing model capabilities, especially on exploring nonlinear covariate effects under right censoring. However, deep learning methods for interval-censored data, where the unobservable failure time is only known to lie in an interval, remain underexplored and limited to specific data types or models. This work proposes a general regression framework for interval-censored data with a broad class of partially linear transformation models, where key covariate effects are modeled parametrically while nonlinear effects of nuisance multi-modal covariates are approximated via DNNs, balancing interpretability and flexibility. We employ sieve maximum likelihood estimation, using monotone splines to approximate the cumulative baseline hazard function. To ensure reliable and tractable estimation, we develop an EM algorithm incorporating stochastic gradient descent. We establish the asymptotic properties of the parameter estimators and show that the DNN estimator achieves the minimax-optimal convergence rate. Extensive simulations demonstrate superior estimation and prediction accuracy over state-of-the-art methods. Applying our method to the Alzheimer's Disease Neuroimaging Initiative dataset yields novel insights and improved predictive performance compared to traditional approaches.
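To make the interval-censored likelihood concrete, the following minimal sketch evaluates the observed-data log-likelihood for a transformation model. It is an illustration under stated assumptions, not the paper's EM/SGD procedure: the function names, the logarithmic transformation family $G(x) = \log(1 + rx)/r$, and the convention of passing precomputed values of $\Lambda$ at the interval endpoints are all hypothetical choices made here. A subject whose failure time lies in $(L, R]$ contributes $S(L) - S(R)$, where $S(t) = \exp[-G\{\Lambda(t) e^{\eta}\}]$ and $\eta = \boldsymbol{\beta}^{\top}\boldsymbol{X} + \phi(\boldsymbol{W})$.

```python
import numpy as np


def G_log(x, r):
    """Assumed transformation family: G(x) = log(1 + r*x) / r, with G(x) = x when r = 0."""
    x = np.asarray(x, dtype=float)
    return x if r == 0 else np.log1p(r * x) / r


def interval_censored_loglik(Lam_L, Lam_R, eta, r=0.0):
    """Observed-data log-likelihood for interval-censored failure times.

    Lam_L, Lam_R : cumulative baseline hazard evaluated at the left and right
                   interval endpoints (Lam_L = 0 for left-censored subjects,
                   Lam_R = np.inf for right-censored subjects).
    eta          : predictor beta'x + phi(w), e.g. a linear term plus a
                   network output (both assumed to be computed elsewhere).
    r            : index of the assumed logarithmic transformation family.
    """
    Lam_L = np.asarray(Lam_L, dtype=float)
    Lam_R = np.asarray(Lam_R, dtype=float)
    risk = np.exp(np.asarray(eta, dtype=float))

    # Survival probabilities at the two endpoints; an infinite right endpoint
    # gives S_R = exp(-inf) = 0, which is the right-censored contribution.
    S_L = np.exp(-G_log(Lam_L * risk, r))
    S_R = np.exp(-G_log(Lam_R * risk, r))

    # Each subject contributes log P(L < T <= R | covariates).
    return float(np.sum(np.log(S_L - S_R)))
```

In a sieve approach, `Lam_L` and `Lam_R` would come from a monotone spline with nonnegative coefficients, and the negative of this quantity could serve as a loss for gradient-based updates of the network that produces `eta`.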
