Benchmarking multi-step methods for the dynamic prediction of survival with numerous longitudinal predictors
Mirko Signorelli, Sophie Retif
TL;DR
The study benchmarks four recent multi-step dynamic prediction methods (MFPCCox, PRC, FunRSF, DynForest) against a static Cox model and landmarking across three real-world datasets (ADNI, ROSMAP, PBC2) to predict time-to-event outcomes using numerous longitudinal covariates. By applying strict landmarking and evaluating with the time-dependent C-index, time-dependent AUC (tdAUC), and Brier score, the work reveals that approaches based on linear mixed models (LMMs; PRC and DynForest) generally outperform approaches based on multivariate functional principal component analysis (MFPCA; MFPCCox and FunRSF), with PRC often achieving the best predictive performance. Results also show substantial variability in method performance across datasets and highlight the trade-off between predictive accuracy and computing time, with MFPCCox being the fastest of the multi-step methods and DynForest the slowest. The findings have practical implications for choosing dynamic prediction tools in settings with many longitudinal covariates and varying follow-up patterns, and they emphasize that dataset characteristics and software practicality should be considered carefully when deploying risk prediction models (RPMs).
Abstract
In recent years, the growing availability of biomedical datasets featuring numerous longitudinal covariates has motivated the development of several multi-step methods for the dynamic prediction of survival outcomes. These methods employ either mixed-effects models or multivariate functional principal component analysis to model and summarize the evolution of the longitudinal covariates over time. They then use Cox models or random survival forests to predict survival probabilities, using as covariates both baseline variables and the summaries of the longitudinal variables obtained in the previous modelling step. Because these multi-step methods are still quite new, little is known to date about their applicability, limitations, and predictive performance when applied to real-world data. To gain a better understanding of these aspects, we benchmarked these multi-step methods (and two simpler prediction approaches) using three datasets that differ in sample size, number of longitudinal covariates, and length of follow-up. We discuss the different modelling choices made by these methods, and some adjustments that one may need to make in order to apply them to real-world data. Furthermore, we compare their predictive performance using multiple performance measures and landmark times, assess their computing time, and discuss their strengths and limitations.
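To make the multi-step idea concrete, the following is a minimal sketch of the first step at a single landmark time. It is not the implementation of any of the benchmarked methods: as a deliberately simple stand-in for mixed-effects or MFPCA summaries, it fits a separate least-squares line per subject and keeps the intercept and slope. The function name `landmark_summaries`, the data layout, and the toy values are all hypothetical. The sketch does show the two landmarking rules: only subjects still at risk at the landmark are retained, and only measurements taken up to the landmark are used.

```python
from statistics import mean


def landmark_summaries(measurements, survival, landmark):
    """Summarize one longitudinal covariate per subject at a landmark time.

    measurements: dict subject_id -> list of (time, value) pairs
    survival:     dict subject_id -> (observed_time, event_indicator)
    Returns dict subject_id -> (intercept, slope) of a per-subject
    least-squares line (a simplified stand-in for model-based summaries).
    """
    summaries = {}
    for sid, (observed_time, _event) in survival.items():
        # Landmarking rule 1: keep only subjects still at risk at the landmark.
        if observed_time <= landmark:
            continue
        # Landmarking rule 2: use only measurements taken up to the landmark.
        history = [(t, y) for t, y in measurements.get(sid, []) if t <= landmark]
        if not history:
            continue
        times = [t for t, _ in history]
        values = [y for _, y in history]
        t_bar, y_bar = mean(times), mean(values)
        sxx = sum((t - t_bar) ** 2 for t in times)
        if sxx == 0:
            slope = 0.0  # a single time point: no slope can be estimated
        else:
            sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
            slope = sxy / sxx
        summaries[sid] = (y_bar - slope * t_bar, slope)
    return summaries


# Hypothetical toy data: three subjects, one longitudinal covariate.
measurements = {
    "a": [(0, 1.0), (1, 2.0), (2, 3.0), (3, 10.0)],  # value at t=3 is after the landmark
    "b": [(0, 5.0), (1, 5.0)],
    "c": [(0, 2.0)],
}
survival = {"a": (5.0, 1), "b": (1.5, 1), "c": (4.0, 0)}

summaries = landmark_summaries(measurements, survival, landmark=2.0)
```

In a full pipeline, the resulting summaries would be combined with baseline covariates and passed to a survival model (a Cox model or a random survival forest) fitted on the same at-risk subjects. That second step is omitted here because it depends on the chosen software.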
