
Benchmarking multi-step methods for the dynamic prediction of survival with numerous longitudinal predictors

Mirko Signorelli, Sophie Retif

TL;DR

The study benchmarks four recent multi-step dynamic prediction methods (MFPCCox, PRC, FunRSF, DynForest) against two simpler approaches, a static Cox model and landmarking, on three real-world datasets (ADNI, ROSMAP, PBC2), with the aim of predicting time-to-event outcomes from numerous longitudinal covariates. Using strict landmarking and evaluating predictions with the time-dependent C-index, the time-dependent AUC (tdAUC), and the Brier score, the work finds that LMM-based approaches (PRC and DynForest) generally outperform MFPCA-based approaches (MFPCCox, FunRSF), with PRC often achieving the best predictive performance. The results also show substantial variability in method performance across datasets and highlight the trade-off between predictive accuracy and computing time: MFPCCox is the fastest of the multi-step methods and DynForest the slowest. The findings have practical implications for choosing dynamic prediction tools in settings with many longitudinal covariates and varying follow-up patterns, and they emphasize that dataset characteristics and software practicality deserve careful consideration when deploying risk prediction models (RPMs).

Abstract

In recent years, the growing availability of biomedical datasets featuring numerous longitudinal covariates has motivated the development of several multi-step methods for the dynamic prediction of survival outcomes. These methods employ either mixed-effects models or multivariate functional principal component analysis to model and summarize the evolution of the longitudinal covariates over time. They then use Cox models or random survival forests to predict survival probabilities, using as covariates both baseline variables and the summaries of the longitudinal variables obtained in the previous modelling step. Because these multi-step methods are still quite new, little is known to date about their applicability, limitations, and predictive performance when applied to real-world data. To gain a better understanding of these aspects, we benchmarked these multi-step methods (and two simpler prediction approaches) on three datasets that differ in sample size, number of longitudinal covariates, and length of follow-up. We discuss the different modelling choices made by these methods, and some adjustments that one may need to make in order to apply them to real-world data. Furthermore, we compare their predictive performance using multiple performance measures and landmark times, assess their computing time, and discuss their strengths and limitations.
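
To make the multi-step idea concrete, here is a minimal sketch of the first step combined with landmarking. It is an illustration under simplifying assumptions, not the procedure of any benchmarked method. Each subject's trajectory is summarized by a per-subject least-squares intercept and slope, which is a crude stand-in for the mixed-effects or MFPCA summaries described above. The function name `landmark_summaries`, the toy data, and the landmark value are all hypothetical.

```python
import numpy as np

def landmark_summaries(ids, times, values, event_time, landmark):
    """Summarize one longitudinal covariate per subject, using only data up to the landmark.

    ids, times, values: 1-D arrays in long format (one row per measurement).
    event_time: dict mapping subject id -> observed event or censoring time.
    Returns a dict id -> (intercept, slope) for subjects still at risk at the landmark.
    NOTE: per-subject least squares is an assumed simplification, not an LMM or MFPCA.
    """
    summaries = {}
    for sid in np.unique(ids).tolist():
        # Landmarking: keep only subjects who are still at risk at the landmark time.
        if event_time[sid] <= landmark:
            continue
        # Use only measurements collected up to the landmark.
        mask = (ids == sid) & (times <= landmark)
        t, y = times[mask], values[mask]
        if t.size == 0:
            continue  # no usable measurements for this subject
        if np.unique(t).size < 2:
            # A single time point cannot identify a slope; fall back to the mean level.
            summaries[sid] = (float(y.mean()), 0.0)
        else:
            slope, intercept = np.polyfit(t, y, deg=1)
            summaries[sid] = (float(intercept), float(slope))
    return summaries

# Toy long-format data for three hypothetical subjects.
ids = np.array([1, 1, 1, 2, 2, 3, 3, 3])
times = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 0.0, 1.0, 3.0])
values = np.array([1.0, 2.0, 3.0, 5.0, 5.0, 2.0, 1.0, 0.0])
event_time = {1: 5.0, 2: 1.5, 3: 4.0}

summaries = landmark_summaries(ids, times, values, event_time, landmark=2.0)
for sid, (intercept, slope) in summaries.items():
    print(sid, round(intercept, 3), round(slope, 3))
```

In this toy example, subject 2 is dropped because its event or censoring time precedes the landmark, and subject 3's measurement at time 3 is ignored because it was taken after the landmark. In a full pipeline, the resulting summaries would be combined with baseline variables and passed to a Cox model or a random survival forest; that second step is omitted here.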

Paper Structure (29 sections, 11 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Cross-validation estimates of the time-dependent AUC for the prediction of time to dementia in the ADNI dataset. The corresponding numeric values can be found in Supplementary Table 6.
  • Figure 2: Cross-validated Brier score estimates for the prediction of time to dementia in the ADNI dataset. The corresponding numeric values can be found in Supplementary Table 7.
  • Figure 3: Cross-validation estimates of the time-dependent AUC for the prediction of time to AD in the ROSMAP dataset. The corresponding numeric values can be found in Supplementary Table 8.
  • Figure 4: Cross-validated Brier score estimates for the prediction of time to AD in the ROSMAP dataset. The corresponding numeric values can be found in Supplementary Table 9.
  • Figure 5: Cross-validation estimates of the time-dependent AUC for the prediction of time to death in the PBC2 dataset. The corresponding numeric values can be found in Supplementary Table 10.
  • ...and 1 more figure