Ensemble Machine Learning and Statistical Procedures for Dynamic Predictions of Time-to-Event Outcomes

Nina van Gerwen, Sten Willemsen, Bettina E. Hansen, Christophe Corpechot, Marco Carbone, Cynthia Levy, Maria-Carlota Londoño, Atsushi Tanaka, Palak Trivedi, Alejandra Villamil, Gideon Hirschfield, Dimitris Rizopoulos

TL;DR

This work extends the Super Learner framework to combine dynamic predictions from different models and procedures, paying special attention to objective functions that let Super Learner find the optimal weighted combination of those predictions.

Abstract

Dynamic predictions for longitudinal and time-to-event outcomes have become a versatile tool in precision medicine. Our work is motivated by the application of dynamic predictions in the decision-making process for patients with primary biliary cholangitis. For these patients, serial biomarker measurements (e.g., bilirubin and alkaline phosphatase levels) are routinely collected to inform treating physicians of the risk of liver failure and to guide clinical decision-making. Two popular statistical approaches to deriving dynamic predictions are joint modelling and landmarking; more recently, machine learning techniques have also been proposed. Each approach has its merits, and no single method outperforms all others, so obtaining the best possible survival estimates is challenging. We therefore extend the Super Learner framework to combine dynamic predictions from different models and procedures. Super Learner is an ensemble learning technique that lets users combine different prediction algorithms to improve predictive accuracy and flexibility. It uses cross-validation, together with an objective function chosen to suit the application (e.g., squared loss), to build the optimally weighted combination of predictions from a library of candidate algorithms. In our work, we pay special attention to which objective functions are appropriate for obtaining the optimal weighted combination of dynamic predictions. In our primary biliary cholangitis application, Super Learner flexibly combined outputs from a diverse set of models with varying assumptions and achieved predictive performance equal to or better than that of any model fit separately.
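As a rough illustration of the weighting step described in the abstract, the Python sketch below contrasts a discrete Super Learner (pick the single candidate with the lowest cross-validated loss) with an ensemble Super Learner (find non-negative weights summing to one that minimise that loss). It is not the authors' implementation. The function names, the toy predictions, and the use of exponentiated gradient descent as the optimiser are assumptions made for illustration. The squared loss also ignores censoring, which a real time-to-event application would need to handle, for example through inverse probability of censoring weights.

```python
import math


def brier_loss(weights, cv_preds, outcomes):
    """Mean squared error between the weighted prediction and the observed event indicator."""
    total = 0.0
    for preds, y in zip(cv_preds, outcomes):
        ensemble = sum(w * p for w, p in zip(weights, preds))
        total += (ensemble - y) ** 2
    return total / len(outcomes)


def discrete_sl(cv_preds, outcomes):
    """Discrete SL: index of the single candidate with the lowest cross-validated loss."""
    n_models = len(cv_preds[0])
    losses = []
    for k in range(n_models):
        one_hot = [1.0 if j == k else 0.0 for j in range(n_models)]
        losses.append(brier_loss(one_hot, cv_preds, outcomes))
    return min(range(n_models), key=losses.__getitem__)


def ensemble_sl(cv_preds, outcomes, steps=2000, lr=0.5):
    """Ensemble SL: weights on the probability simplex, found by exponentiated gradient descent."""
    n_models = len(cv_preds[0])
    n_obs = len(outcomes)
    weights = [1.0 / n_models] * n_models  # start from equal weights
    for _ in range(steps):
        # Gradient of the mean squared loss with respect to each weight.
        grad = [0.0] * n_models
        for preds, y in zip(cv_preds, outcomes):
            resid = sum(w * p for w, p in zip(weights, preds)) - y
            for k in range(n_models):
                grad[k] += 2.0 * resid * preds[k] / n_obs
        # Multiplicative update followed by renormalisation keeps the
        # weights non-negative and summing to one.
        unnorm = [w * math.exp(-lr * g) for w, g in zip(weights, grad)]
        norm = sum(unnorm)
        weights = [u / norm for u in unnorm]
    return weights


# Hypothetical cross-validated predictions of the event probability in a
# prediction window; one row per subject at risk, one column per candidate.
# The third column mimics a covariate-free reference model.
cv_preds = [
    [0.9, 0.7, 0.5],
    [0.4, 0.1, 0.5],
    [0.9, 0.7, 0.5],
    [0.4, 0.1, 0.5],
]
outcomes = [1, 0, 1, 0]  # 1 = event observed in the window (no censoring assumed)

best_single = discrete_sl(cv_preds, outcomes)
weights = ensemble_sl(cv_preds, outcomes)
print("discrete SL picks candidate", best_single)
print("ensemble SL weights", [round(w, 3) for w in weights])
```

In this toy example the weighted combination reaches a lower squared loss than the best single candidate, because the first two candidates err in opposite directions. Swapping `brier_loss` and its gradient for another objective changes which combination counts as optimal.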

Paper Structure (9 sections, 20 equations, 2 figures, 4 tables, 2 algorithms)

Figures (2)

  • Figure 1: Metric estimates of the ensemble SL (eSL) and discrete SL (dSL) for predicting the composite endpoint in the Global PBC Data. Models $\mathcal{M}_1$ to $\mathcal{M}_9$ are denoted by $1$ to $9$. The blue dashed line serves as a reference point for the performance of the eSL. The text boxes in each graph give the performance of a Kaplan-Meier (KM) model containing no covariates.
  • Figure 2: Metric estimates of the ensemble SL (eSL), discrete SL (dSL), and oracle model (OM) under no censoring (Scenario 1), random censoring (Scenario 2), and informative censoring (Scenario 3) over 100 simulated datasets. The box plots under Train Error show the metric values in the dataset the model was trained on; the box plots under Test Error show the metric values in the holdout dataset.