The super learner for time-to-event outcomes: A tutorial

Ruth H. Keogh, Karla Diaz-Ordaz, Nan van Geloven, Jon Michael Gran, Kamaryn T. Tanner

TL;DR

This paper is a practical tutorial on the super learner (SL) framework for predicting time-to-event risks under right censoring. It covers three implementations: a discrete-time SL, the continuous-time SL of Westling et al. (2023), and the joint survival SL of Munch & Gerds (2025), detailing for each the step-by-step procedure, the loss function, and how censoring is handled. It gives guidance on R implementations, illustrates the methods with the Rotterdam breast cancer data, and discusses trade-offs between the approaches, including when to use an ensemble rather than a single learner. The tutorial emphasizes the oracle property of the SL, reports improved predictive performance with continuous-time methods and richer learner libraries, and relates the SL to causal inference methods such as targeted maximum likelihood estimation (TMLE).

Abstract

Estimating risks or survival probabilities conditional on individual characteristics based on censored time-to-event data is a commonly faced task. This may be for the purpose of developing a prediction model or may be part of a wider estimation procedure, such as in causal inference. A challenge is that it is impossible to know at the outset which of a set of candidate models will provide the best risk estimates. The super learner is a powerful approach for finding the best model or combination of models ('ensemble') among a pre-specified set of candidate models or 'learners', which can include both 'statistical' models (e.g. parametric, semi-parametric models) and 'machine learning' models. Super learners for time-to-event outcomes have been developed, but the literature is technical and the full details of how these methods work and can be implemented in practice have not previously been presented in an accessible format. In this paper we provide a practical tutorial on super learner methods for time-to-event outcomes. An overview of the general steps involved in the super learner is given, followed by details of three specific implementations for time-to-event outcomes. These include the originally proposed super learner, which involves using a discrete time scale, and two more recently proposed versions of the super learner for continuous-time data. We compare the properties of the methods and provide information on how they can be implemented in R. The methods are illustrated using an open access data set and R code is provided.

Paper Structure

This paper contains 30 sections, 21 equations, 6 tables.