Experimental Comparison of Ensemble Methods and Time-to-Event Analysis Models Through Integrated Brier Score and Concordance Index

Camila Fernandez, Chung Shue Chen, Chen Pierre Gaillard, Alonso Silva

TL;DR

The paper compares time-to-event prediction methods under right-censoring by evaluating six approaches (Cox PH, Weibull AFT, Aalen additive, Random Survival Forest, DeepSurv, Gradient Boosting Cox) on three datasets, using the concordance index and the integrated Brier score (IBS). It introduces an ensemble strategy, a convex combination of the six predictors whose weights are optimized by gradient descent to minimize the IBS, and shows that this ensemble improves robustly across datasets. Rankings of the individual methods depend strongly on the dataset and the metric, but the ensemble generally improves predictive accuracy and calibration, with an average IBS gain of about 3.4% over the best single predictor. These findings offer practical guidance for model selection on censored survival data and highlight the benefit of ensembles for robust performance across diverse survival tasks.
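The convex-combination idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that each model's survival predictions are already available on a shared time grid, it ignores censoring (so the loss is a plain, unweighted Brier score averaged over the grid rather than the censoring-adjusted IBS), and it keeps the weights on the simplex through a softmax parametrization, which is one choice among several. The function names, the synthetic data, and the three toy predictors are hypothetical.

```python
import numpy as np

def softmax(z):
    """Map unconstrained scores to non-negative weights that sum to one."""
    e = np.exp(z - z.max())
    return e / e.sum()

def fit_convex_weights(S, Y, lr=0.5, steps=2000):
    """Gradient descent on the ensemble weights.

    S: (K, n, T) survival predictions of K models on a time grid.
    Y: (n, T) indicators 1{event time > grid time}; censoring is ignored
       here, which is a simplifying assumption of this sketch.
    """
    z = np.zeros(S.shape[0])
    for _ in range(steps):
        w = softmax(z)
        resid = np.tensordot(w, S, axes=1) - Y        # (n, T) ensemble error
        grad_w = 2.0 * (S * resid).mean(axis=(1, 2))  # dLoss/dw_k
        grad_z = w * (grad_w - w @ grad_w)            # chain rule through softmax
        z -= lr * grad_z
    return softmax(z)

def brier(pred, Y):
    """Unweighted Brier score averaged over subjects and grid times."""
    return float(((pred - Y) ** 2).mean())

# Synthetic illustration: two noisy predictors and one uninformative one.
rng = np.random.default_rng(0)
n, T = 200, 20
event_times = rng.exponential(1.0, size=n)
grid = np.linspace(0.1, 2.0, T)
Y = (event_times[:, None] > grid[None, :]).astype(float)
true_surv = np.tile(np.exp(-grid), (n, 1))
S = np.stack([
    np.clip(true_surv + rng.normal(0.0, 0.2, (n, T)), 0.0, 1.0),
    np.clip(true_surv + rng.normal(0.0, 0.2, (n, T)), 0.0, 1.0),
    np.full((n, T), 0.5),
])

w = fit_convex_weights(S, Y)
single_briers = [brier(S[k], Y) for k in range(S.shape[0])]
ens_brier = brier(np.tensordot(w, S, axes=1), Y)
print("weights:", w)
print("single-model scores:", single_briers, "ensemble score:", ens_brier)
```

In this toy setting the two noisy predictors carry independent errors, so averaging them reduces the error; a censoring-aware version would replace the plain squared error with inverse-probability-of-censoring weights.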

Abstract

Time-to-event analysis is a branch of statistics that has grown in popularity over recent decades because of its many application fields, such as predictive maintenance, customer churn prediction and population lifetime estimation. In this paper, we review and compare the performance of several prediction models for time-to-event analysis. These consist of semi-parametric and parametric statistical models, in addition to machine learning approaches. Our study is carried out on three datasets and evaluated with two different scores (the integrated Brier score and the concordance index). Moreover, we show how ensemble methods, which surprisingly have not yet been much studied in time-to-event analysis, can improve the prediction accuracy and enhance the robustness of the prediction performance. We conclude the analysis with a simulation experiment in which we evaluate the factors influencing the performance ranking of the methods using both scores.
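As an illustration of one of the two scores, the following sketch computes a concordance index in the style of Harrell's C under right-censoring. It is a plain quadratic-time version written for readability, not the evaluation code used in the paper; the function name and the toy data are hypothetical, and ties in risk are counted as half-concordant, which is a common convention assumed here.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Fraction of comparable pairs that the risk scores order correctly.

    time:  observed time (event or censoring) per subject
    event: 1 if the event was observed, 0 if the subject was censored
    risk:  predicted risk score, higher meaning an earlier expected event
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Toy data: the third subject is censored at time 3.
time = [1, 2, 3, 4]
event = [1, 1, 0, 1]
c_perfect = concordance_index(time, event, [4, 3, 2, 1])
c_reversed = concordance_index(time, event, [1, 2, 3, 4])
c_constant = concordance_index(time, event, [0, 0, 0, 0])
```

The concordance index only measures ranking quality, whereas the integrated Brier score also penalizes miscalibrated survival probabilities, which is one reason the two scores can rank methods differently.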

Paper Structure

This paper contains 32 sections, 17 equations, 19 figures, 1 table, and 1 algorithm.

Figures (19)

  • Figure 1:
  • Figure 2: Concordance index comparison on the Telecom churn dataset
  • Figure 3:
  • Figure 4: Integrated Brier score comparison on the Telecom churn dataset
  • Figure 5:
  • ...and 14 more figures