
Optimal Survival Trees: A Dynamic Programming Approach

Tim Huisman, Jacobus G. M. van der Linden, Emir Demirović

TL;DR

The paper tackles the problem of learning globally optimal survival trees under right-censoring. It introduces SurTree, a dynamic-programming framework that yields optimal trees of a given size, together with a specialized algorithm for trees of depth up to two that greatly improves scalability. Empirical results show that SurTree achieves out-of-sample performance competitive with OST and better than CTree, with substantial run-time advantages in many settings. The approach makes it possible to measure the optimality gap of heuristic trees directly and lays the groundwork for further extensions, such as fitting Cox models within the leaves. Overall, SurTree advances interpretable survival analysis by combining global optimality guarantees with scalable optimization.
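
The core dynamic-programming idea can be illustrated for binary (binarized) features: the best tree of depth at most d for a set of instances is either a single leaf, or the best split on some feature combined with the best depth-(d-1) subtrees on each side, with identical subproblems cached. The sketch below is a minimal illustration under stated assumptions, not SurTree itself. The leaf cost `exp_leaf_cost` is a hypothetical placeholder (the negative log-likelihood of a constant-hazard model fitted to right-censored data), the function names are invented for this example, and only the optimal objective value is returned, not the tree.

```python
import math
from functools import lru_cache
from typing import List, Tuple

# Each instance: (binary feature vector, observed time > 0, event indicator: 1 = event, 0 = censored).
Instance = Tuple[Tuple[int, ...], float, int]


def exp_leaf_cost(data: List[Instance], idx: frozenset) -> float:
    """Hypothetical leaf cost (an assumption, not the paper's loss): negative
    log-likelihood of an exponential model with MLE hazard = events / exposure."""
    events = sum(data[i][2] for i in idx)
    exposure = sum(data[i][1] for i in idx)
    if events == 0:
        return 0.0  # MLE hazard is 0, so the likelihood is 1
    rate = events / exposure
    return events - events * math.log(rate)


def optimal_tree_cost(data: List[Instance], max_depth: int) -> float:
    """Minimum total leaf cost over all trees of depth <= max_depth."""
    n_features = len(data[0][0])

    @lru_cache(maxsize=None)  # cache subproblems keyed by (instance set, remaining depth)
    def solve(idx: frozenset, depth: int) -> float:
        best = exp_leaf_cost(data, idx)  # option 1: make this node a leaf
        if depth == 0:
            return best
        for f in range(n_features):  # option 2: split on feature f
            left = frozenset(i for i in idx if data[i][0][f] == 0)
            right = idx - left
            if not left or not right:
                continue  # feature f does not separate these instances
            best = min(best, solve(left, depth - 1) + solve(right, depth - 1))
        return best

    return solve(frozenset(range(len(data))), max_depth)


example = [((0, 1), 1.0, 1), ((0, 0), 1.0, 1), ((1, 1), 4.0, 1), ((1, 0), 4.0, 0)]
for d in range(3):
    print(d, round(optimal_tree_cost(example, d), 4))
```

Because this placeholder cost is additive over leaves, the best cost can only stay the same or decrease as the depth budget grows; a practical method would also cap the tree size or penalize extra leaves.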

Abstract

Survival analysis studies and predicts the time of death, or of other single, non-recurring events, based on historical data in which the true time of death is unknown for some instances. Survival trees enable the discovery of complex nonlinear relations in a compact, human-comprehensible model by recursively splitting the population and predicting a distinct survival distribution in each leaf node. We use dynamic programming to provide the first survival tree method with optimality guarantees, enabling the assessment of the optimality gap of heuristics. We improve the scalability of our method through a special algorithm for computing trees up to depth two. The experiments show that, in realistic cases, our method's run time even outperforms some heuristics, while its out-of-sample performance is similar to the state of the art.
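
Because the true event time is unknown for some instances (right-censoring), a leaf's survival distribution cannot be estimated by simply counting deaths. A standard estimator for censored data is the Kaplan–Meier product-limit estimator, sketched below for the instances falling in a single leaf. This is a generic illustration, not necessarily the leaf model SurTree uses; the function name `kaplan_meier` is hypothetical.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate: (t, S(t)) at each distinct event time.
    events[i] is 1 if the event was observed and 0 if the instance was censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all instances sharing time t; instances censored at t still count as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk  # conditional probability of surviving past t
            curve.append((t, surv))
        at_risk -= removed
    return curve


print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

Censored instances never reduce the survival estimate directly; they only shrink the risk set for later event times.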

Paper Structure

This paper contains 33 sections, 26 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: An example of a survival tree. Each leaf has a different survival distribution.
  • Figure 2: A visualization of how $\theta$ affects a survival distribution $\hat{S}(t)$. Every plot uses the same $\hat{\Lambda}(t)$, but uses $\theta = 0.5$, $\theta = 1$ and $\theta = 2$, respectively.
  • Figure 3: Harrell's C-index and the integrated Brier score on the synthetic data sets (excluding time-outs).
  • Figure 4: Run time for increasing depth, with 3 continuous, 1 binary, and 2 categorical features ($f=1$) or 6 continuous, 2 binary, and 4 categorical features ($f=2$).
  • Figure 5: Normalized training loss for CTree, OST, and SurTree, when trained with binarized data.
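
The caption of Figure 2 suggests that each leaf's survival curve is a shared cumulative hazard $\hat{\Lambda}(t)$ rescaled by a leaf-specific factor $\theta$. Assuming the common form $\hat{S}(t) = \exp(-\theta \hat{\Lambda}(t))$, and assuming $\hat{\Lambda}(t)$ is estimated with the Nelson–Aalen estimator (the source does not state either detail), the effect of $\theta$ can be sketched as follows. The function names `nelson_aalen` and `survival` are hypothetical.

```python
import bisect
import math
from typing import List, Tuple


def nelson_aalen(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Nelson-Aalen cumulative hazard: (t, Lambda(t)) at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    cum_hazard = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            cum_hazard += deaths / at_risk  # hazard increment at time t
            curve.append((t, cum_hazard))
        at_risk -= removed
    return curve


def survival(t: float, curve: List[Tuple[float, float]], theta: float) -> float:
    """Assumed leaf model: S(t) = exp(-theta * Lambda(t)), Lambda a right-continuous step function."""
    event_times = [ti for ti, _ in curve]
    k = bisect.bisect_right(event_times, t)  # number of event times <= t
    cum = curve[k - 1][1] if k > 0 else 0.0
    return math.exp(-theta * cum)


curve = nelson_aalen([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for theta in (0.5, 1.0, 2.0):  # the three values used in Figure 2
    print(theta, [round(survival(t, curve, theta), 3) for t in (1, 2, 3)])
```

Under this assumed form, a larger $\theta$ lowers the curve at every time where $\hat{\Lambda}(t) > 0$, since $\exp(-\theta x)$ decreases in $\theta$ for $x > 0$, while $\theta = 1$ leaves the shared curve unchanged.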