Optimal Sparse Survival Trees

Rui Zhang; Rui Xin; Margo Seltzer; Cynthia Rudin

TL;DR

This work tackles time-to-event prediction with censoring by delivering OSST, a provably optimal sparse survival tree method. It introduces a dynamic-programming-with-bounds framework that optimizes a regularized Integrated Brier Score objective $R(t,\mathbf{X},\mathbf{c},\mathbf{y})=\mathcal{L}(t,\mathbf{X},\mathbf{c},\mathbf{y})+\lambda H_t$ while pruning the search with hierarchical bounds, Equivalent Points, and a reference-model guessing bound. Experiments on 17 datasets show OSST achieves superior IBS ratios, good generalization (via cross-validation), and stronger overall quality (C-index and AUC) compared with interpretable baselines, while producing substantially sparser trees. The results demonstrate practical, scalable, and interpretable survival modeling suitable for high-stakes domains, with public code available for reproducibility and extension.

Abstract

Interpretability is crucial for doctors, hospitals, pharmaceutical companies and biotechnology corporations to analyze and make decisions for high-stakes problems that involve human health. Tree-based methods have been widely adopted for survival analysis due to their appealing interpretability and their ability to capture complex relationships. However, most existing methods to produce survival trees rely on heuristic (or greedy) algorithms, which risk producing sub-optimal models. We present a dynamic-programming-with-bounds approach that finds provably-optimal sparse survival tree models, frequently in only a few seconds.

Paper Structure (48 sections, 15 theorems, 70 equations, 39 figures, 7 tables, 1 algorithm)

INTRODUCTION
METHODOLOGY
Notation and Objective
Dynamic Programming
Bounds
Lower Bounds
EXPERIMENTS
Optimality
Generalization
Comprehensive Quality
Running Time
Scalability
Optimal Survival Trees
LIMITATIONS
CONCLUSION
...and 33 more sections

Key Result

Theorem 2.1

The loss of a survival tree is an additive function of the observations and leaves.
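The additivity stated in Theorem 2.1 can be illustrated with a minimal Python sketch. It assumes each observation already has a precomputed loss contribution (hypothetical numbers here, not real Integrated Brier Score values) and that $H_t$ in the objective $R=\mathcal{L}+\lambda H_t$ denotes the number of leaves of the tree. The function names `leaf_losses` and `regularized_objective` are illustrative and are not taken from the OSST code.

```python
from typing import Dict, List


def leaf_losses(per_obs_loss: List[float], leaf_of: List[int]) -> Dict[int, float]:
    """Sum hypothetical per-observation losses within each leaf."""
    totals: Dict[int, float] = {}
    for loss, leaf in zip(per_obs_loss, leaf_of):
        totals[leaf] = totals.get(leaf, 0.0) + loss
    return totals


def regularized_objective(per_leaf: Dict[int, float], lam: float) -> float:
    """Tree loss (sum over leaves) plus lam times the number of leaves.

    Assumes H_t is the leaf count; this is an assumption of the sketch.
    """
    return sum(per_leaf.values()) + lam * len(per_leaf)


# Hypothetical example: four observations routed to two leaves.
per_obs_loss = [0.10, 0.20, 0.05, 0.15]
leaf_of = [0, 0, 1, 1]

per_leaf = leaf_losses(per_obs_loss, leaf_of)
objective = regularized_objective(per_leaf, lam=0.01)
```

Because the loss decomposes leaf by leaf, a subproblem defined on the observations reaching one leaf can be scored independently of the rest of the tree, which is the property a dynamic-programming search over subtrees relies on.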

Figures (39)

Figure 1: Training Score (IBS Ratio) of CTree, RPART, SkSurv and OSST on datasets: aids_death, uissurv, veterans, max depth 5.
Figure 2: 5-fold cross validation of CTree, RPART, SkSurv and OSST on datasets: credit, employee, max depth 5.
Figure 3: Testing performance of CTree, RPART, SkSurv and OSST on churn dataset, max depth 5, using different metrics. Cross-validation was used for confidence intervals.
Figure 4: Training time of CTree, RPART, SkSurv and OSST as a function of sample size on household dataset, $d=5,\lambda=0.01$ (60-minutes time limit).
Figure 5: Optimal survival tree produced by OSST for churn dataset, 7 leaves. IBS ratio: $48.68\%$
...and 34 more figures

Theorems & Definitions (30)

Theorem 2.1
Theorem 2.2
Theorem 2.3
Lemma 2.4
Lemma 2.5
Theorem 2.6
Theorem 2.7
Corollary 2.7.1
proof
proof
...and 20 more
