Soft decision trees for survival analysis

Antonio Consolo, Edoardo Amaldi, Emilio Carrizosa

TL;DR

This work introduces Soft Survival Trees (SST), a globally optimized, differentiable model for survival analysis that uses soft multivariate splits and assigns each input the survival function of a single leaf. Each leaf provides a survival function $S_{\boldsymbol{x}}(t)$ drawn from parametric or spline-based semiparametric models, enabling flexible yet interpretable leaf-level modeling. SSTs are trained via a node-based decomposition algorithm (NODEC-DR-SST) that combines leaf likelihoods with branch routing probabilities and can include a fairness penalty to promote group parity. Across 15 datasets, SSTs outperform three benchmark survival trees on four discrimination and calibration metrics, while also offering enhanced interpretability through leaf-wise clusters of survival functions and a pathway to fairness-aware modeling. This approach provides a practical framework for accurate, interpretable survival analysis with flexible distribution choices, with potential extensions to sparsity and complex data types.
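The soft routing and single-leaf prediction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes breadth-first node indexing (root 1, children $2t$ and $2t+1$), a sigmoid soft split $\sigma(\boldsymbol{w}_t^\top \boldsymbol{x} + b_t)$ giving the probability of going left at branch node $t$, and a single-leaf (HBP) prediction that follows the child with the higher branching probability at each branch node. The names `soft_routing`, `W`, and `b` are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_routing(x, W, b, depth):
    """Leaf probabilities and HBP leaf for a soft tree of the given depth.

    Assumed indexing: root 1, node t has children 2t (left) and 2t+1 (right);
    branch nodes are 1..2**depth - 1, leaves are 2**depth..2**(depth+1) - 1.
    W[t-1] and b[t-1] are the (hypothetical) multivariate split parameters of node t.
    """
    n_branch = 2**depth - 1
    # Probability of moving to the left child at each branch node.
    p_left = {t: sigmoid(W[t - 1] @ x + b[t - 1]) for t in range(1, n_branch + 1)}

    # Probability of reaching each leaf: product of branching probabilities on its path.
    leaf_prob = {}
    for leaf in range(2**depth, 2**(depth + 1)):
        prob, node = 1.0, leaf
        while node > 1:
            parent = node // 2
            prob *= p_left[parent] if node % 2 == 0 else 1.0 - p_left[parent]
            node = parent
        leaf_prob[leaf] = prob

    # HBP path: from the root, always follow the more probable child.
    node = 1
    while node <= n_branch:
        node = 2 * node if p_left[node] >= 0.5 else 2 * node + 1
    return leaf_prob, node
```

For example, with depth 2, zero weights and biases $(5, -5, 0)$, the root sends the input left and node 2 sends it right, so the HBP leaf is 5. The leaf probabilities always sum to 1 because each split divides its parent's probability between its two children.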

Abstract

Decision trees are popular in survival analysis for their interpretability and ability to model complex relationships. Survival trees, which predict the timing of singular events using censored historical data, are typically built through heuristic approaches. Recently, there has been growing interest in globally optimized trees, where the overall tree is trained by minimizing the error function over all its parameters. We propose a new soft survival tree model (SST), with a soft splitting rule at each branch node, trained via a nonlinear optimization formulation amenable to decomposition. Since SSTs provide for every input vector a specific survival function associated with a single leaf node, they satisfy the conditional computation property and inherit the related benefits. SST and the training formulation combine flexibility with interpretability: any smooth survival function (parametric, semiparametric, or nonparametric) estimated through maximum likelihood can be used, and each leaf node of an SST yields a cluster of distinct survival functions which are associated with the data points routed to it. Numerical experiments on 15 well-known datasets show that SSTs, with parametric and spline-based semiparametric survival functions, trained using an adaptation of the node-based decomposition algorithm proposed by Consolo et al. (2024) for soft regression trees, outperform three benchmark survival trees in terms of four widely used discrimination and calibration measures. SSTs can also be extended to consider group fairness.
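To illustrate how a leaf survival function can be fitted by maximum likelihood on censored data, the sketch below computes the right-censored negative log-likelihood for a single leaf. The log-logistic accelerated-failure-time form, with scale $\alpha = \exp(\boldsymbol{\beta}^\top \boldsymbol{x})$ and shape $k$, is an assumption chosen for illustration; the paper's exact leaf parameterizations may differ. Observed events contribute $\log f(t) = \log h(t) + \log S(t)$ and censored observations contribute $\log S(t)$. The function name `loglogistic_nll` is hypothetical.

```python
import numpy as np

def loglogistic_nll(beta, log_k, X, t, delta):
    """Right-censored negative log-likelihood under an assumed log-logistic AFT leaf model.

    alpha_i = exp(X_i @ beta), k = exp(log_k),
    S(t) = 1 / (1 + (t/alpha)^k),
    h(t) = (k/t) (t/alpha)^k / (1 + (t/alpha)^k).
    delta_i = 1 if the event was observed, 0 if the time is right-censored.
    """
    k = np.exp(log_k)
    alpha = np.exp(X @ beta)
    z = k * (np.log(t) - np.log(alpha))         # z = log (t/alpha)^k
    log_S = -np.logaddexp(0.0, z)               # log S = -log(1 + e^z), numerically stable
    log_h = np.log(k) - np.log(t) + z + log_S   # log h = log k - log t + z - log(1 + e^z)
    # Events add log h + log S (= log f); censored points add only log S.
    return -np.sum(delta * log_h + log_S)
```

In a soft tree this per-leaf term would be combined with the branch routing probabilities; how NODEC-DR-SST combines them is not reproduced here. The function itself can be minimized over `beta` and `log_k` with any general-purpose optimizer.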

Paper Structure

This paper contains 33 sections, 31 equations, 26 figures, 11 tables, 2 algorithms.

Figures (26)

  • Figure 1: A soft tree of depth $D=2$.
  • Figure 2: An example of a soft survival tree with single-leaf-node prediction, where the arrows indicate the HBP (highest branching probability) path for a given input $\mathbf{x}$ and the corresponding predicted survival function is $\hat{S}_{\mathbf{x}}(t) = S_{\mathbf{x}}(t; \bm{\beta}_5)$.
  • Figure 3: Example of NODEC-DR-SST working set selection for an SST of depth $D=3$. The branch node $r_s=2$ is selected, along with the associated working sets $W_B=\{2,4,5\}$ (red) and $W_L = \{8,9,10,11\}$ (blue). The associated variable vectors $\bm{\omega}$ and $\bm{\beta}$ are indicated inside each node.
  • Figure 4: Boxplots of the testing $C_U$ measure (the higher the better) for the 15 datasets and for the three best-performing approaches, namely, LLog in blue, CTree in orange, and PO in green.
  • Figure 5: Boxplots of the testing IBS measure (the lower the better) for the 15 datasets and for the three best-performing approaches, namely, LLog in blue, CTree in orange, and PO in green.
  • ...and 21 more figures
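The working-set example in Figure 3 can be reproduced with a short helper. Under the same breadth-first indexing (root 1, children $2t$ and $2t+1$), this sketch assumes, based only on that single example, that $W_B$ collects the branch nodes of the subtree rooted at the selected node $r_s$ and $W_L$ the leaves below it; the actual selection rule of NODEC-DR-SST may differ. `subtree_working_sets` is a hypothetical name.

```python
def subtree_working_sets(r_s, depth):
    """Branch-node and leaf working sets for a selected branch node r_s.

    Assumed indexing: branch nodes 1..2**depth - 1, leaves 2**depth..2**(depth+1) - 1.
    Assumed rule (inferred from Figure 3): W_B = branch nodes of the subtree rooted
    at r_s, W_L = leaves of that subtree.
    """
    n_branch = 2**depth - 1
    if not 1 <= r_s <= n_branch:
        raise ValueError("r_s must be a branch node")
    W_B, W_L = [], []
    level = [r_s]
    # Breadth-first walk down the subtree rooted at r_s.
    while level:
        next_level = []
        for t in level:
            if t <= n_branch:
                W_B.append(t)
                next_level.extend((2 * t, 2 * t + 1))
            else:
                W_L.append(t)
        level = next_level
    return W_B, W_L
```

For $r_s = 2$ and $D = 3$ the helper returns `([2, 4, 5], [8, 9, 10, 11])`, matching the sets highlighted in Figure 3.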

Theorems & Definitions (3)

  • definition 1
  • definition 2
  • definition 3