FPBoost: Fully Parametric Gradient Boosting for Survival Analysis

Alberto Archetti, Eugenio Lomurno, Diego Piccinotti, Matteo Matteucci

TL;DR

FPBoost addresses limitations of traditional survival models by modeling the instantaneous risk as a weighted sum of fully parametric hazards and optimizing the full survival likelihood rather than relying on partial likelihood or discretization. By using gradient-boosted trees to estimate head parameters $(\eta_j, k_j)$ and weights $w_j$ for each head, FPBoost achieves flexible, interpretable hazard shapes formed from Weibull and LogLogistic components, with nonnegativity enforced via $\text{ReLU}$ and final hazard clipping as needed. The paper proves a universal hazard approximation property for mixtures of Weibull heads and demonstrates strong empirical performance across diverse right-censored datasets, delivering competitive concordance and calibration relative to both tree-based and neural-network survival models, with an open-source implementation compatible with scikit-survival.
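
As an illustration of the weighted-sum construction described above, the following minimal sketch computes a mixture hazard from Weibull and LogLogistic heads. The parameterizations ($h(t) = \eta k t^{k-1}$ for Weibull and $h(t) = \eta k t^{k-1} / (1 + \eta t^k)$ for LogLogistic) are one common choice and are assumed here. The function names, the `heads` tuple format, and the choice to apply ReLU to every raw output are hypothetical and not taken from the FPBoost implementation. In the actual model the parameters would be predicted per sample by gradient-boosted trees rather than fixed by hand.

```python
import numpy as np


def relu(x):
    # Elementwise max(x, 0), used to keep parameters and weights nonnegative.
    return np.maximum(x, 0.0)


def weibull_hazard(t, eta, k):
    # Assumed parameterization: h(t) = eta * k * t^(k - 1)
    return eta * k * t ** (k - 1.0)


def loglogistic_hazard(t, eta, k):
    # Assumed parameterization: h(t) = eta * k * t^(k - 1) / (1 + eta * t^k)
    return eta * k * t ** (k - 1.0) / (1.0 + eta * t ** k)


def mixture_hazard(t, heads, weights):
    """Weighted sum of parametric hazards evaluated at times t > 0.

    heads   : list of (kind, eta, k), kind is "weibull" or "loglogistic"
    weights : one raw weight per head (hypothetical stand-in for tree outputs)
    """
    t = np.asarray(t, dtype=float)
    total = np.zeros_like(t)
    for (kind, eta, k), w in zip(heads, weights):
        base = weibull_hazard if kind == "weibull" else loglogistic_hazard
        # ReLU on raw outputs is an assumption about where nonnegativity is enforced.
        total += relu(w) * base(t, relu(eta), relu(k))
    # Final clipping keeps the combined hazard nonnegative.
    return np.clip(total, 0.0, None)


heads = [("weibull", 1.0, 1.5), ("loglogistic", 0.5, 2.0)]
weights = [0.7, 0.3]
print(mixture_hazard([0.5, 1.0, 2.0], heads, weights))
```

A head whose raw weight is negative contributes nothing after ReLU, which is one way such a model could switch off components that do not fit the data.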

Abstract

Survival analysis is a statistical framework for modeling time-to-event data. It plays a pivotal role in medicine, reliability engineering, and social science research, where understanding event dynamics even with few data samples is critical. Recent advancements in machine learning, particularly those employing neural networks and decision trees, have introduced sophisticated algorithms for survival modeling. However, many of these methods rely on restrictive assumptions about the underlying event-time distribution, such as proportional hazard, time discretization, or accelerated failure time. In this study, we propose FPBoost, a survival model that combines a weighted sum of fully parametric hazard functions with gradient boosting. Distribution parameters are estimated with decision trees trained by maximizing the full survival likelihood. We show how FPBoost is a universal approximator of hazard functions, offering full event-time modeling flexibility while maintaining interpretability through the use of well-established parametric distributions. We evaluate concordance and calibration of FPBoost across multiple benchmark datasets, showcasing its robustness and versatility as a new tool for survival estimation.
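
To make "full survival likelihood" concrete, the sketch below evaluates the standard right-censored negative log-likelihood, $-\sum_i \left[\delta_i \log h(t_i) - H(t_i)\right]$, for a single Weibull head, where $\delta_i = 1$ marks an observed event and $\delta_i = 0$ a censored sample. The Weibull parameterization $h(t) = \eta k t^{k-1}$, $H(t) = \eta t^k$ and all names are assumptions for illustration, not the paper's code.

```python
import numpy as np


def weibull_nll(t, delta, eta, k, eps=1e-12):
    """Right-censored negative log-likelihood for one Weibull hazard.

    t     : observed times (event or censoring), all > 0
    delta : 1 if the event was observed, 0 if the sample was censored
    eta, k: assumed Weibull parameters, h(t) = eta*k*t^(k-1), H(t) = eta*t^k
    """
    t = np.asarray(t, dtype=float)
    delta = np.asarray(delta, dtype=float)
    hazard = eta * k * t ** (k - 1.0)
    cum_hazard = eta * t ** k
    # Events contribute log h(t) - H(t); censored samples contribute only -H(t).
    log_lik = delta * np.log(hazard + eps) - cum_hazard
    return -np.sum(log_lik)


# With eta = 1 and k = 1 the hazard is constant (exponential model).
print(weibull_nll(t=[1.0, 2.0], delta=[1, 0], eta=1.0, k=1.0))
```

Unlike a partial likelihood, this objective depends on the absolute event times through $H(t)$, so it requires no proportional-hazards assumption or time discretization.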

Paper Structure

This paper contains 22 sections, 2 theorems, 20 equations, 2 figures, 13 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $\mathcal{H}$ denote the space of hazard functions, that is, continuous nonnegative real functions $h(t)$ for which $\int_0^\infty h(t)\,dt=\infty$. For any $h^\star \in \mathcal{H}$, any $\varepsilon > 0$, and any interval $[0, T]$, there exists a finite collection of $J$ Weibull hazard functio

Figures (2)

  • Figure 1: FPBoost architecture example with four heads. A set of trees estimates two distribution parameters, $\eta_j$ and $k_j$, for each of four heads starting from the input features. Heads 1 (blue) and 2 (green) follow Weibull distributions, while heads 3 (orange) and 4 (yellow) follow LogLogistic distributions. An additional set of trees (gray) estimates a weight for each head. These heads are combined to form a single hazard function and its corresponding cumulative hazard function. New trees are built by fitting the gradient of the negative log-likelihood and ElasticNet (purple).
  • Figure 2: Kaplan-Meier estimates of the survival probability (blue) and the censoring probability (orange) for the datasets included in the study.

Theorems & Definitions (3)

  • Theorem 3.1
  • Theorem 1.1
  • Proof