An Introduction to Deep Survival Analysis Models for Predicting Time-to-Event Outcomes

George H. Chen

TL;DR

This monograph provides a working understanding of precisely what the basic time-to-event prediction problem is, how it differs from standard regression and classification, and how key "design patterns" have been used time after time to derive new time-to-event prediction models.

Abstract

Many applications involve reasoning about time durations before a critical event happens, also called time-to-event outcomes. When will a customer cancel a subscription, a coma patient wake up, or a convicted criminal reoffend? Time-to-event outcomes have been studied extensively within the field of survival analysis, primarily by the statistical, medical, and reliability engineering communities, with textbooks already available in the 1970s and '80s. This monograph aims to provide a reasonably self-contained modern introduction to survival analysis. We focus on predicting time-to-event outcomes at the individual data point level with the help of neural networks. Our goal is to provide the reader with a working understanding of precisely what the basic time-to-event prediction problem is, how it differs from standard regression and classification, and how key "design patterns" have been used time after time to derive new time-to-event prediction models, from classical methods like the Cox proportional hazards model to modern deep learning approaches such as deep kernel Kaplan-Meier estimators and neural ordinary differential equation models. We further delve into two extensions of the basic time-to-event prediction setup: predicting which of several critical events will happen first along with the time until this earliest event happens (the competing risks setting), and predicting time-to-event outcomes given a time series that grows in length over time (the dynamic setting). We conclude with a discussion of a variety of topics such as fairness, causal reasoning, interpretability, and statistical guarantees. Our monograph comes with an accompanying code repository that implements every model and evaluation metric that we cover in detail.
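A key way that time-to-event prediction differs from standard regression is right censoring (the setting of the outline's "Standard Right-Censored Statistical Framework"): for some data points we only know that the critical event had not yet happened when observation stopped. As a minimal sketch, assuming NumPy and a toy dataset (this is not the monograph's companion code; the function name `kaplan_meier` and the data are hypothetical), here is the classical Kaplan-Meier estimator, which keeps each censored point in the risk set up to its censoring time instead of discarding it or treating it as an event.

```python
import numpy as np

def kaplan_meier(durations, events):
    """Kaplan-Meier estimate of the survival function S(t) from right-censored data.

    durations: observed time per data point (event time if events[i] == 1,
               censoring time if events[i] == 0)
    events:    1 if the critical event was observed, 0 if the point was censored
    Returns the sorted unique event times and the estimated S(t) just after each one.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    event_times = np.unique(durations[events == 1])
    surv = []
    s = 1.0
    for t in event_times:
        # Points still under observation at time t (censored points count until their censoring time)
        at_risk = np.sum(durations >= t)
        # Observed events exactly at time t (censorings at t do not count as events)
        num_events = np.sum((durations == t) & (events == 1))
        s *= 1.0 - num_events / at_risk
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical toy data: points 3 (time 3) and 5 (time 7) are censored.
times, surv = kaplan_meier([2, 3, 3, 5, 7, 8], [1, 1, 0, 1, 0, 1])
# times are 2, 3, 5, 8; survival estimates are 5/6, 2/3, 4/9, 0
```

On this toy data the estimate only drops at observed event times; the censored points shrink later risk sets without causing a drop themselves.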

Paper Structure

This paper contains 131 sections, 210 equations, 6 figures, and 2 tables.

Introduction
Survival Analysis and Time-to-Event Outcomes: Some History and Commentary on Naming
Machine Learning Models for Survival Analysis
The Motivation for This Monograph
Monograph Overview and Outline
Examples of Topics Beyond the Scope of Our Monograph
Preliminaries
Prerequisites
How We View Neural Networks
Notation
Software Packages and Datasets
Companion code repository
Basic Time-to-Event Prediction Setup
Standard Right-Censored Statistical Framework
Time-to-Event Prediction in Continuous Time

Figures (6)

Figure 1: An overview of the models we cover in detail in Sections \ref{chap:setup} to \ref{chap:ode}. One model being the child of another means that the child model could be represented (possibly with a known approximation) by the parent model. When interpreting this diagram, note that two non-overlapping models could still represent the same underlying time-to-event outcome distribution. For example, deep extended hazard models zhong2021deep and survival kernets chen2024survival are capable of modeling many of the same time-to-event outcome distributions. We also cover Cox-Time kvamme2019time, which does not easily fit in the diagram; Cox-Time is a generalization of the semiparametric model called DeepSurv faraggi1995neural, katzman2018deepsurv, but Cox-Time can also represent models that are not deep extended hazard models.
Figure 2: Example of a survival function and its median and mean survival times.
Figure 3: The survival function $S(\cdot|x)$ from Figure \ref{fig:example-survival-curve} along with its hazard $h(\cdot|x)$ and cumulative hazard $H(\cdot|x)$ functions.
Figure 4: Under the proportional hazards assumption (equation (\ref{eq:prop-hazard-assumption})), possible survival functions are all powers of the baseline survival function $\mathbf{S}_0(\cdot;\theta)$, as shown in panel (a); note that we can always unambiguously order these functions based on the log partial hazard function $\mathbf{f}(\cdot;\theta)$. In contrast, the green curve shown in panel (b) is not possible under a proportional hazards model and is neither uniformly better nor uniformly worse than the baseline survival function.
Figure 5: (Figure source: chen2024survival) Kaplan-Meier survival function plots for the 5 largest clusters found by a survival kernet model on the SUPPORT dataset knaus1995support, overlaid on each other.
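Figures 2 to 4 refer to standard summaries of a survival function $S(t|x)$: the median and mean survival times, the hazard $h(t|x)$, the cumulative hazard $H(t|x)$, and, under proportional hazards, survival functions that are powers of a baseline. Below is a minimal NumPy sketch, assuming the survival function has already been evaluated on a fine time grid starting at $t = 0$. It uses the standard continuous-time identities $H(t|x) = -\log S(t|x)$, $h(t|x) = \frac{d}{dt} H(t|x)$, mean survival time $= \int_0^\infty S(t|x)\,dt$, and, when $h(t|x) = h_0(t) \exp(f(x))$, $S(t|x) = S_0(t)^{\exp(f(x))}$. The function names and the exponential baseline are hypothetical illustration choices, not the monograph's code.

```python
import numpy as np

def median_survival_time(times, surv):
    """First grid time at which S(t) drops to 0.5 or below (None if it never does)."""
    below = np.nonzero(surv <= 0.5)[0]
    return times[below[0]] if below.size > 0 else None

def mean_survival_time(times, surv):
    """Area under S(t) on the grid via the trapezoidal rule.

    Equals the mean survival time only if S(t) is (nearly) 0 by times[-1];
    otherwise it is a restricted mean survival time up to times[-1].
    """
    return float(np.sum(0.5 * (surv[1:] + surv[:-1]) * np.diff(times)))

def cumulative_hazard(surv):
    """H(t) = -log S(t)."""
    return -np.log(surv)

def hazard(times, surv):
    """h(t) = dH/dt, approximated with finite differences on the grid."""
    return np.gradient(cumulative_hazard(surv), times)

def proportional_hazards_survival(baseline_surv, log_partial_hazard):
    """S(t|x) = S_0(t) ** exp(f(x)) under the proportional hazards assumption."""
    return baseline_surv ** np.exp(log_partial_hazard)

# Hypothetical example: exponential baseline S_0(t) = exp(-0.5 t), whose hazard is
# constant at 0.5, median is ln(2) / 0.5, and mean is 1 / 0.5 = 2.
times = np.linspace(0.0, 40.0, 40001)  # uniform grid with step 0.001
baseline = np.exp(-0.5 * times)
# A data point with log partial hazard f(x) = log 2 has twice the baseline hazard.
s_x = proportional_hazards_survival(baseline, np.log(2.0))
print(median_survival_time(times, baseline), mean_survival_time(times, baseline))  # roughly 1.39 and 2.0
print(median_survival_time(times, s_x))  # roughly 0.69
```

Because every proportional hazards survival function shares the same baseline, a larger $f(x)$ always yields a curve lying at or below one with a smaller $f(x)$ at every time, which is the ordering property shown in Figure 4(a).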

Theorems & Definitions (1)

Proof of Proposition \ref{prop:deep-aft-log-survival-time-viewpoint}
