Interpretable Prediction and Feature Selection for Survival Analysis

Mike Van Ness; Madeleine Udell

TL;DR

DyS addresses the need for interpretable survival analysis in large, high-dimensional datasets. It combines a generalized additive model with interactions and neural shape functions in a neural additive model framework, trained with a discrete-time Ranked Probability Score loss to optimize survival predictions directly. It integrates feature selection via smooth-step gates and supports a two-stage fitting procedure to scale to large datasets while preserving interpretability through time-specific feature importances and impact plots. Across synthetic data, benchmark datasets, and a large heart-failure cohort, DyS achieves competitive discrimination with intrinsic interpretability, enabling both prediction and feature selection in a glass-box survival model.
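To make the "discrete-time Ranked Probability Score loss" concrete, the following is a minimal sketch of one common way to compute such a score under right censoring. It is not taken from the paper: the function name `discrete_rps`, the argument layout, and the specific censoring rule (terms after a censoring time are simply dropped) are assumptions made for illustration, and the formulation used by DyS may differ in its details.

```python
import numpy as np

def discrete_rps(cdf, time, event, grid):
    """Illustrative censored Ranked Probability Score on a discrete time grid.

    cdf   : (n, K) predicted P(T <= grid[k]) for each subject (hypothetical layout)
    time  : (n,)   observed time (event time or censoring time)
    event : (n,)   1 if the event was observed, 0 if censored
    grid  : (K,)   increasing evaluation times
    """
    cdf = np.asarray(cdf, dtype=float)
    time = np.asarray(time, dtype=float)[:, None]
    event = np.asarray(event, dtype=bool)[:, None]
    grid = np.asarray(grid, dtype=float)[None, :]

    # Before the observed time, the event is known not to have occurred,
    # so the target CDF value is 0 for both censored and uncensored subjects.
    before = grid < time
    # At or after an observed event time, the target CDF value is 1.
    # For censored subjects this region is unknown and contributes nothing
    # (an assumption of this sketch).
    after_event = (~before) & event

    per_subject = (before * cdf**2 + after_event * (1.0 - cdf) ** 2).sum(axis=1)
    return per_subject.mean()

# Two subjects with identical predictions; the second is censored at t = 2.
grid = [1.0, 2.0, 3.0]
cdf = [[0.1, 0.6, 0.9],
       [0.1, 0.6, 0.9]]
score = discrete_rps(cdf, time=[2.0, 2.0], event=[1, 0], grid=grid)
```

Because the score compares the whole predicted distribution against the observed outcome at every grid time, minimizing it targets the survival predictions themselves rather than a proportional-hazards partial likelihood.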

Abstract

Survival analysis is widely used to model time-to-event data when some observations are censored, particularly in healthcare for predicting future patient risk. In such settings, survival models must be both accurate and interpretable so that users (such as doctors) can trust the model and understand its predictions. While most literature focuses on discrimination, interpretability is equally important. A successful interpretable model should be able to describe how changing each feature impacts the outcome, and should use only a small number of features. In this paper, we present DyS (pronounced "dice"), a new survival analysis model that achieves both strong discrimination and interpretability. DyS is a feature-sparse Generalized Additive Model, combining feature selection and interpretable prediction into one model. While DyS works well for all survival analysis problems, it is particularly useful for large (in $n$ and $p$) survival datasets such as those commonly found in observational healthcare studies. Empirical studies show that DyS competes with other state-of-the-art machine learning models for survival analysis, while being highly interpretable.

Paper Structure (30 sections, 13 equations, 3 figures, 4 tables, 3 algorithms)

Introduction
Example Usage
Background
Survival Analysis
Interpretable Machine Learning
Feature Selection
Methodology
Model Architecture
Interpretation Plots
Loss Function
Feature Sparsity
Preset Feature Budget
Two-Stage Fitting
Related Work
Experiments
...and 15 more sections

Figures (3)

Figure 1: Interpretable plots generated by DyS trained on heart failure data, across 10 trials. (Top left) feature importances averaged across evaluation times. (Right) feature impact plots for individual features at two evaluation times: 1 year and 3 years. (Bottom left) feature impact plots for interactions at 1 year. These plots fully describe the behavior of the fitted DyS model without any extra processing due to DyS's glass-box structure.
Figure 2: Summary of the architecture of DyS. For simplicity, the interaction effects are not shown. When feature sparsity is desired, the $\mu_j$ parameters are learned such that a subset of $s(\mu_j), j = 1, \ldots, p$ are equal to 0, preventing the corresponding features from influencing the predictions.
Figure 3: Performance of DyS (with RPS loss) versus CoxDyS, i.e. DyS using the CoxPH loss, on synthetic data that violates the proportional hazards assumption. (Left) Time-dependent AUC measured at several evaluation times, with dotted lines representing mean AUC. Using the CoxPH loss results in poor performance at smaller evaluation times. (Right) Shape functions for feature 1 under different loss functions. For the RPS loss (bottom), the shape function is shown at four different evaluation times, since DyS outputs are time-dependent.
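The gating described in the Figure 2 caption, where some $s(\mu_j)$ are exactly 0 so that the corresponding features cannot influence predictions, can be sketched as follows. This is an illustrative sketch only: it assumes the widely used cubic smooth-step function with a width parameter `gamma`, and the names `smooth_step` and `gated_additive_output` are hypothetical. The exact gate and architecture used by DyS may differ.

```python
import numpy as np

def smooth_step(mu, gamma=1.0):
    """Cubic smooth-step gate (assumed form).

    Returns exactly 0 for mu <= -gamma/2, exactly 1 for mu >= gamma/2,
    and a smooth cubic interpolation in between.
    """
    mu = np.asarray(mu, dtype=float)
    cubic = -2.0 / gamma**3 * mu**3 + 3.0 / (2.0 * gamma) * mu + 0.5
    return np.where(mu <= -gamma / 2, 0.0,
                    np.where(mu >= gamma / 2, 1.0, cubic))

def gated_additive_output(shape_outputs, mu, gamma=1.0):
    """Sum of per-feature shape-function outputs, each scaled by its gate.

    shape_outputs : (n, p) array holding f_j(x_j) for each sample and feature
    mu            : (p,)   gate parameters, one per feature
    """
    gates = smooth_step(mu, gamma)
    return np.asarray(shape_outputs, dtype=float) @ gates

# Three features: the first gate is fully closed, the last fully open.
mu = np.array([-1.0, 0.0, 1.0])
gates = smooth_step(mu)
output = gated_additive_output([[1.0, 2.0, 3.0]], mu)
```

Unlike a sigmoid, this gate reaches exactly 0 at a finite parameter value, which is what allows a feature to be removed from the model entirely rather than merely down-weighted.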
