Interpretable Machine Learning for Survival Analysis

Sophie Hanna Langbein, Mateusz Krzyziński, Mikołaj Spytek, Hubert Baniecki, Przemysław Biecek, Marvin N. Wright

TL;DR

This paper surveys interpretable machine learning methods for survival analysis, addressing the need for transparency in time-to-event predictions under censoring. It systematically maps local/global, model-agnostic/model-specific explanations to survival outputs, and formalizes adaptations of ICE, PDP, ALE, H-statistics, PFI, CPI, LOCO, SurvLIME, SurvSHAP, and SurvSHAP(t). A practical tutorial demonstrates these methods on the German Breast Cancer Study Group (GBSG2) data, comparing CoxPH and a random survival forest and illustrating how interpretability tools reveal time-varying effects and interactions. The work identifies limitations (e.g., extrapolation, computational demands, and non-causal interpretations) and outlines directions for future research, including more robust time-dependent explanations and broader data modalities. Overall, it provides a consolidated framework to understand, compare, and apply survival-specific IML methods for researchers and practitioners beyond traditional hazard-based interpretations.

Abstract

With the spread and rapid advancement of black box machine learning models, the field of interpretable machine learning (IML) or explainable artificial intelligence (XAI) has become increasingly important over the last decade. This is particularly relevant for survival analysis, where the adoption of IML techniques promotes transparency, accountability and fairness in sensitive areas, such as clinical decision-making processes, the development of targeted therapies and interventions, and other medical or healthcare-related contexts. More specifically, explainability can uncover a survival model's potential biases and limitations and provide more mathematically sound ways to understand which features are influential for prediction or constitute risk factors, and how. However, the lack of readily available IML methods may have deterred medical practitioners and policy makers in public health from leveraging the full potential of machine learning for predicting time-to-event data. We present a comprehensive review of the limited existing body of work on IML methods for survival analysis within the context of the general IML taxonomy. In addition, we formally detail how commonly used IML methods, such as individual conditional expectation (ICE), partial dependence plots (PDP), accumulated local effects (ALE), different feature importance measures or Friedman's H-interaction statistics, can be adapted to survival outcomes. An application of several IML methods to real data on under-5 year mortality of Ghanaian children from the Demographic and Health Surveys (DHS) Program serves as a tutorial or guide for researchers on how to utilize the techniques in practice to facilitate understanding of model decisions or predictions.

Paper Structure

This paper contains 36 sections, 34 equations, 19 figures, 2 tables, 2 algorithms.

Figures (19)

  • Figure 1: Overview of the interpretable machine learning methods reviewed and newly introduced in this paper in the context of the taxonomy popularized by Molnar (2022) and Biecek and Burzykowski (2021).
  • Figure 2: ICE curves for the coxph model (left) and the ranger model (right) for the time-dependent treatment feature. One ICE curve shows how the model's prediction varies over time for one observation for a fixed treatment strategy. The different line colors correspond to different treatment strategies (0 = no treatment, 1 = treatment). Therefore, each observation from the simulated test dataset is represented by one orange line (treatment = 0) and one blue line (treatment = 1). The rug on the x-axis shows the survival time distribution with the grey bars indicating observed survival times and the red bars indicating censoring.
  • Figure 3: c-ICE curves for the coxph model (left) and the ranger model (right) for the time-dependent treatment feature. One c-ICE curve shows how the model's prediction varies over time for one observation under a fixed treatment strategy, relative to the no-treatment strategy for the same patient. As a result, the c-ICE curves for treatment = 0 are constant at 0 for both models (for the ranger model, the orange curves are simply covered by the blue curves).
  • Figure 4: Uncentered PDPs and ICE curves for the coxph model (left) and the ranger model (right) for the time-dependent treatment feature. ICE curves are depicted as thin colored lines, while PDPs are thick colored lines. The different line colors correspond to different treatment strategies (0 = no treatment, 1 = treatment). The PDPs depict the average predicted survival probability within one treatment arm over time. The rug on the x-axis shows the survival time distribution with the grey bars indicating observed survival times and the red bars indicating censoring.
  • Figure 5: PDPs for the ranger model for feature $\texttt{x}_1$ (left) and feature $\texttt{x}_2$ (right). The different line colors correspond to different survival times, while the feature values are shown on the x-axes. Contrary to the ground truth, the PDPs suggest that the average predicted survival probability barely differs for different feature values, for both feature $\texttt{x}_1$ and feature $\texttt{x}_2$.
  • ...and 14 more figures
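
The captions of Figures 2 to 4 describe ICE curves, centered ICE (c-ICE) curves, and PDPs for a binary treatment feature over time. The following is a minimal sketch of how such curves could be computed, not the paper's implementation. It assumes a hypothetical toy model, `toy_survival`, with exponential proportional hazards and made-up coefficients (`BETA`, `BASE_RATE`) standing in for the fitted coxph and ranger models, and it uses synthetic data rather than the datasets mentioned above.

```python
import numpy as np

# Hypothetical coefficients for illustration only (not taken from the paper).
BETA = np.array([-0.8, 0.3])   # effect of treatment (column 0) and x1 (column 1)
BASE_RATE = 0.1                # constant baseline hazard


def toy_survival(X, times):
    """Toy survival function S(t | x) = exp(-BASE_RATE * t * exp(x @ BETA)).

    Returns an array of shape (n_observations, n_times).
    """
    risk = np.exp(X @ BETA)
    return np.exp(-BASE_RATE * np.outer(risk, times))


def ice_curves(predict, X, feature, value, times):
    """ICE curves: set one feature to a fixed value for every observation
    and predict the survival function over the time grid."""
    X_mod = X.copy()
    X_mod[:, feature] = value
    return predict(X_mod, times)


# Synthetic data: a binary treatment indicator and one continuous feature.
rng = np.random.default_rng(0)
n = 50
X = np.column_stack([rng.integers(0, 2, n), rng.normal(size=n)]).astype(float)
times = np.linspace(0.0, 10.0, 21)

# One ICE curve per observation and treatment strategy (0 = no treatment, 1 = treatment).
ice = {v: ice_curves(toy_survival, X, 0, v, times) for v in (0.0, 1.0)}

# c-ICE: each curve relative to the same observation's no-treatment curve.
c_ice = {v: ice[v] - ice[0.0] for v in ice}

# PDP: pointwise average of the ICE curves for each treatment strategy.
pdp = {v: ice[v].mean(axis=0) for v in ice}
```

By construction, the c-ICE curves for treatment = 0 are identically zero, which matches the behavior described in the Figure 3 caption. Plotting each row of `ice[v]` against `times` as a thin line and `pdp[v]` as a thick line would give a layout similar to the one described for Figure 4.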