Interpretable machine learning for time-to-event prediction in medicine and healthcare

Hubert Baniecki; Bartlomiej Sobieski; Patryk Szatkowski; Przemyslaw Bombinski; Przemyslaw Biecek

TL;DR

Post-hoc interpretation methods can reveal biases in AI systems that predict hospital length of stay, as demonstrated on a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts.

Abstract

Time-to-event prediction, e.g. cancer survival analysis or hospital length of stay, is a highly prominent machine learning task in medical and healthcare applications. However, only a few interpretable machine learning methods comply with its challenges. To facilitate a comprehensive explanatory analysis of survival models, we formally introduce time-dependent feature effects and global feature importance explanations. We show how post-hoc interpretation methods allow for finding biases in AI systems predicting length of stay using a novel multi-modal dataset created from 1235 X-ray images with textual radiology reports annotated by human experts. Moreover, we evaluate cancer survival models beyond predictive performance to include the importance of multi-omics feature groups based on a large-scale benchmark comprising 11 datasets from The Cancer Genome Atlas (TCGA). Model developers can use the proposed methods to debug and improve machine learning algorithms, while physicians can discover disease biomarkers and assess their significance. We hope the contributed open data and code resources facilitate future work in the emerging research direction of explainable survival analysis.

Paper Structure (33 sections, 6 equations, 9 figures, 4 tables)

Introduction
Related work
Interpretable machine learning for survival analysis
Predicting and explaining hospital length of stay
Methods
Time-dependent feature effects
Time-dependent global feature importance
Materials: data and machine learning models
Bias in predicting hospital LoS using X-ray images
The tlos dataset
Details of human annotation
Models
Explainable multi-omics for cancer survival prediction
The TCGA benchmark
Models
...and 18 more sections

Figures (9)

Figure 1: Post-hoc explanation methods allow for finding biases in machine learning models predicting hospital length of stay, and evaluating cancer survival models beyond performance to include the importance of multi-omics feature groups.
Figure 2: Schematic workflow of creating the tlos dataset.
Figure 3: Exemplary X-ray images of (left) lung disease in an adult patient and (right) healthy children's lungs with visible medical devices.
Figure 4: Complementary time-dependent explanations of the GBDT model trained on age, sex, and human-annotated radiomics features. Top left: SurvSHAP(t) local explanation for a selected patient informing about the 6 most important features and their effect on the predicted LoS on each day since X-ray examination. Top right: What-if analysis for the same patient and the selected ambiguous feature, informing about how the predicted LoS would change upon the change in feature value. Bottom left: Feature importance global explanation based on aggregated SurvSHAP(t) values for a subset of patients. It informs about the 6 most important features overall. Bottom right: Partial dependence global explanation for the most important feature informing about its effect on the predicted LoS.
Figure 5: Time-dependent permutation importance of feature groups representing different omics modalities. IBS metric values are in brackets. Each subplot shows results from a cross-validation of a random survival forest predicting the survival of a particular cancer type from TCGA. Positive importance values over time indicate a high influence of features on the accuracy of time-to-event prediction. We observe that clinical features are the most important for predicting BLCA, BRCA, HNSC, LUAD, OV, and PAAD; CNV features influence the prediction in LUSC and OV; miRNA features are important in KIRC, LGG and LUSC; while RNA features are useful in predicting KIRC and SKCM. In some tasks, particular modalities add noise to the modelling process, as shown by negative importance values over time. For example, a random survival forest predicting PAAD performs the worst on average, at a practically random level. This is indicated by the integrated Brier score of $0.23\pm0.06$ and by the time-dependent explanation, which shows most of the features as unimportant noise.
...and 4 more figures
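
The time-dependent permutation importance described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy exponential survival model with known coefficients, synthetic data without censoring, single features rather than feature groups, and a plain Brier score evaluated at each time point. All function names and parameters are hypothetical. Importance at time t is the increase in Brier score after permuting one feature, so positive values mean the model relies on that feature at that time.

```python
import math
import random

def predict_survival(x, t, beta):
    """Toy model (an assumption): S(t | x) = exp(-t * exp(x . beta))."""
    rate = math.exp(sum(xi * bi for xi, bi in zip(x, beta)))
    return math.exp(-t * rate)

def brier_at(rows, event_times, t, beta):
    """Brier score at time t, assuming every event is observed (no censoring)."""
    total = 0.0
    for x, event_time in zip(rows, event_times):
        still_event_free = 1.0 if event_time > t else 0.0
        total += (still_event_free - predict_survival(x, t, beta)) ** 2
    return total / len(rows)

def permutation_importance_over_time(rows, event_times, times, beta,
                                     n_repeats=5, seed=0):
    """Return importance[j][k]: mean Brier increase at times[k] after permuting feature j."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    baseline = [brier_at(rows, event_times, t, beta) for t in times]
    importance = [[0.0] * len(times) for _ in range(n_features)]
    for j in range(n_features):
        for _ in range(n_repeats):
            # Shuffle column j across patients, leaving other features intact.
            column = [x[j] for x in rows]
            rng.shuffle(column)
            permuted = [x[:j] + [v] + x[j + 1:] for x, v in zip(rows, column)]
            for k, t in enumerate(times):
                loss = brier_at(permuted, event_times, t, beta)
                importance[j][k] += (loss - baseline[k]) / n_repeats
    return importance

# Synthetic data: feature 0 drives the hazard, feature 1 is ignored by the model.
data_rng = random.Random(42)
beta = [1.0, 0.0]
rows = [[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(500)]
event_times = [
    data_rng.expovariate(math.exp(x[0] * beta[0] + x[1] * beta[1])) for x in rows
]
times = [0.25 * k for k in range(1, 11)]

importance = permutation_importance_over_time(rows, event_times, times, beta)
```

In this sketch, permuting feature 1 leaves the predictions unchanged, so its importance curve stays at zero, while feature 0 gets a positive curve. Real survival data are censored, which requires a censoring-aware loss; that step is deliberately left out here.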
