Going beyond explainability in multi-modal stroke outcome prediction models

Jonas Brändli, Maurice Schneeberger, Lisa Herzog, Loran Avci, Nordin Dari, Martin Hänsel, Hakim Baazaoui, Pascal Bühler, Susanne Wegener, Beate Sick

TL;DR

This study addresses trustworthy, interpretable stroke outcome prediction by integrating imaging and tabular patient data via deep transformation models (dTMs). By adapting Grad-CAM and Occlusion to multi-modal dTMs, the authors produce explanation maps that highlight relevant brain regions and enable error analysis, while the tabular features retain interpretable parameters that can be read as log-odds-like effects. The dTMs combining tabular and imaging data achieve AUC values close to 0.8, and the interpretable beta coefficients confirm known risk factors such as pre-stroke functional dependence and NIHSS on admission. Overall, the approach balances high predictive performance with explanation capabilities, supporting clinical trust and enabling hypothesis generation about image-based predictors of stroke outcome.

Abstract

Aim: This study aims to enhance the interpretability and explainability of multi-modal prediction models that integrate imaging and tabular patient data. Methods: We adapt the xAI methods Grad-CAM and Occlusion to multi-modal, partly interpretable deep transformation models (dTMs). dTMs combine statistical and deep learning approaches to simultaneously achieve state-of-the-art prediction performance and interpretable parameter estimates, such as odds ratios for tabular features. Based on brain imaging and tabular data from 407 stroke patients, we trained dTMs to predict functional outcome three months after stroke. We evaluated the models using different discriminatory metrics. The adapted xAI methods were used to generate explanation maps for the identification of relevant image features and for error analysis. Results: The dTMs achieve state-of-the-art prediction performance, with area under the curve (AUC) values close to 0.8. The most important tabular predictors of functional outcome are functional independence before stroke and NIHSS on admission, a neurological score indicating stroke severity. Explanation maps calculated from brain imaging dTMs for functional outcome highlighted critical brain regions such as the frontal lobe, which is known to be linked to age, which in turn increases the risk of unfavorable outcomes. Similarity plots of the explanation maps revealed distinct patterns that give insight into stroke pathophysiology, support the development of novel predictors of stroke outcome, and make it possible to identify false predictions. Conclusion: By adapting methods for explanation maps to dTMs, we enhanced the explainability of multi-modal and partly interpretable prediction models. The resulting explanation maps facilitate error analysis and support hypothesis generation regarding the significance of specific image regions in outcome prediction.
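
To make the phrase "interpretable parameter estimates, such as odds ratios" concrete, the following is a minimal sketch of how a log-odds coefficient is usually read as an odds ratio. It assumes a logistic-type model in which a tabular feature shifts the log-odds linearly; the function names, the example coefficient, and the sign convention are illustrative assumptions and are not taken from the paper.

```python
import math


def odds_ratio(beta: float) -> float:
    """Convert a log-odds coefficient into an odds ratio."""
    return math.exp(beta)


def shifted_probability(baseline_prob: float, beta: float, delta_x: float = 1.0) -> float:
    """Probability obtained after shifting the log-odds of baseline_prob by beta * delta_x."""
    logit = math.log(baseline_prob / (1.0 - baseline_prob))
    shifted = logit + beta * delta_x
    return 1.0 / (1.0 + math.exp(-shifted))


# Hypothetical coefficient chosen for illustration, not a value reported in the paper.
beta_example = math.log(2.0)

# An odds ratio of 2 means a one-unit increase in the feature doubles the odds.
example_or = odds_ratio(beta_example)
# Starting from even odds (probability 0.5), doubled odds of 2:1 give probability 2/3.
example_prob = shifted_probability(0.5, beta_example)
```

An odds ratio above 1 increases the odds of the modelled event and a value below 1 decreases them; which outcome counts as "the event" depends on the model's parameterisation, which this sketch does not fix.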

Paper Structure

This paper contains 16 sections, 5 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Examples of images overlaid with explanation maps resulting from the xAI methods Occlusion (upper row) and Grad-CAM (lower row). The images on the left are classified as "African elephant" by a VGG16 architecture trained on ImageNet (Simonyan and Zisserman, 2014). The images on the right show the average image over the brain volume of a stroke patient predicted to have an "unfavorable" outcome by a dTM (CIB model) trained on diffusion-weighted images. The colors in the legend indicate the importance of each image region for the respective prediction (larger values indicate higher relevance).
  • Figure 2: Deep transformation model training and evaluation setup. We perform a 10-fold cross-validation by repeating the following steps for each split. We provide tabular and imaging data as input to the NNs of the dTM. The NNs are trained jointly by minimizing the negative log-likelihood (NLL) to learn the parameters of the transformation function $h_m$. For binary outcomes, $h_m$ represents a cutpoint that splits the standard logistic distribution with $F_Z(z)=\sigma(z)$, yielding a probability for favorable $p_{0} = \sigma(h_m(y_0|\textsf{B},x))$ vs. unfavorable ($1-p_0$) outcome (right panel). Models with imaging data are fitted $m=5$ times on the training data of the respective split, each time with a new random initialization of the NN weights. These five models are combined into an ensemble dTM by calculating a weighted average $\bar{h}^{\textsf{wgt}}$ across the five transformation functions.
  • Figure 3: Overview of the Grad-CAM and Occlusion methods adapted to dTMs. The trained CNN forms the basis for both methods. Grad-CAM uses the gradient information flowing back from the predicted class to the last convolutional layer to determine input feature importance. Occlusion systematically covers parts of the input and highlights the resulting changes in the predicted probability for the respective class.
  • Figure 4: Average explanation maps. The figure shows the average across the explanation maps for patients predicted to have a favorable or an unfavorable outcome. The explanation maps correspond to the CIB (left column) and CIB-LSX (right column) models when applying the Occlusion (upper row) or Grad-CAM (lower row) algorithm.
  • Figure 5: Similarity map. The figure shows a t-SNE plot of the CIB-LSX Grad-CAM explanation maps, indicating classes of brain regions for patients with correctly and wrongly predicted favorable and unfavorable outcomes. The similarity map is obtained with t-SNE based on feature vectors extracted from a ResNet-50 architecture pretrained on ImageNet.
  • ...and 1 more figures
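
The ensemble step described in the Figure 2 caption can be sketched as follows: the transformation-function values of the individual models are averaged with weights, and the averaged value is passed through the standard logistic function to obtain the probability of a favorable outcome. This is a minimal sketch; how the weights are determined is not stated in the captions above, so the weights, the normalisation by their sum, and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Standard logistic distribution function F_Z(z) = sigma(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_probability(h_values: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted average of transformation-function values, mapped to a probability.

    h_values: one value h_m(y_0 | B, x) per ensemble member for a single patient.
    weights:  non-negative member weights (hypothetical; normalised here by their sum).
    """
    if len(h_values) != len(weights):
        raise ValueError("need exactly one weight per ensemble member")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    h_bar = sum(w * h for w, h in zip(weights, h_values)) / total_weight
    return sigmoid(h_bar)


# Hypothetical values for five ensemble members with equal weights.
p_favorable = ensemble_probability([0.4, 0.1, -0.2, 0.3, 0.0], [1, 1, 1, 1, 1])
p_unfavorable = 1.0 - p_favorable
```

Averaging on the scale of the transformation function, rather than averaging the predicted probabilities, is what the caption describes; the two generally give different results because the logistic function is nonlinear.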
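
The Occlusion idea from the Figure 3 caption (cover part of the input, then record the change in the predicted probability) can be illustrated with a small sketch. It works on a 2D toy image stored as nested lists, whereas the paper works with brain volumes; the patch size, stride, fill value, and the toy predictor are all assumptions made for illustration and do not reproduce the authors' adaptation to dTMs.

```python
import math
from typing import Callable, List

Image = List[List[float]]


def occlusion_map(image: Image, predict: Callable[[Image], float],
                  patch: int = 2, stride: int = 1, fill: float = 0.0) -> Image:
    """Average drop in predicted probability when each pixel is covered by a patch."""
    rows, cols = len(image), len(image[0])
    baseline = predict(image)
    total = [[0.0] * cols for _ in range(rows)]
    count = [[0] * cols for _ in range(rows)]
    for top in range(0, rows - patch + 1, stride):
        for left in range(0, cols - patch + 1, stride):
            occluded = [row[:] for row in image]  # copy so the input stays untouched
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    occluded[r][c] = fill
            drop = baseline - predict(occluded)
            # Attribute the drop to every pixel that was covered by this patch.
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    total[r][c] += drop
                    count[r][c] += 1
    return [[total[r][c] / count[r][c] if count[r][c] else 0.0
             for c in range(cols)] for r in range(rows)]


def toy_predict(img: Image) -> float:
    """Hypothetical stand-in for a trained model: only pixel (1, 1) matters."""
    return 1.0 / (1.0 + math.exp(-2.0 * img[1][1]))


toy_image = [[1.0] * 4 for _ in range(4)]
heat = occlusion_map(toy_image, toy_predict, patch=1)
```

With a one-pixel patch, only the pixel the toy predictor depends on receives a non-zero value; larger patches spread the attribution over every pixel they cover.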
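
The Grad-CAM step from the Figure 3 caption can be sketched in its standard form: each feature map of the last convolutional layer is weighted by the spatial mean of its gradient, the weighted maps are summed, and negative values are clipped. The sketch assumes that the activations and the gradients of the model output with respect to them have already been obtained from a deep learning framework. Which model output is differentiated in a dTM is not specified in the captions above and is left open here, and the 2D toy arrays are purely illustrative.

```python
from typing import List

FeatureMap = List[List[float]]


def grad_cam(activations: List[FeatureMap], gradients: List[FeatureMap]) -> FeatureMap:
    """Standard Grad-CAM combination: ReLU(sum_k alpha_k * A_k).

    activations: K feature maps A_k from the last convolutional layer.
    gradients:   K maps of the output's gradient with respect to A_k (same shapes).
    """
    height, width = len(activations[0]), len(activations[0][0])
    heat = [[0.0] * width for _ in range(height)]
    for fmap, grad in zip(activations, gradients):
        # alpha_k: global average of the gradient over all spatial positions.
        alpha = sum(sum(row) for row in grad) / (height * width)
        for r in range(height):
            for c in range(width):
                heat[r][c] += alpha * fmap[r][c]
    # ReLU keeps only regions that push the output upwards.
    return [[max(0.0, value) for value in row] for row in heat]


# Hypothetical activations and gradients for two 2x2 feature maps.
toy_activations = [[[1.0, 0.0], [0.0, 0.0]],
                   [[0.0, 0.0], [0.0, 1.0]]]
toy_gradients = [[[1.0, 1.0], [1.0, 1.0]],
                 [[-1.0, -1.0], [-1.0, -1.0]]]
cam = grad_cam(toy_activations, toy_gradients)
```

In the toy input the first feature map has a positive mean gradient and the second a negative one, so only the position activated in the first map survives the clipping. In practice the coarse map is then upsampled to the input resolution before it is overlaid on the image.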