Feature importance to explain multimodal prediction models. A clinical use case
Jorn-Jan van de Beld, Shreyasi Pathak, Jeroen Geerdink, Johannes H. Hegeman, Christin Seifert
TL;DR
The paper addresses predicting 30-day mortality after hip fracture surgery by leveraging a multimodal deep-learning framework that integrates pre-operative static data, hip and chest images, and per-operative vital signs and medications. It introduces SHAP-based explanations and a novel Shapley value propagation method to provide local and global attributions across a sequence of unimodal models fused into a multimodal predictor. The main finding is that pre-operative data, especially static features, carry most predictive power, while per-operative data yield limited gains; nonetheless, SHAP-based explanations enable interpretable, modality- and feature-level insights for clinical decision-making. The work demonstrates the feasibility of explainable multimodal predictions in a clinical setting and highlights future directions for more robust fusion strategies and handling missing modalities to enhance generalizability and trustworthiness.
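As a rough illustration of the modality-level attributions mentioned above, the sketch below aggregates per-feature Shapley values into a relative contribution per modality. It assumes one common convention (the share of absolute attribution mass), which may differ from the authors' exact definition. The function name `modality_contributions`, the modality names, and the numbers are hypothetical.

```python
import numpy as np

def modality_contributions(shap_values, modality_slices):
    """Relative contribution of each modality from per-feature Shapley values.

    shap_values: array of shape (n_samples, n_features), one row per patient.
    modality_slices: dict mapping a modality name to a slice of feature columns.
    Returns (local, global_): local[name] has shape (n_samples,), global_[name]
    is a float. Both sum to 1 across modalities (assumed convention).
    """
    shap_values = np.atleast_2d(np.asarray(shap_values, dtype=float))
    # Absolute attribution mass per modality, per sample.
    mass = {name: np.abs(shap_values[:, sl]).sum(axis=1)
            for name, sl in modality_slices.items()}
    total = sum(mass.values())
    total = np.where(total == 0, 1.0, total)  # guard against all-zero rows
    local = {name: m / total for name, m in mass.items()}
    global_total = sum(m.sum() for m in mass.values()) or 1.0
    global_ = {name: m.sum() / global_total for name, m in mass.items()}
    return local, global_

# Made-up Shapley values for two patients and four features.
demo_values = np.array([[0.3, -0.1, 0.05, 0.05],
                        [0.1, 0.1, -0.2, 0.0]])
demo_slices = {"static": slice(0, 2), "image": slice(2, 4)}
local, global_ = modality_contributions(demo_values, demo_slices)
```

Here `local` gives a per-patient split between modalities and `global_` gives the split over the whole cohort.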
Abstract
Surgery to treat elderly hip fracture patients may cause complications that can lead to early mortality. An early warning system for complications could prompt clinicians to monitor high-risk patients more carefully and address potential complications early, or to inform the patient. In this work, we develop a multimodal deep-learning model for post-operative mortality prediction using pre-operative and per-operative data from elderly hip fracture patients. Specifically, the pre-operative data comprise static patient data and hip and chest images taken before surgery, and the per-operative data comprise vital signs and medications administered during surgery. We extract features from image modalities using ResNet and from vital signs using LSTM. Explainable model outcomes are essential for clinical applicability; we therefore compute Shapley values to explain the predictions of our multimodal black-box model. We find that i) Shapley values can be used to estimate the relative contribution of each modality both locally and globally, and ii) a modified version of the chain rule can be used to propagate Shapley values through a sequence of models, supporting interpretable local explanations. Our findings imply that a multimodal combination of black-box models can be explained by propagating Shapley values through the model sequence.
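The idea of propagating Shapley values through a sequence of models can be sketched as follows. The abstract does not spell out the modified chain rule, so this example assumes a DeepSHAP-style rescale rule: each input receives the fraction of a hidden feature's attribution that equals its own share of that hidden feature's change from baseline. All names (`exact_shapley`, `propagate`, `extractor`, `fusion`) and the toy linear models are hypothetical stand-ins for the unimodal and fusion networks.

```python
import itertools
import math
import numpy as np

def exact_shapley(fn, x, baseline):
    """Exact Shapley values of a scalar function at x against one baseline.
    Features absent from a coalition take their baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            weight = (math.factorial(r) * math.factorial(n - r - 1)
                      / math.factorial(n))
            for subset in itertools.combinations(others, r):
                idx = np.array(subset, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]
                without_i = fn(z)
                z[i] = x[i]
                phi[i] += weight * (fn(z) - without_i)
    return phi

def propagate(phi_fused, psi, h, h_base, eps=1e-12):
    """Assumed rescale rule: input i gets psi[j, i] / (h[j] - h_base[j])
    of the fused-model attribution phi_fused[j], summed over hidden j."""
    delta = h - h_base
    active = np.abs(delta) > eps
    safe = np.where(active, delta, 1.0)  # avoid dividing by zero
    share = np.where(active[:, None], psi / safe[:, None], 0.0)
    return phi_fused @ share

# Toy stand-ins: a linear feature extractor and a linear fusion head.
W = np.array([[1.0, -2.0, 0.5], [0.0, 1.5, 1.0]])
v = np.array([2.0, -1.0])

def extractor(x):
    return W @ x

def fusion(h):
    return float(v @ h)

x, x_base = np.array([1.0, 2.0, -1.0]), np.zeros(3)
h, h_base = extractor(x), extractor(x_base)

# Shapley values of each hidden feature w.r.t. the raw inputs.
psi = np.stack([exact_shapley(lambda z, j=j: extractor(z)[j], x, x_base)
                for j in range(len(h))])
# Shapley values of the fused prediction w.r.t. the hidden features.
phi_fused = exact_shapley(fusion, h, h_base)

propagated = propagate(phi_fused, psi, h, h_base)
end_to_end = exact_shapley(lambda z: fusion(extractor(z)), x, x_base)
```

For these linear toy models the propagated attributions coincide with the end-to-end Shapley values. For nonlinear models such as a ResNet or an LSTM, a rule of this kind is generally an approximation, although the propagated values still sum to the change in the fused prediction as long as every hidden feature moves from its baseline.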
