Interpretable Multimodal Learning for Cardiovascular Hemodynamics Assessment

Prasun C Tripathi; Sina Tabakhi; Mohammod N I Suvon; Lawrence Schöb; Samer Alabed; Andrew J Swift; Shuo Zhou; Haiping Lu

TL;DR

This work tackles non-invasive assessment of pulmonary arterial wedge pressure (PAWP) by integrating spatio-temporal cardiac magnetic resonance (CMR) features with key electronic health record (EHR) information in an interpretable multimodal framework. The approach combines tensor-based multilinear principal component analysis (MPCA) for CMR, a Graph Attention Network (GAT) for EHR feature selection, and four linear fusion strategies (early, intermediate, late, and hybrid) with a linear Support Vector Machine (SVM). The resulting classifier remains explainable and outperforms state-of-the-art baselines on the ASPIRE registry (2,641 subjects). Tri-modal fusion with the hybrid strategy achieves the highest predictive performance (AUROC up to 0.8682, MCC up to 0.5492), and Decision Curve Analysis shows clear clinical utility across relevant threshold probabilities. The results underline the value of combining imaging and clinical data for scalable screening, and the interpretable feature analysis highlights the CMR regions and EHR measurements that drive PAWP predictions.
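Decision Curve Analysis compares a model against the default strategies of treating all or no patients by their net benefit at a threshold probability $p_t$: $\text{NB}(p_t) = \frac{TP}{N} - \frac{FP}{N}\cdot\frac{p_t}{1-p_t}$. The sketch below computes this standard quantity with NumPy, assuming binary labels (1 = elevated PAWP) and predicted probabilities. The function names and the toy data are hypothetical and are not taken from the authors' repository.

```python
import numpy as np

def net_benefit(y_true, y_prob, threshold):
    """Net benefit of flagging subjects whose predicted risk >= threshold."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_prob) >= threshold
    n = y_true.size
    tp = np.sum(y_pred & y_true)    # correctly flagged subjects
    fp = np.sum(y_pred & ~y_true)   # falsely flagged subjects
    odds = threshold / (1.0 - threshold)  # harm weight of a false positive
    return tp / n - fp / n * odds

def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: flag every subject as high risk."""
    prevalence = np.mean(np.asarray(y_true, dtype=bool))
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# Hypothetical toy data, not from the ASPIRE registry
y = np.array([1, 0, 1, 1, 0, 0, 1, 0])
p = np.array([0.9, 0.2, 0.7, 0.4, 0.6, 0.1, 0.8, 0.3])
for pt in (0.2, 0.5):
    print(pt, net_benefit(y, p, pt), net_benefit_treat_all(y, pt))
```

A model is considered useful at thresholds where its net benefit exceeds both the treat-all curve and the treat-none baseline of zero; in this toy example that holds at both thresholds shown.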

Abstract

Pulmonary Arterial Wedge Pressure (PAWP) is an essential cardiovascular hemodynamic marker for detecting heart failure. In clinical practice, Right Heart Catheterization is considered the gold standard for assessing cardiac hemodynamics, but non-invasive methods are often needed to screen high-risk patients in large populations. In this paper, we propose a multimodal learning pipeline to predict the PAWP marker. We utilize complementary information from Cardiac Magnetic Resonance Imaging (CMR) scans (short-axis and four-chamber) and Electronic Health Records (EHRs). We extract spatio-temporal features from CMR scans using tensor-based learning. We propose a graph attention network to select important EHR features for prediction, where we model subjects as graph nodes and feature relationships as graph edges using the attention mechanism. We design four feature fusion strategies: early, intermediate, late, and hybrid fusion. Because both the classifier and the fusion strategies are linear, our pipeline is interpretable. We validate our pipeline on a large dataset of 2,641 subjects from our ASPIRE registry. A comparative study against state-of-the-art methods confirms the superiority of our pipeline, and decision curve analysis further indicates that the pipeline can be applied to screen a large population. The code is available at https://github.com/prasunc/hemodynamics.
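The tensor-based feature extraction is MPCA (lu2008mpca), which learns one orthonormal projection matrix per tensor mode so that each CMR sequence is compressed along height, width, and time jointly. Below is a simplified NumPy sketch of the MPCA alternating scheme, assuming each sample is a 3-way tensor (height × width × frames) and that the number of components per mode is fixed by hand. The helper names, shapes, and random data are illustrative assumptions, not the authors' implementation; variance-based dimension selection and convergence checks are omitted.

```python
import numpy as np

def mode_product(x, u, mode):
    """Project batch tensor x along axis `mode` with u.T (u: I_mode x P_mode)."""
    out = np.tensordot(x, u, axes=([mode], [0]))  # contracted axis goes last
    return np.moveaxis(out, -1, mode)             # move it back into place

def unfold(x, mode):
    """Mode-`mode` unfolding of all samples: shape I_mode x (everything else)."""
    return np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)

def top_eigvecs(x_unf, k):
    """Leading k eigenvectors of the mode scatter matrix."""
    vals, vecs = np.linalg.eigh(x_unf @ x_unf.T)  # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k]

def mpca_fit(x, ranks, n_iter=3):
    """x: (N, I1, I2, I3) samples; ranks: components kept per mode."""
    mean = x.mean(axis=0)
    xc = x - mean
    modes = range(1, xc.ndim)
    # Full-projection initialisation from each mode's total scatter
    us = {m: top_eigvecs(unfold(xc, m), r) for m, r in zip(modes, ranks)}
    for _ in range(n_iter):
        for m, r in zip(modes, ranks):
            y = xc
            for k in modes:  # project on every mode except m
                if k != m:
                    y = mode_product(y, us[k], k)
            us[m] = top_eigvecs(unfold(y, m), r)
    return mean, us

def mpca_transform(x, mean, us):
    """Project samples on all modes and vectorise for a linear classifier."""
    y = x - mean
    for m, u in us.items():
        y = mode_product(y, u, m)
    return y.reshape(len(x), -1)

# Hypothetical data: 20 subjects, 16x16 pixels, 10 cardiac frames
rng = np.random.default_rng(0)
cmr = rng.standard_normal((20, 16, 16, 10))
mean, us = mpca_fit(cmr, ranks=(4, 4, 3))
feats = mpca_transform(cmr, mean, us)
print(feats.shape)  # (20, 48)
```

The vectorised features can then be fed to a linear classifier such as the SVM used in the pipeline; because every step is linear, classifier weights can be mapped back to pixel and frame locations.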

Paper Structure (22 sections, 4 equations, 8 figures, 4 tables)

Introduction
Material and methods
Dataset
Preprocessing
Training Sample Selection
CMR Feature Extraction
EHR Feature Selection with Graph Attention Network
Multimodal Feature Fusion
Results and Analysis
Experimental Design
Uncertainty-based Filtering
Results of Unimodal Study
Results of Bi-modal Study
Results of Tri-modal Study
Ablation Studies
...and 7 more sections

Figures (8)

Figure 1: The proposed multimodal pipeline for PAWP prediction utilizing features from short-axis CMR, four-chamber CMR, and EHR. Step 1: landmarks in CMRs are localized using the Ensemble Maximum Heatmap Activation (E-MHA) strategy schobs2022uncertainty, and CMRs are aligned to a common image space. Step 2: high-quality training samples are selected based on uncertainty scores. Step 3: spatio-temporal CMR features are extracted using MPCA lu2008mpca. Step 4: EHR features are selected using a graph attention network. Step 5: the features are fused using early, intermediate, late, or hybrid fusion strategies. Finally, prediction is performed using a linear Support Vector Machine (SVM).
Figure 2: EHR feature selection based on GAT and the ablation approach wang2021mogonet. Top: Input EHR data and the corresponding constructed graph. Bottom: A single attention layer in GAT with two attention heads applied to EHR nodes in three steps. In step 1, nodes are linearly transformed for high-level feature embedding. In step 2, the attention mechanism utilizes attention vectors to compute normalized attention coefficients of the attention matrix $\mathbf{A}^{\left(l\right)}$ on these embeddings. In step 3, these coefficients are used to perform a linear combination of node embeddings, resulting in the final node embeddings for the next attention layer.
Figure 3: Four types of fusion methods utilized in our pipeline.
Figure 4: Validation performance when removing different numbers of bins from the training data.
Figure 5: Ablation studies: (a) and (d) Unimodal (FC), Bi-modal (SA & FC), and Tri-modal (SA, FC, & EHR) are the best-performing MPCA models in Tables 2, 3, and 4; (b) Bi-modal (FC & EHR), Bi-modal (FC & CM), Bi-modal (SA & EHR), and Bi-modal (SA & CM) are late fusion-based MPCA models; (c) Tri-modal (SA, FC, & EHR) and Tri-modal (SA, FC, & CM) are hybrid fusion-based MPCA models.
...and 3 more figures
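Figure 2 describes one GAT attention layer in three steps. The following is a minimal NumPy sketch of a single attention head in the standard GAT formulation, applied to a given adjacency matrix: step 1 linearly transforms node features, step 2 scores each edge with LeakyReLU over an attention vector and softmax-normalises the scores over each node's neighbours, and step 3 aggregates neighbour embeddings with those coefficients. The weights are random and the three-node graph is a toy example; graph construction, training, and the ablation-based feature ranking used for EHR feature selection are not shown, and all names are hypothetical.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def gat_head(h, adj, w, a):
    """One GAT attention head.
    h: (N, F) node features; adj: (N, N) 0/1 adjacency including self-loops;
    w: (F, F_out) projection; a: (2 * F_out,) attention vector.
    Returns new node embeddings (N, F_out) and the attention matrix (N, N)."""
    # Step 1: linear transformation of every node
    z = h @ w
    f_out = z.shape[1]
    # Step 2: e_ij = LeakyReLU(a^T [z_i || z_j]), split into node i and node j parts
    src = z @ a[:f_out]
    dst = z @ a[f_out:]
    e = leaky_relu(src[:, None] + dst[None, :])
    e = np.where(adj > 0, e, -np.inf)      # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)   # numerical stability before exp
    att = np.exp(e)
    att = att / att.sum(axis=1, keepdims=True)  # row-wise softmax
    # Step 3: attention-weighted linear combination of node embeddings
    return att @ z, att

# Hypothetical toy graph: 3 nodes with 5 features each
rng = np.random.default_rng(0)
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]])
h = rng.standard_normal((3, 5))
w = rng.standard_normal((5, 4))
a = rng.standard_normal(8)
out, att = gat_head(h, adj, w, a)
print(out.shape, att.sum(axis=1))
```

With two heads, as drawn in Figure 2, one would typically call `gat_head` with two independent `(w, a)` pairs and concatenate or average the resulting embeddings.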
