Multimodal Variational Autoencoder for Low-cost Cardiac Hemodynamics Instability Detection
Mohammod N. I. Suvon, Prasun C. Tripathi, Wenrui Fan, Shuo Zhou, Xianyuan Liu, Samer Alabed, Venet Osmani, Andrew J. Swift, Chen Chen, Haiping Lu
TL;DR
This study addresses the detection of cardiac hemodynamic instability (CHDI) by predicting pulmonary artery wedge pressure (PAWP) from inexpensive chest X-ray (CXR) and electrocardiogram (ECG) data. It introduces $\text{CardioVAE}_\text{X,G}$, a multimodal variational autoencoder whose tri-stream pre-training strategy learns both shared and modality-specific representations. The model is pre-trained on $50{,}982$ unlabeled pairs from a subset of the MIMIC database and then fine-tuned on $795$ labeled subjects from the smaller ASPIRE cohort. It achieves an overall AUROC of about $0.79$ and an accuracy of about $0.77$, and unimodal analyses together with integrated-gradients interpretability support its clinical applicability. The results suggest that combining low-cost modalities with unsupervised pre-training can approach MRI-based performance while yielding interpretable insights to aid decision-making in critical care.
Abstract
Recent advances in non-invasive detection of cardiac hemodynamic instability (CHDI) primarily apply machine learning techniques to a single data modality, e.g., cardiac magnetic resonance imaging (MRI). Despite their potential, these approaches often fall short, especially when labeled patient data are limited, a common challenge in the medical domain. Furthermore, only a few studies have explored multimodal methods for CHDI, and these mostly rely on costly modalities such as cardiac MRI and echocardiography. In response to these limitations, we propose a novel multimodal variational autoencoder ($\text{CardioVAE}_\text{X,G}$) that integrates the low-cost chest X-ray (CXR) and electrocardiogram (ECG) modalities and is pre-trained on a large unlabeled dataset. Specifically, $\text{CardioVAE}_\text{X,G}$ introduces a novel tri-stream pre-training strategy to learn both shared and modality-specific features, thus enabling fine-tuning with both unimodal and multimodal datasets. We pre-train $\text{CardioVAE}_\text{X,G}$ on a large, unlabeled dataset of $50{,}982$ subjects from a subset of the MIMIC database and then fine-tune the pre-trained model on a labeled dataset of $795$ subjects from the ASPIRE registry. Comprehensive evaluations against existing methods show that $\text{CardioVAE}_\text{X,G}$ offers promising performance (AUROC $=0.79$ and accuracy $=0.77$), representing a significant step forward in non-invasive prediction of CHDI. Our model also produces fine-grained interpretations of its predictions that are directly associated with clinical features, thereby supporting clinical decision-making.
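To make the tri-stream idea concrete, the following is a minimal sketch of how three latent posteriors (CXR-only, ECG-only, and joint) could be formed from two unimodal Gaussian encoders. The abstract does not state how the modalities are fused, so the product-of-experts fusion used here is an assumption, and the function names `product_of_experts`, `kl_to_standard_normal`, and `tri_stream_kl` are hypothetical. Only the KL regularization term of each stream is shown; reconstruction terms and the encoder networks are omitted.

```python
import math

def product_of_experts(mus, logvars):
    """Fuse diagonal Gaussian experts with a standard-normal prior expert.

    ASSUMPTION: product-of-experts fusion is illustrative; the abstract
    does not specify the fusion mechanism.
    """
    dim = len(mus[0])
    fused_mu, fused_logvar = [], []
    for d in range(dim):
        # The prior expert N(0, 1) has precision 1 and mean 0.
        precision = 1.0
        weighted_mean = 0.0
        for mu, logvar in zip(mus, logvars):
            p = math.exp(-logvar[d])  # precision = 1 / variance
            precision += p
            weighted_mean += p * mu[d]
        fused_mu.append(weighted_mean / precision)
        fused_logvar.append(-math.log(precision))
    return fused_mu, fused_logvar

def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, logvar))

def tri_stream_kl(cxr, ecg):
    """Return the KL term for each of the three streams.

    cxr, ecg: (mu, logvar) pairs from hypothetical unimodal encoders.
    """
    streams = {
        "cxr_only": product_of_experts([cxr[0]], [cxr[1]]),
        "ecg_only": product_of_experts([ecg[0]], [ecg[1]]),
        "joint": product_of_experts([cxr[0], ecg[0]], [cxr[1], ecg[1]]),
    }
    return {name: kl_to_standard_normal(mu, lv)
            for name, (mu, lv) in streams.items()}

# Usage with made-up 2-dimensional encoder outputs.
cxr_posterior = ([0.5, -0.2], [0.0, -1.0])
ecg_posterior = ([0.1, 0.3], [-0.5, 0.0])
print(tri_stream_kl(cxr_posterior, ecg_posterior))
```

Under this assumption, training on all three streams lets the same encoders be used when only one modality is available at fine-tuning time, which is consistent with the unimodal and multimodal fine-tuning described above.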
