Towards Precision Healthcare: Robust Fusion of Time Series and Image Data

Ali Rasekh, Reza Heidari, Amir Hosein Haji Mohammad Rezaie, Parsa Sharifi Sedeh, Zahra Ahmadi, Prasenjit Mitra, Wolfgang Nejdl

TL;DR

This work tackles mortality prediction and phenotyping from multimodal clinical data by fusing time-series and chest X-ray information. It introduces a dual-branch architecture with modality-specific encoders (ResNet-34 for images and a 2-layer LSTM for time series) whose embeddings are projected and fused via a Transformer encoder, using no positional embeddings to avoid modality bias. A Kendall-style multi-task uncertainty loss weights the 25 phenotyping tasks, enabling robust multi-label learning and improved calibration, while CLAHE augmentation and attention-based fusion boost performance and resilience to noise. Empirical results on the MIMIC-IV and MIMIC-CXR datasets demonstrate state-of-the-art performance for mortality prediction and phenotyping, with notable robustness in noisy settings and improved uncertainty handling. These findings advance robust multimodal deep learning for clinical decision support and suggest avenues for interpretability and for extending the approach to additional modalities.
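The Kendall-style uncertainty weighting mentioned above can be illustrated with a minimal sketch. It assumes the widely used log-variance form of Kendall, Gal and Cipolla's loss, L = sum_i exp(-s_i) * L_i + s_i with s_i = log(sigma_i^2), and binary cross-entropy for each phenotype label. The exact form and constants used in the paper are not given here, and the helper names `bce` and `uncertainty_weighted_loss` are hypothetical. In a trained model each s_i would be a learnable parameter rather than a fixed float.

```python
import math
from typing import Sequence


def bce(prob: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for one prediction (prob is P(label == 1))."""
    p = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def uncertainty_weighted_loss(task_losses: Sequence[float],
                              log_vars: Sequence[float]) -> float:
    """Combine per-task losses as sum_i exp(-s_i) * L_i + s_i.

    Assumed form: s_i = log(sigma_i^2) is one scalar per task. A larger s_i
    means higher estimated uncertainty, so that task's loss is down-weighted;
    the additive s_i term stops the model from inflating every sigma.
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    total = 0.0
    for loss, s in zip(task_losses, log_vars):
        total += math.exp(-s) * loss + s
    return total


# Hypothetical example: 3 phenotype labels for one patient (the paper uses 25).
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 1]
losses = [bce(p, y) for p, y in zip(probs, labels)]

print(uncertainty_weighted_loss(losses, [0.0, 0.0, 0.0]))  # same as sum(losses)
print(uncertainty_weighted_loss(losses, [0.0, 1.0, -0.5]))
```

With every s_i = 0 the loss reduces to an unweighted sum; setting s_i = log 2 halves that task's weight at the cost of adding log 2 to the total, which is the trade-off the optimizer balances per task.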

Abstract

With the increasing availability of diverse data types, particularly images and time-series data from medical experiments, there is a growing demand for techniques that combine multiple data modalities effectively. Our motivation comes from two important tasks, mortality prediction and phenotyping, where using multiple data modalities could significantly improve predictive performance. To tackle this challenge, we introduce a new method that uses two separate encoders, one for each data type, allowing the model to capture complex patterns in both visual and temporal information. Beyond these technical challenges, our goal is to make the predictive model more robust under noisy conditions and to outperform current methods. We also address imbalanced datasets and use an uncertainty loss function, which improves results while providing a principled means of modeling uncertainty. Additionally, we use attention mechanisms to fuse the modalities, allowing the model to focus on the information most relevant to each task. We evaluated our approach on the multimodal MIMIC dataset, which combines MIMIC-IV and MIMIC-CXR. Our experiments show that our method is effective in improving multimodal deep learning for clinical applications. The code will be made available online.
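The attention-based fusion described in the TL;DR (projected embeddings from each encoder, fused by a Transformer encoder without positional embeddings) can be sketched in NumPy. This is a single-head, single-layer simplification with random weights. The 512-dimensional image embedding matches ResNet-34's pooled output; the 256-dimensional LSTM state, the 64-dimensional shared space, the mean pooling, and all names (`W_img`, `W_ts`, `self_attention`, `fuse`) are assumptions for illustration. The final check shows why dropping positional embeddings removes any ordering preference between modalities: swapping the two input tokens only swaps the outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# 512 = ResNet-34 pooled feature size; 256 and 64 are assumed sizes.
D_IMG, D_TS, D_MODEL = 512, 256, 64

# Hypothetical per-modality linear projections into a shared token space.
W_img = rng.normal(scale=D_IMG ** -0.5, size=(D_IMG, D_MODEL))
W_ts = rng.normal(scale=D_TS ** -0.5, size=(D_TS, D_MODEL))

# Single-head attention weights. A real Transformer encoder layer adds
# multiple heads, residual connections, LayerNorm and a feed-forward MLP.
W_q, W_k, W_v = (rng.normal(scale=D_MODEL ** -0.5, size=(D_MODEL, D_MODEL))
                 for _ in range(3))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens: np.ndarray) -> np.ndarray:
    """tokens: (n_tokens, D_MODEL). No positional embedding is added."""
    q, k, v = tokens @ W_q, tokens @ W_k, tokens @ W_v
    weights = softmax(q @ k.T / np.sqrt(D_MODEL))  # (n_tokens, n_tokens)
    return weights @ v


def fuse(img_emb: np.ndarray, ts_emb: np.ndarray) -> np.ndarray:
    """Project both embeddings, attend across them, mean-pool to one vector."""
    tokens = np.stack([img_emb @ W_img, ts_emb @ W_ts])  # (2, D_MODEL)
    return self_attention(tokens).mean(axis=0)


img_emb = rng.normal(size=D_IMG)  # stand-in for a ResNet-34 image embedding
ts_emb = rng.normal(size=D_TS)    # stand-in for the LSTM's final hidden state

tokens = np.stack([img_emb @ W_img, ts_emb @ W_ts])
out = self_attention(tokens)
out_swapped = self_attention(tokens[::-1])
# Without positional embeddings, reordering the modality tokens only
# reorders the outputs (permutation equivariance).
print(np.allclose(out_swapped, out[::-1]))
print(fuse(img_emb, ts_emb).shape)
```

Mean pooling over the attended tokens then makes the fused vector fully independent of token order; a learned [CLS]-style token would be an alternative pooling choice.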

Paper Structure (23 sections, 3 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: The effect of CLAHE (Contrast Limited Adaptive Histogram Equalization) augmentation: (a) original chest X-ray image, (b) CLAHE-augmented image. CLAHE significantly improves the visibility of internal structures, revealing fine details such as the kidney on the right side of the image. It also renders bone density more clearly. Such refined details play a crucial role in mortality prediction and phenotype classification.
  • Figure 2: Model architecture consisting of modality-specific encoders and a multilayer transformer encoder as our multimodal fusion network.
  • Figure 3: We combine and weight multiple losses according to the uncertainty of each task to compute the multi-task uncertainty loss.
  • Figure 4: Performance comparison of models trained on noisy or noise-free datasets, and evaluated on noisy datasets. The plot employs different colors to represent specific configurations: blue indicates the attention-based fusion model trained on noisy data; red shows the MedFuse model trained on noisy data; green denotes the attention-based fusion trained on noise-free data; and orange represents the MedFuse model trained on noise-free data. As noise levels increase, a general decline in performance is observed for all models across various metrics. Notably, the use of attention mechanisms appears to mitigate performance degradation, showcasing enhanced robustness against noise.
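Figure 1 refers to CLAHE augmentation. The NumPy sketch below shows its two core ideas, clipping each tile's histogram and equalizing with the clipped histogram, under stated simplifications: tiles are processed independently with no bilinear interpolation between neighboring tile mappings (full CLAHE, for example OpenCV's `cv2.createCLAHE`, does interpolate to avoid visible tile seams), and the 8 x 8 grid and clip limit of 2.0 are assumed values, not the paper's settings. The function names are hypothetical.

```python
import numpy as np


def clipped_equalization_lut(tile: np.ndarray, clip_limit: float = 2.0) -> np.ndarray:
    """Contrast-limited equalization lookup table for one uint8 tile."""
    n_bins = 256
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    # Clip each bin at clip_limit x the mean bin height, then spread the
    # clipped excess evenly over all bins (total pixel count is preserved).
    limit = clip_limit * tile.size / n_bins
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / n_bins
    cdf = np.cumsum(hist)
    return np.round(cdf / cdf[-1] * (n_bins - 1)).astype(np.uint8)


def tiled_clhe(img: np.ndarray, tiles=(8, 8), clip_limit: float = 2.0) -> np.ndarray:
    """Apply clipped equalization per tile (simplified: no interpolation)."""
    if img.dtype != np.uint8 or img.ndim != 2:
        raise ValueError("expects a 2-D uint8 grayscale image")
    out = np.empty_like(img)
    row_edges = np.linspace(0, img.shape[0], tiles[0] + 1, dtype=int)
    col_edges = np.linspace(0, img.shape[1], tiles[1] + 1, dtype=int)
    for r0, r1 in zip(row_edges[:-1], row_edges[1:]):
        for c0, c1 in zip(col_edges[:-1], col_edges[1:]):
            tile = img[r0:r1, c0:c1]
            if tile.size == 0:  # image smaller than the tile grid
                continue
            lut = clipped_equalization_lut(tile, clip_limit)
            out[r0:r1, c0:c1] = lut[tile]  # map each pixel through its tile's LUT
    return out


# Synthetic low-contrast image (values 100..139) standing in for an X-ray.
rng = np.random.default_rng(0)
low_contrast = rng.integers(100, 140, size=(256, 256), dtype=np.uint8)
enhanced = tiled_clhe(low_contrast)
print(low_contrast.std(), enhanced.std())  # spread of intensities increases
```

The clip limit bounds how steep the intensity mapping can become in any range, which caps noise amplification in flat regions; that bound is what distinguishes CLAHE from plain histogram equalization.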