Addressing Asynchronicity in Clinical Multimodal Fusion via Individualized Chest X-ray Generation

Wenfang Yao, Chen Liu, Kejing Yin, William K. Cheung, Jing Qin

TL;DR

The paper proposes DDL-CXR, a method that dynamically generates an up-to-date latent representation of a patient's individualized CXR image to address asynchronicity in multimodal fusion. In experiments on MIMIC datasets, it consistently outperforms existing methods.

Abstract

Integrating multi-modal clinical data, such as electronic health records (EHR) and chest X-ray images (CXR), is particularly beneficial for clinical prediction tasks. However, in a temporal setting, multi-modal data are often inherently asynchronous. EHR data can be collected continuously, but CXRs are generally taken at much longer intervals because of their high cost and radiation dose. When a clinical prediction is needed, the last available CXR image may already be outdated, leading to suboptimal predictions. To address this challenge, we propose DDL-CXR, a method that dynamically generates an up-to-date latent representation of the individualized CXR images. Our approach leverages latent diffusion models for patient-specific generation, strategically conditioned on a previous CXR image and the EHR time series, which provide information on anatomical structure and disease progression, respectively. In this way, the interaction across modalities can be better captured by the latent CXR generation process, ultimately improving prediction performance. Experiments using MIMIC datasets show that the proposed model effectively addresses asynchronicity in multimodal fusion and consistently outperforms existing methods.
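The asynchronicity described in the abstract can be made concrete with a small sketch. It is not the authors' code: the names `ConditioningInputs` and `build_conditioning`, the use of hours as the time unit, the half-open window $(t_0, t_1]$, and the toy values are all assumptions made for illustration. Given a prediction time $t_1$, it picks the last available CXR time $t_0$ and the EHR records that fall between the two, which are the two conditioning signals the abstract names.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ConditioningInputs:
    """Hypothetical container for the two conditioning signals."""
    cxr_time: float                         # t0: time of the last available CXR
    staleness: float                        # t1 - t0: how outdated that CXR is
    ehr_window: List[Tuple[float, float]]   # (time, value) records in (t0, t1]


def build_conditioning(
    cxr_times: Sequence[float],
    ehr_records: Sequence[Tuple[float, float]],
    t1: float,
) -> Optional[ConditioningInputs]:
    """Select the last CXR at or before t1 and the EHR records after it.

    cxr_times must be sorted in ascending order. Returns None when no CXR
    has been taken by time t1.
    """
    idx = bisect_right(cxr_times, t1)  # number of CXRs taken at or before t1
    if idx == 0:
        return None
    t0 = cxr_times[idx - 1]
    # Assumption: the window is open at t0 and closed at t1.
    window = [(t, v) for t, v in ehr_records if t0 < t <= t1]
    return ConditioningInputs(cxr_time=t0, staleness=t1 - t0, ehr_window=window)


# Toy data (invented): CXRs at hours 2 and 10, one EHR variable sampled irregularly.
cxr_times = [2.0, 10.0]
ehr_records = [(1.0, 80.0), (5.0, 85.0), (12.0, 110.0), (40.0, 120.0), (50.0, 90.0)]
inputs = build_conditioning(cxr_times, ehr_records, t1=44.0)
```

In this toy case the last CXR is 34 hours old at prediction time, and only the EHR records taken after it are passed on as the disease-progression signal. In the method itself, a generative model would consume the CXR at $t_0$ and this EHR window; that part is not sketched here.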

Paper Structure

This paper contains 46 sections, 13 equations, 9 figures, 14 tables.

Figures (9)

  • Figure 1: A real ICU patient with rapid CXR changes. (a) Initial radiology findings: Low lung volumes but lungs are clear of consolidation or pulmonary vascular congestion. No acute cardiopulmonary process. (b) Radiology findings after 34 hours: Severe relatively symmetric bilateral pulmonary consolidation. (c) CXR generated by DDL-CXR given the initial CXR image shown in (a) and the EHR data within the 34 hours. Clear signs of bilateral pulmonary consolidation can be seen from the generated image. The visualization shows that DDL-CXR could generate updated CXR images that respect the anatomical structure of the patient and reflect the disease progression.
  • Figure 2: Overview of the proposed framework, DDL-CXR, which consists of two stages. The LDM stage learns to generate an individualized, up-to-date latent CXR at time $t_1$, $\hat{\mathbf{Z}}_{t_1}$, to address asynchronicity. It is conditioned on a previous CXR image taken at time $t_0$, $\mathbf{X}^{\text{CXR}}_{t_0}$, which provides the anatomical structure of the patient, and on the EHR data between $t_0$ and $t_1$, $\mathbf{X}^{\text{EHR}}_{(t_0, t_1)}$, which provides information on disease progression. A contrastive loss and an auxiliary loss are applied for better integration of the EHR information. The generation module encapsulates cross-modal interactions to assist in clinical prediction. The prediction stage fuses the generated latent CXR, the most recent CXR image, and the complete EHR time series for clinical predictions.
  • Figure 3: Examples of images generated by DDL-CXR. From top to bottom, the three rows are reference images $\mathbf{X}^{\text{CXR}}_{t_0}$, ground-truth images $\mathbf{X}^{\text{CXR}}_{t_1}$, and generated images $\widehat{\mathbf{X}}^{\text{CXR}}_{t_1}$, respectively. The generated images show that DDL-CXR captures the anatomical information from $\mathbf{X}^{\text{CXR}}_{t_0}$ and blends in the disease-progression information extracted from the EHR when generating $\widehat{\mathbf{X}}^{\text{CXR}}_{t_1}$.
  • Figure 4: Case Study of Sample #1
  • Figure 5: Case Study of Sample #2
  • ...and 4 more figures