CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis

Fuying Wang, Feng Wu, Yihan Tang, Lequan Yu

TL;DR

The paper addresses predicting clinical outcomes from multimodal EHR data by learning cross-modal temporal patterns that span both structured time-series and free-text notes. It introduces CTPD, which discovers shared temporal prototypes using multi-scale embeddings and slot-attentive refinement, aligns them across modalities with TP-NCE, and fuses timestamp-level and prototype representations via a transformer backbone. Empirical results on MIMIC-III show CTPD achieving state-of-the-art performance on 48-hour in-hospital mortality and 24-hour phenotype classification, with comprehensive ablations highlighting the contribution of cross-modal temporal patterns and auxiliary losses. The approach promises improved real-time decision support by effectively leveraging cross-modal temporal semantics while maintaining manageable model size and inference speed.
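As a rough illustration of the cross-modal alignment idea, the sketch below implements a generic symmetric InfoNCE loss in plain Python. It is not the paper's TP-NCE formulation, which this summary does not spell out. The function names, the pairing of prototypes by index, and the temperature value are all assumptions made for illustration.

```python
import math


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE: anchors[i] should match positives[i] and no other positives[j].

    Assumption: prototypes are paired by index across the two modalities.
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        loss += log_denom - logits[i]
    return loss / len(a)


def symmetric_info_nce(ts_protos, text_protos, temperature=0.1):
    """Average the time-series-to-text and text-to-time-series directions."""
    return 0.5 * (
        info_nce(ts_protos, text_protos, temperature)
        + info_nce(text_protos, ts_protos, temperature)
    )
```

With toy inputs, the loss is small when the prototype at index i in one modality points in the same direction as the prototype at index i in the other, and large when the pairing is swapped.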

Abstract

Integrating multimodal Electronic Health Records (EHR) data, such as numerical time series and free-text clinical reports, has great potential in predicting clinical outcomes. However, prior work has primarily focused on capturing temporal interactions within individual samples and fusing multimodal information, overlooking critical temporal patterns across patients. These patterns, such as trends in vital signs like abnormal heart rate or blood pressure, can indicate deteriorating health or an impending critical event. Similarly, clinical notes often contain textual descriptions that reflect these patterns. Identifying corresponding temporal patterns across different modalities is crucial for improving the accuracy of clinical outcome predictions, yet it remains a challenging task. To address this gap, we introduce a Cross-Modal Temporal Pattern Discovery (CTPD) framework, designed to efficiently extract meaningful cross-modal temporal patterns from multimodal EHR data. Our approach introduces shared initial temporal pattern representations which are refined using slot attention to generate temporal semantic embeddings. To ensure rich cross-modal temporal semantics in the learned patterns, we introduce a contrastive-based TPNCE loss for cross-modal alignment, along with two reconstruction losses to retain core information of each modality. Evaluations on two clinically critical tasks, 48-hour in-hospital mortality and 24-hour phenotype classification, using the MIMIC-III database demonstrate the superiority of our method over existing approaches.
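The slot-attention refinement mentioned in the abstract can be sketched as follows. This is a deliberately simplified, dependency-free version: it omits the learned query/key/value projections, the GRU update, and the MLP used in standard slot attention, and replaces them with a plain weighted mean. The names `slot_attention_step` and `refine_prototypes`, and the choice of three iterations, are assumptions for illustration rather than details taken from the paper.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def slot_attention_step(slots, inputs, eps=1e-8):
    """One simplified slot-attention iteration.

    slots:  K prototype vectors of length D (shared initial patterns).
    inputs: T timestamp-level embeddings of length D.
    Returns the updated slots and the T x K attention matrix.
    """
    d = len(slots[0])
    scale = 1.0 / math.sqrt(d)
    # Softmax over slots for each timestep, so slots compete for inputs.
    attn = [softmax([scale * dot(x, s) for s in slots]) for x in inputs]
    new_slots = []
    for k in range(len(slots)):
        # Normalize each slot's weights over timesteps (weighted mean).
        weights = [attn[t][k] + eps for t in range(len(inputs))]
        total = sum(weights)
        new_slots.append([
            sum(w * x[j] for w, x in zip(weights, inputs)) / total
            for j in range(d)
        ])
    return new_slots, attn


def refine_prototypes(slots, inputs, n_iters=3):
    """Apply several refinement iterations (n_iters is an assumed default)."""
    attn = None
    for _ in range(n_iters):
        slots, attn = slot_attention_step(slots, inputs)
    return slots, attn
```

The key design point carried over from slot attention is that the softmax runs across slots rather than across timesteps, which makes the prototypes compete to explain each input embedding.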

Paper Structure

This paper contains 30 sections, 10 equations, 4 figures, 10 tables.

Figures (4)

  • Figure 1: Motivation of our proposed CTPD: we visualize the time-series EHR with the corresponding clinical notes for one ICU stay in the MIMIC-III dataset and observe temporal patterns across the two modalities. Blue text highlights respiratory status: oxygen requirements gradually decreased from 8L to 4L, and then to 2L nasal cannula, indicating steady respiratory improvement. Note that this pattern is also reflected in the time series. Green text captures cough progression and medication effects: symptom relief was observed after administering Robitussin with codeine, demonstrating a delayed but positive response to treatment. Yellow text represents infection monitoring: the detection of Gram-positive cocci prompted blood culture collection (bld cx) for further evaluation, indicating active infection surveillance.
  • Figure 2: CTPD overview: the input Multivariate Irregular Time Series (MITS) and clinical note sequences are first encoded into regular embeddings. We then introduce the Cross-Modal Temporal Pattern Discovery (CTPD) module to extract meaningful temporal semantics. The extracted temporal patterns, along with the timestamp-level embeddings from both modalities, are fused to generate the final predictions.
  • Figure 3: Ablation study on the number of prototypes.
  • Figure 4: Visualization of the learned prototypes in our CTPD framework. Here we select $5$ representative clinical variables to visualize the time series. "CPR" denotes "Capillary Refill Rate", "DBP" denotes "Diastolic Blood Pressure", "HR" denotes "Heart Rate", "MBP" denotes "Mean Blood Pressure", and "OS" denotes "Oxygen Saturation".