CURENet: Combining Unified Representations for Efficient Chronic Disease Prediction
Cong-Tinh Dao, Nguyen Minh Thao Phan, Jun-En Ding, Chenwei Wu, David Restrepo, Dongsheng Luo, Fanyi Zhao, Chun-Chieh Liao, Wen-Chih Peng, Chi-Te Wang, Pei-Fu Chen, Ling Chen, Xinglong Ju, Feng Liu, Fang-Ming Hung
TL;DR
CURENet addresses chronic-disease prediction by fusing unstructured clinical notes, textual lab results, and irregular time-series visits through a fine-tuned LLM and a Time Series Transformer. The two-stream architecture produces semantic and temporal embeddings that are jointly learned via an MLP, enabling robust multilabel predictions with significant gains over baselines on MIMIC-III and FEMH. The work demonstrates strong predictive performance, improved disease embedding separability, and case-based interpretability, while discussing ethical and deployment considerations. This multimodal, temporally aware approach advances clinical decision support by leveraging rich textual and longitudinal EHR signals for accurate, scalable chronic disease risk assessment.
Abstract
Electronic health records (EHRs) are designed to synthesize diverse data types, including unstructured clinical notes, structured lab tests, and time-series visit data. Physicians draw on these multimodal and temporal sources of EHR data to form a comprehensive view of a patient's health, which is crucial for informed therapeutic decision-making. Yet most predictive models fail to fully capture the interactions, redundancies, and temporal patterns across data modalities, often focusing on a single data type or overlooking these complexities. In this paper, we present CURENet (Combining Unified Representations for Efficient chronic disease prediction), a multimodal model that integrates unstructured clinical notes, lab tests, and patients' time-series data by using large language models (LLMs) to process clinical text and textual lab tests, and transformer encoders to model longitudinal sequences of visits. CURENet captures the intricate interactions between different forms of clinical data, yielding a more reliable predictive model for chronic illnesses. We evaluated CURENet on the public MIMIC-III and private FEMH datasets, where it achieved over 94% accuracy in predicting the top 10 chronic conditions in a multi-label framework. Our findings highlight the potential of multimodal EHR integration to enhance clinical decision-making and improve patient outcomes.
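The fusion step described above can be illustrated with a minimal sketch. This is not the authors' implementation: the two input vectors are random stand-ins for the LLM text embedding and the time-series transformer embedding, and the dimensions, the single hidden layer, the untrained random weights, and the 0.5 decision threshold are all assumptions chosen for illustration. It shows only the general pattern of concatenating two modality embeddings, passing them through an MLP, and emitting one independent sigmoid probability per condition (multi-label output).

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: one output value per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def init_layer(rng, n_in, n_out):
    """Random, untrained weights (illustrative only) and zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def fuse_and_predict(text_emb, visit_emb, params):
    """Concatenate both embeddings, apply an MLP, return per-label probabilities."""
    fused = text_emb + visit_emb  # list concatenation = late fusion
    hidden = relu(linear(fused, *params["hidden"]))
    return sigmoid(linear(hidden, *params["out"]))


# Hypothetical sizes; the abstract only fixes the number of labels (top 10).
TEXT_DIM, VISIT_DIM, HIDDEN_DIM, NUM_LABELS = 8, 4, 6, 10

rng = random.Random(0)
params = {
    "hidden": init_layer(rng, TEXT_DIM + VISIT_DIM, HIDDEN_DIM),
    "out": init_layer(rng, HIDDEN_DIM, NUM_LABELS),
}

# Stand-ins for the outputs of the text stream and the visit-sequence stream.
text_emb = [rng.gauss(0.0, 1.0) for _ in range(TEXT_DIM)]
visit_emb = [rng.gauss(0.0, 1.0) for _ in range(VISIT_DIM)]

probs = fuse_and_predict(text_emb, visit_emb, params)
predicted = [int(p >= 0.5) for p in probs]  # assumed threshold of 0.5
print(predicted)
```

Because each condition gets its own sigmoid rather than sharing a softmax, a patient can be flagged for several chronic conditions at once, which is what a multi-label framework requires.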
