Prediction of 30-day hospital readmission with clinical notes and EHR information
Tiago Almeida, Plinio Moreno, Catarina Barata
TL;DR
This paper addresses predicting 30-day hospital readmission by integrating structured EHR data and unstructured clinical notes. It proposes a scalable multimodal GraphSAGE-based GNN where each admission is a node, features come from demographic data, ICD embeddings for diagnoses/procedures, lab abnormalities, and BioClinicalBERT-derived note embeddings, with edges built via cosine similarity using FAISS. On the MIMIC-IV dataset, the model achieves AUROC of 0.727 and balanced accuracy of 0.667, outperforming logistic regression and MLP baselines, and illustrating the value of multimodal fusion. Compared to state-of-the-art multimodal readmission methods, the approach attains competitive performance on a larger-scale graph, while noting limitations in missing data handling and lack of temporal modeling.
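The edge-construction step mentioned above (linking admissions by cosine similarity) can be sketched as follows. This is only an illustration under stated assumptions: the function name `cosine_knn_edges`, the neighbor count `k`, and the toy feature matrix are hypothetical, and a brute-force NumPy search stands in for the FAISS index that the paper uses to reach large graphs.

```python
import numpy as np

def cosine_knn_edges(features, k):
    """Link each row (admission) to its k most cosine-similar rows.

    Hypothetical helper: brute-force O(n^2) search, used here in place of FAISS.
    Returns a 2 x (n * k) array of (source, target) node indices.
    """
    x = np.asarray(features, dtype=np.float64)
    # After L2 normalization, the inner product equals cosine similarity.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    x = x / np.clip(norms, 1e-12, None)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    sources = np.repeat(np.arange(x.shape[0]), k)
    return np.stack([sources, neighbors.ravel()])

# Toy feature vectors (illustrative only, not data from the paper).
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = cosine_knn_edges(toy_features, k=1)
```

In this toy case, admissions 0 and 1 point at each other, as do admissions 2 and 3. An edge list in this two-row layout is the kind of input a GraphSAGE-style layer typically consumes alongside the node feature matrix.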
Abstract
High hospital readmission rates are associated with significant costs and health risks for patients. It is therefore critical to develop predictive models that help clinicians determine whether a patient will return to the hospital within a relatively short period (e.g., 30 days). It is now possible to collect both structured information (electronic health records, EHR) and unstructured information (clinical notes) about a patient's hospital event, both of which may contain relevant signals for a predictive model. However, integrating them is challenging. In this work, we explore the combination of clinical notes and EHRs to predict 30-day hospital readmissions. We address the representation of the various types of information available in the EHR data and explore LLMs to characterize the clinical notes. Both information sources are combined as the node features of a graph neural network (GNN). Our model achieves an AUROC of 0.72 and a balanced accuracy of 66.7%, highlighting the importance of combining multimodal information.
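For readers less familiar with the second reported metric, balanced accuracy in the binary case is the mean of sensitivity and specificity, which makes it more informative than plain accuracy when readmissions are the minority class. A minimal sketch, with a hypothetical function name and toy labels that are not results from the paper:

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = readmitted)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on readmitted cases
    specificity = tn / (tn + fp)  # recall on non-readmitted cases
    return (sensitivity + specificity) / 2

# Toy, imbalanced labels (illustrative only).
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0]
score = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy is 7/8 = 0.875, while balanced accuracy is (0.5 + 1.0) / 2 = 0.75, because one of the two readmitted cases was missed.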
