PT: A Plain Transformer is Good Hospital Readmission Predictor
Zhenyi Fan, Jiaqi Li, Dongyu Luo, Yuqi Yuan
TL;DR
This work proposes PT, a Transformer-based model for predicting 30-day hospital readmission that fuses EHR data, chest radiographs, and clinical notes. Each modality has its own feature extractor (Random Forest for EHR, MoCo-CXR for images, and TF-IDF for notes) and dedicated Transformer blocks, followed by a final MLP head, so the modalities can be integrated robustly even when temporal data are incomplete. The model is further enhanced with Random Forest feature selection, dynamic noise during training, and a K-fold ensemble, and it is trained with a Label Smoothing Focal Loss to handle class imbalance. Experiments on MIMIC-derived data show higher AUC than LSTM/GRU baselines and favorable results across modality combinations, highlighting PT’s simplicity, scalability, and robustness. By improving early risk stratification, the approach can support targeted post-discharge interventions and more efficient allocation of healthcare resources.
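To make the loss named above concrete, here is a minimal sketch of one common way to combine label smoothing with focal loss for a binary readmission label. The exact formulation used by PT is not given in this summary, so the function name, the two-class form, and the default values of `gamma` and `smoothing` are assumptions for illustration only.

```python
import math

def label_smoothing_focal_loss(p_pos, label, gamma=2.0, smoothing=0.1):
    """Per-sample loss for a binary label (hypothetical formulation).

    p_pos     -- predicted probability of the positive class (readmitted)
    label     -- ground-truth class, 0 or 1
    gamma     -- focal exponent; larger values down-weight easy examples
    smoothing -- probability mass spread uniformly over the two classes
    """
    eps = 1e-12
    p = min(max(p_pos, eps), 1.0 - eps)   # clamp to avoid log(0)
    probs = (1.0 - p, p)                  # (P(class 0), P(class 1))

    # Smoothed target: the true class keeps 1 - smoothing/2,
    # the other class receives smoothing/2.
    target = [smoothing / 2.0, smoothing / 2.0]
    target[label] = 1.0 - smoothing / 2.0

    # Focal-weighted cross-entropy against the smoothed target.
    return -sum(
        q * (1.0 - pc) ** gamma * math.log(pc)
        for q, pc in zip(target, probs)
    )
```

With `gamma=0` and `smoothing=0` this reduces to ordinary cross-entropy. Raising `gamma` shrinks the contribution of confidently correct predictions, which is why focal-style losses are often used when one class (here, readmitted patients) is rare.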
Abstract
Hospital readmission prediction is critical for clinical decision support: it aims to identify patients at risk of returning to the hospital within 30 days of discharge. High readmission rates often indicate inadequate treatment or post-discharge care, making effective prediction models essential for optimizing resources and improving patient outcomes. We propose PT, a Transformer-based model that integrates Electronic Health Records (EHR), medical images, and clinical notes to predict 30-day all-cause hospital readmission. PT extracts features from the raw data of each modality and processes them with specialized Transformer blocks tailored to the data's complexity. Enhanced with Random Forest feature selection for EHR data and test-time ensemble techniques, PT achieves superior accuracy, scalability, and robustness, and it performs well even when temporal information is missing. Our main contributions are: (1) Simplicity: a powerful and efficient baseline model that outperforms existing models in prediction accuracy; (2) Scalability: flexible handling of diverse features from different modalities, achieving high performance with clinical notes or EHR data alone; (3) Robustness: strong predictive performance even with missing or unclear temporal data.
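The ensemble step mentioned above can be sketched as follows. The abstract does not say how the fold models are combined, so averaging their predicted probabilities is an assumption here, and `kfold_splits` and `ensemble_predict` are hypothetical helper names rather than part of PT.

```python
def kfold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for k contiguous folds."""
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        size = base + (1 if fold < extra else 0)
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size


def ensemble_predict(models, x):
    """Average the readmission probabilities from all fold models.

    `models` is a list of callables, each mapping an input to a
    probability in [0, 1] (one model per training fold).
    """
    probs = [model(x) for model in models]
    return sum(probs) / len(probs)
```

In use, one model would be trained on each `train` split, and at test time every model scores the same patient before the scores are averaged. Averaging tends to reduce the variance that comes from any single train/validation split.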
