PT: A Plain Transformer is Good Hospital Readmission Predictor

Zhenyi Fan, Jiaqi Li, Dongyu Luo, Yuqi Yuan

TL;DR

This work addresses predicting 30-day hospital readmission by proposing PT, a Transformer-based model that fuses EHR, chest radiographs, and clinical notes. Each modality is processed by dedicated Transformer blocks, with modality-specific feature extraction (Random Forest for EHR, MoCo-CXR for images, and TF-IDF for notes) and a final MLP head, enabling robust multimodal integration even when temporal data are incomplete. The model is enhanced with Random Forest feature selection, dynamic noise during training, and a K-fold ensemble, and trained with a Label Smoothing Focal Loss to handle class imbalance. Experiments on MIMIC-derived data demonstrate superior AUC performance over LSTM/GRU baselines and show favorable results across modality combinations, highlighting PT’s simplicity, scalability, and robustness. The approach offers practical benefits for targeted post-discharge interventions and efficient allocation of healthcare resources by improving early risk stratification.
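The TL;DR names a Label Smoothing Focal Loss for the class imbalance, but this summary does not give its formula. The sketch below shows one way to write it. It assumes the standard binary focal loss applied to label-smoothed targets. The defaults gamma = 2, alpha = 0.25 and eps = 0.1 are hypothetical choices, not values reported for PT.

```python
import math


def label_smoothing_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=0.1):
    """Binary focal loss on a label-smoothed target (hypothetical formulation).

    p: predicted probability of 30-day readmission, in (0, 1)
    y: hard label, 0 (not readmitted) or 1 (readmitted)
    gamma, alpha, eps: assumed hyperparameters, not taken from the paper
    """
    # Clamp to avoid log(0).
    p = min(max(p, 1e-7), 1 - 1e-7)
    # Label smoothing pulls the hard label toward 0.5: 1 -> 1 - eps/2, 0 -> eps/2.
    y_s = y * (1 - eps) + eps / 2
    # Focal modulation: (1 - p)^gamma shrinks the positive term when p is already
    # high, p^gamma shrinks the negative term when p is already low.
    pos = -alpha * y_s * (1 - p) ** gamma * math.log(p)
    neg = -(1 - alpha) * (1 - y_s) * p ** gamma * math.log(1 - p)
    return pos + neg


def mean_loss(probs, labels, **kwargs):
    """Average the per-admission loss over a batch."""
    total = sum(label_smoothing_focal_loss(p, y, **kwargs) for p, y in zip(probs, labels))
    return total / len(probs)
```

With gamma=0, alpha=0.5 and eps=0, the expression reduces to half the ordinary binary cross-entropy, which makes a quick sanity check. Raising gamma shrinks the contribution of easy, confidently classified admissions, so the rarer readmitted cases carry relatively more weight.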

Abstract

Hospital readmission prediction is critical for clinical decision support: the goal is to identify patients at risk of returning to the hospital within 30 days of discharge. High readmission rates often indicate inadequate treatment or post-discharge care, so effective prediction models are essential for optimizing resources and improving patient outcomes. We propose PT, a Transformer-based model that integrates Electronic Health Records (EHR), medical images, and clinical notes to predict 30-day all-cause hospital readmissions. PT extracts features from the raw data of each modality and processes them with specialized Transformer blocks tailored to that data's complexity. Enhanced with Random Forest feature selection for EHR and test-time ensemble techniques, PT achieves superior accuracy, scalability, and robustness, and it performs well even when temporal information is missing. Our main contributions are: (1) Simplicity: a powerful and efficient baseline model that outperforms existing models in prediction accuracy; (2) Scalability: flexible handling of features from different modalities, achieving high performance with clinical notes alone or EHR data alone; (3) Robustness: strong predictive performance even when temporal data are missing or unclear.
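The abstract credits part of PT's performance to test-time ensembling, and the TL;DR describes this as a K-fold ensemble. The sketch below shows one plausible reading of that phrase. One model is trained per fold on the other K-1 folds. At test time, the K models' predicted readmission probabilities are averaged. `fit_prior` is a hypothetical stand-in for training PT, and all function names are illustrative only.

```python
import random
from typing import Callable, List, Sequence

# A trained model maps one admission's feature vector to P(readmission).
Model = Callable[[Sequence[float]], float]


def kfold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_kfold_ensemble(X, y, k: int, fit: Callable[[list, list], Model], seed: int = 0) -> List[Model]:
    """Fit one model per fold, each trained on the remaining k-1 folds."""
    models = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        models.append(fit([X[i] for i in train_idx], [y[i] for i in train_idx]))
    return models


def ensemble_predict(models: List[Model], x: Sequence[float]) -> float:
    """Test-time ensemble: average the fold models' predicted probabilities."""
    return sum(m(x) for m in models) / len(models)


def fit_prior(X_train: list, y_train: list) -> Model:
    """Hypothetical stand-in for training PT: predicts the training readmission rate."""
    rate = sum(y_train) / len(y_train)
    return lambda x: rate
```

This version averages probabilities rather than hard labels. That keeps the ensemble output continuous, which suits threshold-free metrics such as the AUC the paper reports.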

Paper Structure

This paper contains 24 sections, 20 equations, 2 figures, 7 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of the Plain Transformer structure. Features from the EHR modality follow a previous preprocessing pipeline tang2022multimodal. Image features for embedding are extracted with MoCo-CXR sowrirajan2021moco. Clinical note features are extracted and processed with a TF-IDF text feature extractor qaiser2018text, ramos2003using. The resulting features from the three modalities are fed into the Transformer blocks for the final readmission prediction for each admission.
  • Figure 2: Left: AUC performance of the random forest model with different numbers of selected EHR features; Right: AUC performance of the model with different k values in K-fold cross-validation.
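Figure 1 states that clinical note features come from a TF-IDF extractor, but this summary does not give the paper's settings for vocabulary, tokenization, or normalization. The sketch below assumes one common variant. Raw term counts are weighted by a smoothed inverse document frequency, idf(t) = ln((1 + N) / (1 + df(t))) + 1, where N is the number of notes and df(t) is the number of notes containing term t. Each vector is then L2-normalized. This is the default weighting in scikit-learn's `TfidfVectorizer`. The tokenizer here is deliberately simplified.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(note: str) -> List[str]:
    """Simplified tokenizer: lowercase words of two or more word characters."""
    return re.findall(r"\b\w\w+\b", note.lower())


def tfidf(notes: List[str]) -> List[Dict[str, float]]:
    """Return one sparse, L2-normalized TF-IDF vector (term -> weight) per note."""
    docs = [Counter(tokenize(n)) for n in notes]
    n = len(docs)
    # Document frequency: iterating a Counter yields each distinct term once.
    df = Counter(term for d in docs for term in d)
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for d in docs:
        # Raw term count times smoothed idf.
        v = {t: cnt * idf[t] for t, cnt in d.items()}
        norm = math.sqrt(sum(x * x for x in v.values()))
        # Empty notes stay as empty vectors instead of dividing by zero.
        vectors.append({t: x / norm for t, x in v.items()} if norm else v)
    return vectors
```

Terms that appear in many notes, such as "patient", receive lower weights than terms specific to a few notes. As a result, each admission's note vector emphasizes its distinctive vocabulary before it reaches the Transformer blocks.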