
Enhanced Atrial Fibrillation Prediction in ESUS Patients with Hypergraph-based Pre-training

Yuzhang Xie, Yuhua Wu, Ruiyu Wang, Fadi Nahab, Xiao Hu, Carl Yang

Abstract

Atrial fibrillation (AF) is a major complication following embolic stroke of undetermined source (ESUS), elevating the risk of recurrent stroke and mortality. Early identification is clinically important, yet existing tools face limitations in accuracy, scalability, and cost. Machine learning (ML) offers promise but is hindered by small ESUS cohorts and high-dimensional medical features. To address these challenges, we introduce supervised and unsupervised hypergraph-based pre-training strategies to improve AF prediction in ESUS patients. We first pre-train hypergraph-based patient embedding models on a large stroke cohort (7,780 patients) to capture salient features and higher-order interactions. The resulting embeddings are transferred to a smaller ESUS cohort (510 patients), reducing feature dimensionality while preserving clinically meaningful information, enabling effective prediction with lightweight models. Experiments show that both pre-training approaches outperform traditional models trained on raw data, improving accuracy and robustness. This framework offers a scalable and efficient solution for AF risk prediction after stroke.
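The dimensionality-reduction effect of the transfer step can be illustrated with a minimal sketch. It assumes that the pre-trained model yields one d-dimensional vector per ICD code (the hypothetical table `code_embeddings`), and that a patient vector is simply the mean of the vectors for that patient's codes. The authors' hypergraph encoder and pooling may work differently. The sketch also appends expert-derived features, following the overview in Figure 1. The point it makes is narrow: a multi-hot vector over thousands of codes becomes a short dense vector that a lightweight classifier can fit on a cohort of a few hundred patients.

```python
import numpy as np

def patient_embedding(code_ids, code_embeddings):
    """Mean-pool pre-trained ICD code vectors into one patient vector.

    code_ids: indices of the patient's ICD codes (hypothetical indexing scheme).
    code_embeddings: (num_codes, d) array assumed to come from a pre-trained model.
    """
    d = code_embeddings.shape[1]
    if len(code_ids) == 0:
        # A patient with no recorded codes falls back to a zero vector.
        return np.zeros(d)
    return code_embeddings[np.asarray(code_ids)].mean(axis=0)

def build_features(patients_codes, code_embeddings, expert_features):
    """Stack patient embeddings row-wise and append expert-derived feature columns."""
    emb = np.vstack([patient_embedding(c, code_embeddings) for c in patients_codes])
    return np.hstack([emb, expert_features])

rng = np.random.default_rng(0)
num_codes, d = 5000, 16                    # illustrative sizes, not taken from the paper
code_embeddings = rng.normal(size=(num_codes, d))
patients_codes = [[3, 17, 4021], [], [999]]
expert_features = rng.normal(size=(3, 4))  # placeholder for ECG, demographic, and lab values

X = build_features(patients_codes, code_embeddings, expert_features)
print(X.shape)  # (3, 20), versus (3, 5004) with raw multi-hot codes plus expert features
```

The resulting matrix `X` is what a logistic regression, random forest, or gradient boosting model would be trained on in place of the raw code vectors.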

Paper Structure (7 sections, 19 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Overview of our methodology. ICD codes from ESUS patients are used to generate patient representations through three approaches: From Scratch (Empty Embedding), Supervised Pre-training, and Unsupervised Pre-training using a large ischemic stroke cohort. These embeddings are combined with expert-derived features (ECG, demographics, and lab biomarkers) and used to train machine learning classifiers (logistic regression, random forest, and gradient boosting) to predict post-stroke atrial fibrillation (AF) versus non-AF outcomes.
  • Figure 2: Hypergraph structure and message passing. The hypergraph represents each patient visit as a hyperedge linking multiple diagnostic features, while features serve as nodes shared across visits. During bi-directional message passing, nodes aggregate information from their connected hyperedges, and hyperedges update their embeddings from incident nodes. This allows the model to capture higher-order relationships among diagnostic features and visits.
  • Figure 3: Comparison of AF prediction performance across methods under varying training data sizes. The figure reports AUROC for four cross-validation settings (20–20, 40–20, 60–20, and 80–20). In each setting, the first number is the percentage of data used for training and the second is the fixed 20% held-out test set.
  • Figure 4: Comparison of AF prediction performance across methods on the external validation dataset.
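The bi-directional message passing described in the Figure 2 caption can be sketched with an incidence matrix `H`, where `H[v, e] = 1` if diagnostic feature `v` appears in visit `e`. This is a minimal sketch built on assumptions the caption does not state: mean aggregation in both directions, one linear map followed by ReLU per direction, and a single layer. Published hypergraph encoders, and possibly the authors' model, use attention, other normalisations, or stacked layers. In this picture the hyperedge vectors `E` act as visit-level representations. The caption does not specify how they are pooled into a patient embedding.

```python
import numpy as np

def hypergraph_layer(H, X, W_node2edge, W_edge2node):
    """One round of node -> hyperedge -> node message passing with mean aggregation.

    H: (num_nodes, num_edges) 0/1 incidence matrix (features x visits).
    X: (num_nodes, d) node (diagnostic feature) embeddings.
    W_node2edge, W_edge2node: hypothetical weight matrices for each direction.
    """
    node_deg = H.sum(axis=1, keepdims=True)    # (num_nodes, 1): visits per feature
    edge_deg = H.sum(axis=0, keepdims=True).T  # (num_edges, 1): features per visit
    # Hyperedges update from incident nodes: mean of member node embeddings.
    E = (H.T @ X) / np.maximum(edge_deg, 1)
    E = np.maximum(E @ W_node2edge, 0.0)       # linear map + ReLU
    # Nodes aggregate from their connected hyperedges.
    X_new = (H @ E) / np.maximum(node_deg, 1)
    X_new = np.maximum(X_new @ W_edge2node, 0.0)
    return X_new, E

rng = np.random.default_rng(0)
# 4 diagnostic features (nodes) shared across 2 visits (hyperedges).
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [0, 0]], dtype=float)  # feature 3 appears in no visit
X = rng.normal(size=(4, 8))
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))
X_new, E = hypergraph_layer(H, X, W1, W2)
print(X_new.shape, E.shape)  # (4, 8) (2, 8)
```

Feature 1 is shared by both visits, so its updated embedding mixes information from both hyperedges. This sharing is how the higher-order structure among features and visits enters the representation.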