Demographic-Aware Self-Supervised Anomaly Detection Pretraining for Equitable Rare Cardiac Diagnosis

Chaoqin Huang, Zi Zeng, Aofan Jiang, Yuchen Xu, Qing Cao, Kang Chen, Chenfei Chi, Yanfeng Wang, Ya Zhang

Abstract

Rare cardiac anomalies are difficult to detect from electrocardiograms (ECGs) because they follow a long-tailed distribution with extremely limited case counts and because diagnostic performance differs across demographic groups. These limitations contribute to delayed recognition and uneven quality of care, creating an urgent need for a generalizable framework that enhances sensitivity while ensuring equity across diverse populations. In this study, we develop an AI-assisted two-stage ECG framework that integrates self-supervised anomaly detection with demographic-aware representation learning. The first stage performs self-supervised anomaly detection pretraining: it reconstructs masked global and local ECG signals, models signal trends, and predicts patient attributes, learning robust ECG representations without diagnostic labels. In the second stage, the pretrained model is fine-tuned for multi-label ECG classification with an asymmetric loss to better handle long-tailed cardiac abnormalities. The model additionally produces anomaly score maps for localization, and CPU-based optimization enables practical deployment. Evaluated on a longitudinal cohort of over one million clinical ECGs, our method achieves an AUROC of 94.7% for rare anomalies and reduces the common-rare performance gap by 73%, while maintaining consistent diagnostic accuracy across age and sex groups. In conclusion, the proposed equity-aware AI framework demonstrates strong clinical utility, interpretable anomaly localization, and scalable performance across multiple cohorts, highlighting its potential to mitigate diagnostic disparities and advance equitable anomaly detection in biomedical signals and digital health. Source code is available at https://github.com/MediaBrain-SJTU/Rare-ECG.
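
The abstract names an asymmetric loss for the multi-label fine-tuning stage without stating its form. As a rough illustration only, the sketch below uses the commonly cited formulation of asymmetric loss for multi-label classification, in which easy negatives are down-weighted more aggressively than positives. The function name and the hyperparameter values (`gamma_pos`, `gamma_neg`, `margin`) are assumptions for illustration and are not taken from the paper or its repository.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def asymmetric_loss(logits, targets, gamma_pos=0.0, gamma_neg=4.0,
                    margin=0.05, eps=1e-8):
    """Sum of per-label asymmetric losses for one multi-label sample.

    Hypothetical helper: hyperparameter defaults are illustrative
    assumptions, not values reported by the authors.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        if y == 1:
            # Positive label: focal-style weighting with gamma_pos.
            total -= (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Negative label: shift the probability down by the margin so
            # very easy negatives contribute nothing, then apply a
            # stronger focusing exponent gamma_neg.
            p_m = max(p - margin, 0.0)
            total -= p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
    return total


# Example: three labels, only the first one is present.
example = asymmetric_loss([2.0, -3.0, 0.5], [1, 0, 0])
```

With `gamma_pos=0`, positive labels keep the full cross-entropy penalty while confident negatives are suppressed, which is one plausible reason such a loss suits long-tailed label sets where most labels are negative for any given recording.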

Paper Structure

This paper contains 29 sections, 7 equations, 7 figures, 10 tables.

Figures (7)

  • Figure 1: The proposed two-stage ECG diagnosis framework. Step 1: Self-supervised pretraining for ECG anomaly detection, training a model to detect abnormal patterns using global and local ECG features. Step 2: Fine-tuning the classifier based on the pretrained anomaly detection model for detailed diagnosis. This method enhances classification performance, particularly for less frequent cardiac conditions, by leveraging anomaly detection pretraining.
  • Figure 2: The details of the multi-scale cross-restoration framework for ECG anomaly detection pretraining.
  • Figure 3: The long-tailed (extremely imbalanced) distribution of cardiac disease types across the ECG-LT dataset. Disease types are divided into a Common Set, an Uncommon Set, and a Rare Set according to their frequency of occurrence. The red box shows an expanded view of part of the Rare Set, where cardiac disease types occur fewer than 80 times.
  • Figure 4: Comprehensive analysis of the novel ECG-LT dataset. (a) Hierarchical architecture of cardiac types. (b) Comparison of the number of cardiac types in the ECG-LT dataset to those in existing ECG databases. (c) Age distribution across the training, internal validation, and external validation sets. (d) Gender distribution across the training, internal validation, and external validation sets.
  • Figure 5: Performance of ECG diagnosis. (a) Diagnosis performance on tail classes. Comparison of ECG diagnosis for each type in the Rare Set, where the proposed method outperforms the compared methods, particularly for anomalies with fewer samples. (b) Diagnosis fairness across sex. The model shows consistent performance between male and female subjects. (c) Diagnosis fairness across age. Performance is evaluated across age groups in ten-year intervals, with the model demonstrating stable and equitable results. (d) Visualization of anomaly localization. Comparison of the proposed method (Ours) with a leading baseline (BeatGAN), against ground truth (pink boxes). Color-coded scores (0–1) indicate anomaly likelihood, with red highlighting the most probable locations.
  • ...and 2 more figures