ECG-FM: An Open Electrocardiogram Foundation Model
Kaden McKeen, Sameer Masood, Augustin Toma, Barry Rubin, Bo Wang
TL;DR
ECG-FM is an open-weight ECG foundation model that uses a hybrid wav2vec 2.0–CMSC self-supervised objective with Random Lead Masking to learn robust, transferable ECG representations from 1.5 million ECGs. It is data-efficient and generalizes across datasets, outperforming task-specific baselines in small-to-medium data regimes and supporting downstream tasks such as UHN-ECG interpretation and reduced LVEF prediction. The work provides a practical benchmark and open-source resources, including pretrained weights and code, to accelerate adoption and comparability in ECG foundation-model research. Independent evaluation on external data corroborates its discriminative capability for acute coronary syndrome triage, and analyses show meaningful latent structure and interpretable attention patterns.
Abstract
Conventional task-specific electrocardiogram (ECG) analysis models require large annotated datasets to train. Foundation models mitigate this burden by leveraging self-supervised pretraining; however, the scarcity of open-weight ECG foundation models hinders adoption and cross-study comparability. We present ECG-FM, an open foundation model for ECG analysis, and conduct a study using a dataset of 1.5 million ECGs. ECG-FM is a transformer-based model pretrained using a hybrid contrastive and generative self-supervised learning approach. Our downstream tasks are predicting reduced left ventricular ejection fraction (LVEF) and predicting ECG interpretation labels; for the latter, we release a benchmark task on the MIMIC-IV-ECG dataset. We show that ECG-FM is robust, label-efficient, and functionally discriminative through data scaling experiments, a latent space analysis, and saliency maps. ECG-FM markedly outperforms task-specific models in the small-to-medium-scale data regime and demonstrates cross-dataset generalizability, achieving high AUROC on many clinically salient labels such as atrial fibrillation (0.996) and LVEF ≤ 40% (0.929). We release our code, model weights, and benchmark task at https://github.com/bowang-lab/ECG-FM/.
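
The Random Lead Masking named in the TL;DR is not specified in this summary. As a rough illustration of the general idea only, the sketch below zeroes out each lead of a multi-lead ECG independently with some probability, so that a model trained on the result cannot rely on any fixed subset of leads. The function name `random_lead_mask`, the default probability `p=0.5`, and the choice to replace masked leads with zeros are assumptions made for illustration, not details taken from the ECG-FM code.

```python
import random


def random_lead_mask(ecg, p=0.5, rng=None):
    """Zero out each lead independently with probability p (illustrative only).

    ecg: list of leads, each a list of samples (e.g. 12 leads).
    p:   assumed masking probability per lead; hypothetical default.
    rng: optional random.Random instance for reproducibility.
    Returns (masked_ecg, keep), where keep[i] is True if lead i was kept.
    """
    if rng is None:
        rng = random.Random()
    # One independent draw per lead: keep the lead when the draw is >= p.
    keep = [rng.random() >= p for _ in ecg]
    masked = [
        list(lead) if kept else [0.0] * len(lead)
        for lead, kept in zip(ecg, keep)
    ]
    return masked, keep


if __name__ == "__main__":
    # Toy 12-lead signal with 5 samples per lead.
    toy_ecg = [[float(i)] * 5 for i in range(1, 13)]
    masked, keep = random_lead_mask(toy_ecg, p=0.5, rng=random.Random(0))
    print("leads kept:", sum(keep), "of", len(keep))
```

In a pretraining loop, a mask like this would typically be redrawn for every sample and every epoch, so the model sees many different lead combinations of the same recording.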
