A Dataset and Benchmarks for Atrial Fibrillation Detection from Electrocardiograms of Intensive Care Unit Patients
Sarah Nassar, Nooshin Maghsoodi, Sophia Mannina, Shamel Addas, Stephanie Sibley, Gabor Fichtinger, David Pichora, David Maslove, Purang Abolmaesumi, Parvin Mousavi
TL;DR
The paper addresses atrial fibrillation (AF) detection in ICU patients from ECG data. It benchmarks three AI approaches (feature-based methods, deep learning, and ECG foundation models) across multiple training configurations and two datasets: Kingston ICU and the 2021 PhysioNet/CinC Challenge. Fine-tuning an ECG foundation model yields the best ICU performance (F1 up to 0.89), while large-scale DL models excel when data are abundant and feature-based methods remain robust when data are limited. By publishing a labelled ICU dataset and a thorough benchmarking framework, the study supports future work on AI-assisted ICU rhythm monitoring and AF forecasting. It also highlights practical implications for real-time monitoring and alarm management in critical care, along with methodological insights for cross-site model validation and transfer learning.
Abstract
Objective: Atrial fibrillation (AF) is the most common cardiac arrhythmia experienced by intensive care unit (ICU) patients and can cause adverse health effects. In this study, we publish a labelled ICU dataset and benchmarks for AF detection. Methods: We compared machine learning models across three data-driven artificial intelligence (AI) approaches: feature-based classifiers, deep learning (DL), and ECG foundation models (FMs). This comparison addresses a critical gap in the literature and aims to identify which AI approach is best for accurate AF detection. Electrocardiograms (ECGs) from a Canadian ICU and the 2021 PhysioNet/Computing in Cardiology Challenge were used to conduct the experiments. Multiple training configurations were tested, ranging from zero-shot inference to transfer learning. Results: On average across both datasets, ECG FMs performed best, followed by DL, then feature-based classifiers. The model that achieved the top F1 score on our ICU test set was ECG-FM, trained with a transfer learning strategy (F1=0.89). Conclusion: This study demonstrates promising potential for using AI to build an automatic patient monitoring system. Significance: By publishing our labelled ICU dataset (LinkToBeAdded) and performance benchmarks, this work enables the research community to continue advancing the state of the art in AF detection in the ICU.
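Since the headline result is reported as an F1 score, a minimal sketch of how F1 can be computed for a binary AF versus non-AF task may help readers interpret it. This is an illustration under assumptions, not the authors' evaluation code: the abstract does not say whether F1 is computed for the AF class only or averaged across classes, the function name `f1_score_binary` is hypothetical, and the labels below are toy values rather than data from the paper.

```python
from typing import Sequence


def f1_score_binary(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for the positive class, assuming 1 = AF and 0 = non-AF."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # With no true positives, F1 is defined here as 0 to avoid division by zero.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)  # fraction of predicted AF that is truly AF
    recall = tp / (tp + fn)     # fraction of true AF that was detected
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not results from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a score such as 0.89 indicates that both missed AF episodes and false AF alarms are relatively infrequent, which matters for alarm management in the ICU.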
