CODE-II: A large-scale dataset for artificial intelligence in ECG analysis

Petrus E. O. G. B. Abreu, Gabriela M. M. Paixão, Jiawei Li, Paulo R. Gomes, Peter W. Macfarlane, Ana C. S. Oliveira, Vinicius T. Carvalho, Thomas B. Schön, Antonio Luiz P. Ribeiro, Antônio H. Ribeiro

TL;DR

CODE-II addresses the need for large-scale, clinically meaningful ECG annotations by delivering 2.7M real-world 12-lead ECGs with 66 expert-defined codes, plus CODE-II-open and CODE-II-test benchmarks. A 1D ResNet‑style architecture trained on CODE-II demonstrates robust multilabel classification, with scaling laws showing substantial gains from data size and strong transfer to PTB-XL and CPSC 2018 in both full-data and few-shot regimes. Public benchmarking via CODE-II-open and independent CODE-II-test enables reproducibility and cross-population validation, while external evaluations attest to the transferability of learned representations beyond the originating telehealth system. The work highlights practical telecardiology implications, operational considerations, and the value of clinically grounded label spaces for scalable AI in ECG interpretation.

Abstract

Data-driven methods for electrocardiogram (ECG) interpretation are rapidly progressing. Large datasets have enabled advances in artificial intelligence (AI)-based ECG analysis, yet limitations in annotation quality, size, and scope remain major challenges. Here we present CODE-II, a large-scale real-world dataset of 2,735,269 12-lead ECGs from 2,093,807 adult patients collected by the Telehealth Network of Minas Gerais (TNMG), Brazil. Each exam was annotated using standardized diagnostic criteria and reviewed by cardiologists. A defining feature of CODE-II is a set of 66 clinically meaningful diagnostic classes, developed with cardiologist input and routinely used in telehealth practice. We additionally provide two subsets: CODE-II-open, a publicly available subset of 15,000 patients, and CODE-II-test, a non-overlapping set of 8,475 exams reviewed by multiple cardiologists for blinded evaluation. A neural network pre-trained on CODE-II achieved superior transfer performance on external benchmarks (PTB-XL and CPSC 2018) and outperformed alternatives trained on larger datasets.

Paper Structure

This paper contains 43 sections, 38 figures, 15 tables.

Figures (38)

  • Figure 1: Map of Brazil illustrating the absolute and relative numbers of ECGs for each state served by TNMG in the CODE-II dataset.
  • Figure 1: Number of ECG exams per year in the CODE-II dataset.
  • Figure 2: Summary of diagnostic class frequencies and group combinations in CODE-II. (a) Number of ECG exams associated with each diagnostic class in the CODE-II dataset. Diagnostic classes are not mutually exclusive; multiple diagnoses may be assigned to a single exam, except for Normal ECGs, which are exclusive. (b) Number of exams per each CODE diagnostic group. Exams may belong to more than one group. (c) Distribution of the number of diagnostic classes assigned per exam. All panels use a logarithmic scale on the y-axis.
  • Figure 2: Age distribution of patients at their first ECG exam in the CODE-II dataset, stratified by sex.
  • Figure 3: Global performance of the model on the CODE-II-test dataset. Panels show (a) micro- and (b) macro-averaged results for AUROC, AUPRC, F1 score, Recall, Specificity, Precision, and NPV. Threshold-dependent metrics were computed after applying class-specific thresholds selected to maximize the F1-score. Bars represent the mean metric values, with numerical values shown to the right of each bar, and horizontal lines denoting 95% confidence intervals estimated from 1,000 bootstrap resamples.
  • ...and 33 more figures
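
The Figure 3 caption describes three evaluation steps: class-specific thresholds chosen to maximize F1, micro- and macro-averaged metrics, and 95% confidence intervals from 1,000 bootstrap resamples. The sketch below illustrates one way these steps might fit together for the F1 score only. It is not the authors' code: the function names, the synthetic data, the percentile-style interval, and the choice to keep thresholds fixed across resamples are all assumptions made for illustration.

```python
import numpy as np

def best_f1_threshold(y_true, y_score):
    """Return (threshold, F1) maximizing F1 for a single binary class."""
    best_t, best_f1 = 0.5, -1.0
    for t in np.unique(y_score):  # candidate thresholds: the observed scores
        pred = y_score >= t
        tp = np.sum(pred & (y_true == 1))
        fp = np.sum(pred & (y_true == 0))
        fn = np.sum(~pred & (y_true == 1))
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom > 0 else 0.0
        if f1 > best_f1:
            best_t, best_f1 = float(t), float(f1)
    return best_t, best_f1

def f1_scores(y_true, y_pred):
    """Micro- and macro-averaged F1 for arrays of shape (n_exams, n_classes)."""
    tp = np.sum((y_pred == 1) & (y_true == 1), axis=0)
    fp = np.sum((y_pred == 1) & (y_true == 0), axis=0)
    fn = np.sum((y_pred == 0) & (y_true == 1), axis=0)
    denom = 2 * tp + fp + fn
    # Macro: average the per-class F1. Micro: pool the counts over classes first.
    per_class = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    micro = 2 * tp.sum() / denom.sum() if denom.sum() > 0 else 0.0
    return float(micro), float(per_class.mean())

def bootstrap_ci(y_true, y_pred, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling exams with replacement."""
    rng = np.random.default_rng(seed)
    n = y_true.shape[0]
    stats = np.empty(n_resamples)
    for b in range(n_resamples):
        idx = rng.integers(0, n, size=n)
        stats[b] = metric(y_true[idx], y_pred[idx])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)

# Synthetic stand-in for model outputs (hypothetical data, 3 classes).
rng = np.random.default_rng(42)
y_true = (rng.random((500, 3)) < 0.2).astype(int)
y_score = 0.4 * y_true + 0.6 * rng.random((500, 3))

# One threshold per class, then binarize and evaluate.
thresholds = np.array(
    [best_f1_threshold(y_true[:, k], y_score[:, k])[0] for k in range(3)]
)
y_pred = (y_score >= thresholds).astype(int)
micro_f1, macro_f1 = f1_scores(y_true, y_pred)
ci_lo, ci_hi = bootstrap_ci(y_true, y_pred, lambda t, p: f1_scores(t, p)[0])
print(f"micro F1 = {micro_f1:.3f}, macro F1 = {macro_f1:.3f}")
print(f"95% CI for micro F1: [{ci_lo:.3f}, {ci_hi:.3f}]")
```

Selecting thresholds on the same exams used for evaluation, as this toy example does, gives optimistic estimates. The caption does not state which split was used for threshold selection, so that detail is left open here.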