ECGtizer: a fully automated digitizing and signal recovery pipeline for electrocardiograms

Alex Lence, Ahmad Fall, Samuel David Cohen, Federica Granese, Jean-Daniel Zucker, Joe-Elie Salem, Edi Prifti

TL;DR

ECGtizer tackles the challenge of turning historical paper ECGs into machine-readable data and recovering missing signal portions for AI-ready analysis. It combines automated lead detection, three pixel-based trace extraction methods, and a UNet-based signal-reconstruction module to produce complete 12-lead, 10-second ECG records. On a real-world cohort (JOCOVID) and a public dataset (PTB-XL), ECGtizer achieves better signal fidelity and feature preservation than state-of-the-art tools, and models retrained on its digitized output remain competitive on downstream tasks such as torsades de pointes (TdP) risk classification. The approach broadens access to historical ECG cohorts for AI-driven diagnostics; it is open source and can be adapted to additional ECG formats in future work.

Abstract

Electrocardiograms (ECGs) are essential for diagnosing cardiac pathologies, yet traditional paper-based ECG storage poses significant challenges for automated analysis. This study introduces ECGtizer, an open-source, fully automated tool that digitizes paper ECGs and recovers signals lost during storage, making these records amenable to automated analysis with modern AI methods. It combines automated lead detection, three pixel-based signal extraction algorithms, and a deep learning-based signal reconstruction module. We evaluated ECGtizer on two datasets: a real-life cohort from the COVID-19 pandemic (JOCOVID) and a publicly available dataset (PTB-XL). Performance was compared with two existing methods: the fully automated ECGminer and the semi-automated PaperECG, which requires human intervention. Each tool was assessed in terms of signal recovery and the fidelity of clinically relevant feature measurements. Additionally, we tested these tools on a third dataset (GENEREPOL) for downstream AI tasks. Results show that ECGtizer outperforms existing tools, with its ECGtizerFrag algorithm delivering the best signal recovery. PaperECG performed better than ECGminer, but only with human input. By improving the usability of historical ECG data, ECGtizer supports advanced AI-based diagnostic methods and is a valuable addition to AI-based ECG analysis.
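Signal recovery is scored by comparing each digitized lead with the original digital recording; the figure captions name PCC, RMSE and SDTW as the metrics. The sketch below computes the first two, the Pearson correlation coefficient and the root-mean-square error, optionally restricted to the recovered part of a lead through a boolean mask. It is a minimal illustration, not the authors' evaluation code: it assumes both signals are already aligned on the same time grid and in the same units, the function names are hypothetical, the 500 Hz rate and 2.5 s visible segment in the usage lines are illustrative assumptions, and SDTW is not reproduced.

```python
import numpy as np


def pcc(x, y):
    """Pearson correlation coefficient between two equal-length 1D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def rmse(x, y):
    """Root-mean-square error between two equal-length 1D signals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))


def compare(digitized, reference, mask=None):
    """Score a digitized lead against its reference (hypothetical helper).

    If `mask` is given, only samples where it is True are scored, e.g. the
    part of the lead that was missing on paper and had to be reconstructed.
    """
    digitized = np.asarray(digitized, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if digitized.shape != reference.shape:
        raise ValueError("signals must be aligned on the same time grid")
    if mask is not None:
        mask = np.asarray(mask, dtype=bool)
        digitized, reference = digitized[mask], reference[mask]
    return {"pcc": pcc(digitized, reference), "rmse": rmse(digitized, reference)}


# Illustrative usage: a 10 s lead at an assumed 500 Hz and a noisy "digitized" copy.
t = np.linspace(0, 10, 5000, endpoint=False)
reference = np.sin(2 * np.pi * 1.2 * t)
rng = np.random.default_rng(0)
digitized = reference + 0.05 * rng.standard_normal(t.size)
recovered = t >= 2.5  # assumption: only the first 2.5 s were printed on paper

print(compare(digitized, reference))                  # whole lead
print(compare(digitized, reference, mask=recovered))  # recovered part only
```

Scoring only the masked samples mirrors the captions' note that the reconstructed data are evaluated on the recovered part of the signal, so high agreement on the visible segment cannot mask a poor reconstruction.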

Paper Structure

This paper contains 31 sections, 6 figures, 7 tables, 3 algorithms.

Figures (6)

  • Figure 1: From left to right: in the first stage, Lead Trace Extraction, the ECG image is binarized using Otsu's thresholding method. Pixel variance is then computed across columns and rows to measure intensity variation, which enables extraction of the lead traces. In the second stage, Lead Signal Extraction, ECGtizer analyzes each column of the extracted trace image matrix using one of three methods (full, fragmented, or lazy) to produce a 1D vector representing the lead's signal amplitude at each column (time step).
  • Figure 2: Boxplots of digitization performance metrics (PCC, RMSE and SDTW) between the extracted signal and the real ECG signal, before (original) and after (reconstructed) the lead-completion phase. The evaluation of the reconstructed data focuses only on the recovered part of the signal.
  • Figure 3: Digitization output of the three methods (ECGtizer, ECGminer and PaperECG) before and after completion with ECGrecover. This example from the JOCOVID dataset shows lead V1 of patient n° 137.
  • Figure 4: Boxplots of the squared differences between features identified in the digitized signal and those identified in the original raw signal, before (original) and after (reconstructed) signal recovery. Time-level features are shown in the left panel and amplitude-level features in the right panel, on a logarithmic scale.
  • Figure 5: Boxplots of digitization performance metrics (PCC, RMSE and SDTW) between the extracted signal and the real ECG signal, before (original) and after (reconstructed) the lead-completion phase. The evaluation of the reconstructed data focuses only on the recovered part of the signal.
  • ...and 1 more figure
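The two stages in Figure 1 can be illustrated with a minimal NumPy sketch: Otsu thresholding separates the dark trace from the lighter grid and background, and each column of the binary image is then collapsed to one amplitude sample. Summarizing a column by the mean row of its ink pixels is a hypothetical simplification chosen for brevity; it is not ECGtizer's full, fragmented, or lazy method, and the variance-based lead localization is omitted. The helper names, the baseline row, the pixels-per-millivolt scale, and the synthetic grid image are all assumptions.

```python
import numpy as np


def otsu_threshold(gray):
    """Otsu threshold for a uint8 grayscale image; pixels <= threshold form class 0."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                # probability of class 0 for each threshold
    mu = np.cumsum(prob * np.arange(256))  # cumulative mean intensity
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    # Between-class variance; thresholds leaving one class empty score 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))


def column_signal(binary, baseline_row, px_per_mv):
    """Collapse a binary trace image (True = ink) to one sample per column.

    Hypothetical simplification: each column is summarized by the mean row of
    its ink pixels; columns without ink yield NaN (a gap to be reconstructed).
    """
    n_rows = binary.shape[0]
    rows = np.arange(n_rows)[:, None]
    counts = binary.sum(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_row = (binary * rows).sum(axis=0) / counts
    mean_row = np.where(counts > 0, mean_row, np.nan)
    # Image rows grow downward, so invert to make positive voltages point up.
    return (baseline_row - mean_row) / px_per_mv


# Synthetic lead image: white background, light grid, one-pixel black trace.
n_rows, n_cols = 100, 200
gray = np.full((n_rows, n_cols), 255, dtype=np.uint8)
gray[::20, :] = 220
gray[:, ::20] = 220
baseline_row, px_per_mv = 50, 20.0  # assumed calibration
true_mv = np.sin(np.linspace(0, 4 * np.pi, n_cols))
trace_rows = np.round(baseline_row - true_mv * px_per_mv).astype(int)
gray[trace_rows, np.arange(n_cols)] = 0

thr = otsu_threshold(gray)
signal = column_signal(gray <= thr, baseline_row, px_per_mv)
print(thr, np.nanmax(np.abs(signal - true_mv)))
```

In this synthetic image the grid intensity (220) is far closer to the background (255) than to the trace (0), so Otsu places the threshold below the grid level and only the trace survives binarization; the remaining error comes from rounding each trace point to a whole pixel row.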