VizECGNet: Visual ECG Image Network for Cardiovascular Diseases Classification with Multi-Modal Training and Knowledge Distillation

Ju-Hyeon Nam, Seo-Hyung Park, Su Jung Kim, Sang-Chul Lee

TL;DR

VizECGNet tackles the practical challenge of classifying cardiovascular diseases from printed ECG graphics when raw signal data are unavailable. During training it fuses time-series ECG signals and ECG images using cross-modal attention modules (CMAM) and self-modality attention modules (SMAM), and applies knowledge distillation to align the image and signal predictions so that image-only inference remains viable. The total loss is $L_{total} = \lambda_{1} L_{cls} + \lambda_{2} L_{KD}$ with $\lambda_{1} = \lambda_{2} = 1$. The approach achieves state-of-the-art performance on a large-scale multi-label 12-lead ECG dataset, outperforming signal-, image-, and hybrid-based baselines in precision, recall, and macro-F1. This has practical impact for low-resource clinics by enabling accurate disease prognosis from printed ECGs, and the authors plan further validation across more datasets and real clinical settings.
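The weighted sum $L_{total} = \lambda_{1} L_{cls} + \lambda_{2} L_{KD}$ can be illustrated with a minimal sketch. The summary does not state the exact form of either term, so the choices here are assumptions: $L_{cls}$ is taken as multi-label binary cross-entropy summed over the image and signal streams, and $L_{KD}$ as the mean squared difference between the two streams' predicted probabilities. The function names (`sigmoid`, `bce`, `total_loss`) are hypothetical and not taken from the paper.

```python
import math

def sigmoid(z):
    """Map a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over labels (multi-label setting)."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)

def total_loss(image_logits, signal_logits, targets, lam_cls=1.0, lam_kd=1.0):
    """Return (L_total, L_cls, L_KD) for one sample."""
    p_img = [sigmoid(z) for z in image_logits]
    p_sig = [sigmoid(z) for z in signal_logits]
    # Assumption: the classification loss covers both modality streams.
    l_cls = bce(p_img, targets) + bce(p_sig, targets)
    # Assumption: the distillation term penalizes disagreement between
    # the two streams' probabilities (mean squared difference).
    l_kd = sum((a - b) ** 2 for a, b in zip(p_img, p_sig)) / len(p_img)
    return lam_cls * l_cls + lam_kd * l_kd, l_cls, l_kd

# Two labels; the streams disagree, so the distillation term is positive.
loss, l_cls, l_kd = total_loss([2.0, -1.0], [0.5, 0.3], [1.0, 0.0])
```

With $\lambda_{1} = \lambda_{2} = 1$, both terms contribute equally. At inference time only the image stream's probabilities would be needed, which is the point of aligning the two streams during training.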

Abstract

An electrocardiogram (ECG) captures the heart's electrical signal to assess various heart conditions. In practice, ECG data is stored as either digitized signals or printed images. Despite the emergence of numerous deep learning models for digitized signals, many hospitals prefer image storage due to cost considerations. Recognizing the unavailability of raw ECG signals in many clinical settings, we propose VizECGNet, which uses only printed ECG graphics to determine the prognosis of multiple cardiovascular diseases. During training, cross-modal attention modules (CMAM) are used to integrate information from two modalities - image and signal, while self-modality attention modules (SMAM) capture inherent long-range dependencies in ECG data of each modality. Additionally, we utilize knowledge distillation to improve the similarity between two distinct predictions from each modality stream. This innovative multi-modal deep learning architecture enables the utilization of only ECG images during inference. VizECGNet with image input achieves higher performance in precision, recall, and F1-Score compared to signal-based ECG classification models, with improvements of 3.50%, 8.21%, and 7.38%, respectively.
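As a rough illustration of how the two attention modules differ, the sketch below uses generic scaled dot-product attention. It is not the paper's CMAM or SMAM: learned projections, multiple heads, and normalization layers are omitted, and the function names and toy token values are hypothetical. In the cross-modal case the queries come from one modality and the keys and values from the other; in the self-modality case all three come from the same modality.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of feature vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy features: 2 image tokens and 3 signal tokens, each of dimension 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
signal_tokens = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]

# Cross-modal: image tokens attend to signal tokens.
cross = attention(image_tokens, signal_tokens, signal_tokens)
# Self-modality: image tokens attend to each other.
self_attn = attention(image_tokens, image_tokens, image_tokens)
```

Each output vector is a convex combination of the value vectors, so the cross-modal output has one row per image token but is built entirely from signal features.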

Paper Structure

This paper contains 9 sections, 7 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: Overall architecture of the proposed VizECGNet, which mainly comprises CMAM and SMAM. (a) Overall block diagram of our network. (b) Overview of CMAM. (c) Overview of SMAM.
  • Figure 2: Examples of real printed ECG images and the prediction results of VizECGNet and image-based models (ResNet18 and MobileNetV3). (a) http://www.hvt-journal.com/articles/art6. (b) https://www.researchgate.net/figure/9-A-12-lead-ECG-showing-RBBB-with-right-axis-deviation-and-positive-precordial_fig4_367943268. (c) https://www.shutterstock.com/search/left-bundle-branch. (d) https://www.istockphoto.com/kr/벡터/ecg-1도-방실-차단-1도-방실-차단-12리드-ecg-공통-사례-6초-리드-gm1484000224-510460572. (e) Prediction probability for each cardiovascular disease. Red, yellow, and green bars denote VizECGNet, ResNet18, and MobileNetV3, respectively.