Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction

Nabil Ibtehaz, Masood Mortazavi

TL;DR

The paper tackles the challenge of deriving task-general ECG representations under wearable constraints by introducing a modally reduced representation learning framework that aligns and reconstructs multi-lead signals in a unified embedding space. It trains twelve 1-D Masked AutoEncoder encoders (one per channel) with a joint objective combining a reconstruction loss $L_{recon}$ and a triplet-based alignment loss $L_{align}$, governed by a curriculum: $Loss_{@epoch\text{-}i} = \sin\left(\frac{i}{N_{epochs}} \cdot \frac{\pi}{2}\right) L_{align} + \cos\left(\frac{i}{N_{epochs}} \cdot \frac{\pi}{2}\right) L_{recon}$. Pretraining occurs on a large PhysioNet 12-lead ECG corpus, with distributed training across channels. Results show highly correlated embeddings across channels, enabling partial 12-lead reconstruction from single-channel embeddings and improving downstream tasks such as myocardial infarction detection and ECG-ID biometric authentication (99.68% accuracy without finetuning), supporting the practical viability of wearable-friendly ECG features.
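To make the curriculum concrete, the following is a minimal sketch of the sine/cosine weighting described above, not the authors' implementation. The function names `curriculum_weights` and `curriculum_loss` are hypothetical, and the sketch assumes the epoch index i runs from 0 to N_epochs; the source does not state the exact indexing. Under this schedule the weight on the reconstruction loss starts at 1 and decays to 0, while the weight on the alignment loss rises from 0 to 1.

```python
import math
from typing import Tuple


def curriculum_weights(epoch: int, n_epochs: int) -> Tuple[float, float]:
    """Return (w_align, w_recon) for a given epoch.

    Assumption: `epoch` ranges over 0..n_epochs; the indexing used in
    the paper is not specified in the summary.
    """
    if n_epochs <= 0:
        raise ValueError("n_epochs must be positive")
    phase = (epoch / n_epochs) * (math.pi / 2)
    return math.sin(phase), math.cos(phase)


def curriculum_loss(l_align: float, l_recon: float,
                    epoch: int, n_epochs: int) -> float:
    """Combine the two loss values with the epoch-dependent weights."""
    w_align, w_recon = curriculum_weights(epoch, n_epochs)
    return w_align * l_align + w_recon * l_recon


if __name__ == "__main__":
    # Print the weights at the start, middle, and end of training.
    for epoch in (0, 50, 100):
        w_a, w_r = curriculum_weights(epoch, 100)
        print(f"epoch {epoch:3d}: w_align={w_a:.3f}, w_recon={w_r:.3f}")
```

Because the weights are the sine and cosine of the same angle, their squares always sum to 1, so neither loss is switched off abruptly at any point in training.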

Abstract

Electrocardiogram (ECG) signals, profiling the electrical activities of the heart, are used for a plethora of diagnostic applications. However, ECG systems require multiple leads or channels of signals to capture the complete view of the cardiac system, which limits their application in smartwatches and wearables. In this work, we propose a modally reduced representation learning method for ECG signals that is capable of generating channel-agnostic, unified representations for ECG signals. Through joint optimization of reconstruction and alignment, we ensure that the embeddings of the different channels contain an amalgamation of the overall information across channels while also retaining their specific information. On an independent test dataset, we generated highly correlated channel embeddings from different ECG channels, leading to a moderate approximation of the 12-lead signals from a single-channel embedding. Our generated embeddings can work as competent features for ECG signals for downstream tasks.

Paper Structure (7 sections, 13 equations, 1 figure, 1 algorithm)

This paper contains 7 sections, 13 equations, 1 figure, 1 algorithm.

Figures (1)

  • Figure 1: (A) Our proposed architecture for correlating Masked AutoEncoders. (B) Signal (lower triangle) versus embedding similarity (upper triangle) for pairs of ECG channels on INCART and PTB datasets. (C) 2s ECG reconstruction of a sample from the test set: blue represents the original signal, while green and yellow correspond to reconstruction from the native versus channel 1 embedding, respectively. The red boxes denote the masked windows. (D) Mean absolute error values of reconstructing each of the channel signals from different channels, computed on the test set. (E) t-SNE plot of the embeddings of the signals of different individuals from the ECG-ID dataset. (F) Improvement in the single-channel myocardial infarction diagnosis task on the PTB dataset.