Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction
Nabil Ibtehaz, Masood Mortazavi
TL;DR
The paper addresses the challenge of deriving task-general ECG representations under wearable constraints. It introduces a modally reduced representation learning framework that aligns and reconstructs multi-lead signals in a unified embedding space. The framework trains twelve 1-D Masked AutoEncoder encoders (one per channel) with a joint objective that combines a reconstruction loss $L_{recon}$ and a triplet-based alignment loss $L_{align}$ under a curriculum. At epoch $i$ of $N_{epochs}$, the loss is $Loss_{i} = \sin\left(\frac{i}{N_{epochs}} \cdot \frac{\pi}{2}\right) L_{align} + \cos\left(\frac{i}{N_{epochs}} \cdot \frac{\pi}{2}\right) L_{recon}$, so the weight shifts from reconstruction early in training to alignment toward the end. Pretraining uses a large PhysioNet 12-lead ECG corpus, with training distributed across channels. The resulting embeddings are highly correlated across channels, which enables partial 12-lead reconstruction from a single-channel embedding and improves downstream tasks such as myocardial infarction detection and ECG-ID biometric authentication (99.68% accuracy without finetuning), supporting the practical viability of wearable-friendly ECG features.
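To make the curriculum in the TL;DR concrete, the following is a minimal sketch of how the sine and cosine weights could be computed and applied to the two loss terms. The function names `curriculum_weights` and `joint_loss` are hypothetical, the loss values are plain floats standing in for whatever the training loop produces, and the assumption that the epoch index starts at 0 is ours rather than something the source states.

```python
import math
from typing import Tuple


def curriculum_weights(epoch: int, n_epochs: int) -> Tuple[float, float]:
    """Return (w_align, w_recon) for a given epoch.

    Assumption: `epoch` is 0-based, so training starts with pure
    reconstruction (w_align = 0, w_recon = 1).
    """
    if n_epochs <= 0:
        raise ValueError("n_epochs must be positive")
    # Phase sweeps from 0 to pi/2 as epoch goes from 0 to n_epochs.
    phase = (epoch / n_epochs) * (math.pi / 2)
    return math.sin(phase), math.cos(phase)


def joint_loss(l_align: float, l_recon: float, epoch: int, n_epochs: int) -> float:
    """Combine the alignment and reconstruction losses with the curriculum weights."""
    w_align, w_recon = curriculum_weights(epoch, n_epochs)
    return w_align * l_align + w_recon * l_recon


if __name__ == "__main__":
    # Print the weights at a few points of a hypothetical 100-epoch run.
    for epoch in (0, 25, 50, 75, 100):
        w_align, w_recon = curriculum_weights(epoch, 100)
        print(f"epoch {epoch:3d}: w_align={w_align:.3f}, w_recon={w_recon:.3f}")
```

Because the two weights are the sine and cosine of the same angle, their squares always sum to 1, so the schedule rotates emphasis from reconstruction to alignment rather than switching abruptly between them.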
Abstract
Electrocardiogram (ECG) signals, which profile the electrical activity of the heart, are used in a wide range of diagnostic applications. However, ECG systems require multiple leads, or channels, to capture a complete view of the cardiac system, which limits their use in smartwatches and wearables. In this work, we propose a modally reduced representation learning method that generates channel-agnostic, unified representations for ECG signals. Through joint optimization of reconstruction and alignment, we ensure that the embedding of each channel contains an amalgamation of the overall information across channels while also retaining channel-specific information. On an independent test dataset, we generated highly correlated embeddings from different ECG channels, leading to a moderate approximation of the 12-lead signals from a single-channel embedding. Our generated embeddings can serve as competent features for downstream ECG tasks.
