ECG-Soup: Harnessing Multi-Layer Synergy for ECG Foundation Models
Phu X. Nguyen, Huy Phan, Hieu Pham, Christos Chatzichristos, Bert Vandenberk, Maarten De Vos
TL;DR
ECG-Soup investigates how the intermediate layers of pretrained 1-D Vision Transformers encode ECG information and how multi-layer representations can be fused for robust downstream classification. The authors introduce three cross-layer aggregation schemes (PPA, PMA, IPASTMEM) built on a Spatio-Temporal Masked Electrocardiogram Modeling (STMEM) backbone and provide theoretical insights into attention dynamics. Empirical results across multiple ECG datasets show that middle layers offer richer, more generalizable features, and that PMA and IPASTMEM consistently outperform baselines in both in-distribution and out-of-distribution settings. The work highlights the practical impact of multi-layer representation fusion for ECG foundation models and points to future multimodal extensions for zero-shot learning in biomedical tasks.
Abstract
Transformer-based foundation models for Electrocardiograms (ECGs) have recently achieved impressive performance in many downstream applications.
