Transferring Clinical Knowledge into ECGs Representation

Jose Geraldo Fernandes, Luiz Facury de Souza, Pedro Robles Dutenhefner, Gisele L. Pappa, Wagner Meira

TL;DR

This work tackles the interpretability gap in ECG deep learning by introducing a three-stage training paradigm that transfers multimodal clinical knowledge from tabular data into a unimodal ECG encoder through a self-supervised joint-embedding objective. It further augments interpretability by training the ECG embedding to predict laboratory abnormalities, enabling physiologically grounded explanations without requiring multimodal data at inference. Evaluated on MIMIC-IV-ECG, the method outperforms a signal-only baseline in multi-label diagnosis and closes a substantial portion of the gap to fully multimodal models, while maintaining practical unimodal deployment. The approach advances trustworthy ECG classification and lays groundwork for future expansion to text data and more robust explanation mechanisms.

Abstract

Deep learning models have shown high accuracy in classifying electrocardiograms (ECGs), but their black-box nature hinders clinical adoption due to a lack of trust and interpretability. To address this, we propose a novel three-stage training paradigm that transfers knowledge from multimodal clinical data (laboratory exams, vitals, biometrics) into a powerful, yet unimodal, ECG encoder. We employ a self-supervised, joint-embedding pre-training stage to create an ECG representation that is enriched with contextual clinical information while requiring only the ECG signal at inference time. Furthermore, as an indirect way to explain the model's output, we also train it to predict associated laboratory abnormalities directly from the ECG embedding. Evaluated on the MIMIC-IV-ECG dataset, our model outperforms a standard signal-only baseline in multi-label diagnosis classification and bridges a substantial portion of the performance gap to a fully multimodal model that requires all data at inference. Our work demonstrates a practical and effective method for creating more accurate and trustworthy ECG classification models. By converting abstract predictions into physiologically grounded "explanations", our approach offers a promising path toward the safer integration of AI into clinical workflows.
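
The abstract does not specify the form of the joint-embedding pre-training objective. The following minimal sketch assumes a CLIP-style symmetric InfoNCE contrastive loss. This is an assumption, not necessarily the authors' choice. It shows how ECG embeddings $H_x$ could be pulled toward the embedding of the same patient's tabular record $M$ and pushed away from the other records in the batch. The function names and the temperature of 0.1 are hypothetical, and plain Python lists stand in for the tensors a real implementation would use.

```python
import math

# ASSUMPTION: a CLIP-style symmetric InfoNCE loss stands in for the joint-embedding
# objective; the paper may use a different (e.g., non-contrastive) formulation.

def l2_normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]

def cross_entropy_rows(logits, targets):
    # Mean over rows of -log softmax(row)[target], using log-sum-exp for stability.
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in row))
        total += log_sum_exp - row[t]
    return total / len(logits)

def joint_embedding_loss(h_x, h_m, temperature=0.1):
    # Row i of h_x (ECG embedding) and row i of h_m (tabular embedding) come from
    # the same exam and form the positive pair; other rows in the batch are negatives.
    zx = [l2_normalize(v) for v in h_x]
    zm = [l2_normalize(v) for v in h_m]
    # Cosine similarity matrix scaled by a (hypothetical) temperature.
    sim = [[sum(a * b for a, b in zip(u, w)) / temperature for w in zm] for u in zx]
    sim_t = [list(col) for col in zip(*sim)]  # tabular-to-ECG direction
    targets = list(range(len(zx)))            # the matching index is the positive
    return 0.5 * (cross_entropy_rows(sim, targets) + cross_entropy_rows(sim_t, targets))

# Toy batch of two exams: aligned pairs should give a lower loss than swapped ones.
h_x = [[1.0, 0.0], [0.0, 1.0]]
h_m_aligned = [[0.9, 0.1], [0.1, 0.9]]
h_m_swapped = [[0.1, 0.9], [0.9, 0.1]]
print(joint_embedding_loss(h_x, h_m_aligned) < joint_embedding_loss(h_x, h_m_swapped))  # → True
```

A contrastive objective of this kind needs only batches of paired (ECG, tabular) records, which fits the self-supervised setting described above. After pre-training, the tabular branch can be discarded, so inference depends on the ECG alone.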

Paper Structure

This paper contains 18 sections, 3 equations, 1 figure, 1 table.

Figures (1)

  • Figure 1: Schematic of the proposed multimodal training architecture. Our method begins with (i) a joint-embedding pre-training stage, in which an ECG encoder $\Phi_x$ learns to produce representations $H_x$ that are aligned with embeddings of the tabular clinical data $M$ through the joint-embedding objective $\mathcal{L}_{je}$. This single, enriched encoder is then fine-tuned for two downstream tasks: (ii) a primary multi-label diagnosis classification task, $\mathcal{L}_{c}$; and (iii) a secondary, reconstruction-like laboratory abnormality task, $\mathcal{L}_{r}$, which provides a mechanism to aid decision-making. Crucially, only the ECG signal $X$ is required at inference time.
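
The caption describes fine-tuning the pre-trained encoder for diagnosis classification ($\mathcal{L}_{c}$) and laboratory abnormality prediction ($\mathcal{L}_{r}$). The sketch below makes several assumptions that do not come from the paper:

- Both tasks are multi-label binary problems trained with binary cross-entropy.
- Each task uses a linear head on the shared embedding $H_x$.
- The two losses are combined as $\mathcal{L}_{c} + \lambda \mathcal{L}_{r}$.
- The heads, the weight $\lambda$, the 0.5 decision threshold, and joint (rather than sequential) training are all hypothetical choices.

The point it illustrates is the inference path: `predict` receives only the ECG embedding, never the tabular data.

```python
import math

# Sketch of the fine-tuning stage under stated assumptions: two linear heads
# (HYPOTHETICAL) on a shared ECG embedding h_x, both trained as multi-label
# binary tasks with binary cross-entropy.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def linear_head(h, weights, bias):
    # One logit per label; `weights` holds one weight vector per label.
    return [sum(w_i * h_i for w_i, h_i in zip(w, h)) + b for w, b in zip(weights, bias)]

def multilabel_bce(logits, targets, eps=1e-12):
    # Mean binary cross-entropy over independent labels (multi-label setting).
    total = 0.0
    for z, y in zip(logits, targets):
        p = sigmoid(z)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)

def finetune_loss(h_x, dx_head, lab_head, y_dx, y_lab, lam=1.0):
    # ASSUMPTION: joint objective L_c + lam * L_r; the weight lam is hypothetical.
    l_c = multilabel_bce(linear_head(h_x, *dx_head), y_dx)    # diagnosis labels
    l_r = multilabel_bce(linear_head(h_x, *lab_head), y_lab)  # lab abnormality flags
    return l_c + lam * l_r

def predict(h_x, dx_head, lab_head, dx_names, lab_names, threshold=0.5):
    # Inference uses only the ECG embedding; no tabular data is passed in.
    dx = [n for n, z in zip(dx_names, linear_head(h_x, *dx_head)) if sigmoid(z) >= threshold]
    labs = [n for n, z in zip(lab_names, linear_head(h_x, *lab_head)) if sigmoid(z) >= threshold]
    return dx, labs

# Toy usage with made-up weights and placeholder label names.
h_x = [1.0, -1.0]
dx_head = ([[2.0, 0.0], [0.0, 2.0]], [0.0, 0.0])
lab_head = ([[1.5, -1.5]], [0.0])
print(predict(h_x, dx_head, lab_head, ["dx_a", "dx_b"], ["lab_a"]))  # → (['dx_a'], ['lab_a'])
```

The flagged laboratory abnormalities are returned alongside the diagnoses. This is one way to surface the indirect, physiologically grounded cues that the abstract describes as explanations. The label names here are placeholders, not the paper's label set.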