Application of Contrastive Learning on ECG Data: Evaluating Performance in Japanese and Classification with Around 100 Labels

Junichiro Takahashi, JingChuan Guan, Masataka Sato, Kaito Baba, Kazuto Haruguchi, Daichi Nagashima, Satoshi Kodera, Norihiko Takeda

TL;DR

This study addresses the need for scalable ECG interpretation across languages by aligning ECG signals with Japanese text labels using a frozen Japanese medical language model. It introduces a multimodal contrastive learning framework with a ResNet1d-18 ECG encoder and MedLlama3-JP-v2 text embeddings, trained with a symmetric loss $\mathcal{L} = \dfrac{l^{(e\to t)} + l^{(t\to e)}}{2}$ and temperature $\tau = 0.07$. On 37,285 real-world Japanese ECG records with 98 labels, the approach achieves competitive top-1 and top-5 accuracy and shows substantial zero-shot capability via Superset labels, approaching prior English-language results on Rhythm and MIT-BIH tasks. The results support practical deployment in non-English clinical settings and highlight future directions, such as incorporating echocardiography data and wearable sensing for broader diagnostic support.
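The symmetric loss above can be illustrated with a minimal sketch. It assumes a CLIP-style InfoNCE formulation, where each direction is a softmax cross-entropy over temperature-scaled cosine similarities within a batch; the summary gives only the averaged form and $\tau = 0.07$, so the exact definition of $l^{(e\to t)}$ and $l^{(t\to e)}$ used here is an assumption. The function names and the toy embeddings are hypothetical, and plain Python lists stand in for the ResNet1d-18 and MedLlama3-JP-v2 outputs.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def symmetric_contrastive_loss(ecg_embs, text_embs, tau=0.07):
    """Average of the ECG-to-text and text-to-ECG cross-entropy terms.

    Assumes pair i in ecg_embs matches pair i in text_embs (hypothetical setup).
    """
    e = [l2_normalize(v) for v in ecg_embs]
    t = [l2_normalize(v) for v in text_embs]
    n = len(e)
    # sim[i][j]: temperature-scaled cosine similarity of ECG i and text j
    sim = [[sum(a * b for a, b in zip(e[i], t[j])) / tau for j in range(n)]
           for i in range(n)]
    # ECG -> text: row i should assign the highest probability to column i
    l_e2t = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # text -> ECG: same computation on the transposed similarity matrix
    sim_t = [list(col) for col in zip(*sim)]
    l_t2e = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return (l_e2t + l_t2e) / 2


# Toy batch: two ECG embeddings and their matching text embeddings
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
# Same batch with the text embeddings swapped, so every pair is mismatched
mismatched = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                        [[0.0, 1.0], [1.0, 0.0]])
```

With a small temperature such as 0.07, the similarities are sharply scaled, so the loss is close to zero for well-aligned pairs and large for mismatched ones. In this sketch only the ECG side would receive gradients in practice, since the language model is frozen.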

Abstract

The electrocardiogram (ECG) is a fundamental tool in cardiovascular diagnostics because it is both powerful and non-invasive. One of its most critical uses is to determine whether more detailed examinations are necessary, and its users span a wide range of expertise. Given this diversity, it is essential to help users avoid critical errors. Recent machine learning studies have addressed this challenge by extracting valuable information from ECG data. Using language models, these studies have implemented multimodal models that classify ECGs according to labeled terms. However, they reduced the number of classes, and it remains uncertain whether the technique is effective for languages other than English. To move towards practical application, we used ECG data from regular patients visiting hospitals in Japan, retaining a large number of Japanese labels obtained from actual ECG readings. Using a contrastive learning framework, we found that even with 98 labels for classification, our Japanese-based language model achieves accuracy comparable to previous research. This study extends the applicability of multimodal machine learning frameworks to broader clinical studies and non-English languages.

Paper Structure

This paper contains 11 sections, 4 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: The overall schematic of our model. The encoder of MedLlama3-JP-v2 [MedLLama3-JP-v2] is employed as the frozen language model. ResNet1d-18 [he2016deep] is adopted as the ECG encoder. The text embeddings and ECG embeddings are denoted as $\textbf{t}_i$ and $\textbf{e}_i$, respectively.
  • Figure 2: Examples of diagnosis prediction outputs.