Contrastive Learning for Multi Label ECG Classification with Jaccard Score Based Sigmoid Loss
Junichiro Takahashi, Masataka Sato, Satoshi Kodeta, Norihiko Takeda
TL;DR
The paper tackles multi-label ECG interpretation by adapting a CLIP-style framework (SigLIP) to ECG data and introducing a Jaccard similarity-based sigmoid loss to handle co-occurring findings. It pairs a 1D-ResNet-18 ECG encoder with a domain-aware language model (Qwen-3-8B), evaluates on a large real-world hospital dataset, and compares against a language-model variant lacking ECG knowledge. The authors report that the modified loss, combined with increasing the embedding dimensionality to 256 and random cropping to mitigate data drift, raises micro F1 to ~0.503 and the Jaccard score to ~0.350, with some labels such as lowEF reaching a high AUC (~0.887). Cross-institution testing shows only modest degradation, which suggests the model is robust, and a comparison against a ResNet-based baseline favors the SigLIP-based approach for multi-label ECG classification. Overall, the work provides a clinically relevant foundation for multimodal AI that uses ECG data to infer multiple findings and could potentially serve as a surrogate for echocardiographic information.
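To make the two headline metrics concrete, the following is a minimal sketch of micro F1 and a Jaccard score on binary multi-label predictions. It is an illustration only: the function names are hypothetical, the summary does not state which Jaccard averaging the authors use (sample-averaged is assumed here), and treating two empty label sets as a perfect match is also an assumption.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP/FP/FN over every (sample, label) cell."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def sample_jaccard(y_true, y_pred):
    """Mean over samples of |true AND pred| / |true OR pred| (assumed averaging)."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        inter = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        # Assumption: two empty label sets count as a perfect match.
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


# Two ECGs, three hypothetical findings each (1 = finding present).
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
f1 = micro_f1(y_true, y_pred)
jac = sample_jaccard(y_true, y_pred)
```

In this toy case one finding is missed and one is spurious, so both metrics fall below 1; the Jaccard score penalizes each ECG's partial overlap more heavily than micro F1 does.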
Abstract
Recent advances in large language models (LLMs) have enabled the development of multimodal medical AI. While models such as MedGemini achieve high accuracy on VQA tasks like USMLE MM, their performance on ECG-based tasks remains limited, and some models, such as MedGemma, do not support ECG data at all. Interpreting ECGs is inherently challenging, and diagnostic accuracy can vary depending on the interpreter's experience. Although echocardiography provides rich diagnostic information, it requires specialized equipment and personnel, limiting its availability. In this study, we focus on constructing a robust ECG encoder for multimodal pretraining using real-world hospital data. We employ SigLIP, a CLIP-based model with a sigmoid-based loss function that enables multi-label prediction, and introduce a modified loss function tailored to the multi-label nature of ECG data. Experiments demonstrate that incorporating medical knowledge in the language model and applying the modified loss significantly improve multi-label ECG classification. To further enhance performance, we increase the embedding dimensionality and apply random cropping to mitigate data drift. Finally, per-label analysis reveals which ECG findings are easier or harder to predict. Our study provides a foundational framework for developing medical models that utilize ECG data.
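The idea of a Jaccard score-based sigmoid loss can be illustrated with a short sketch. The exact formulation is not given in this summary, so the code below rests on an assumption: the hard match/non-match targets of the standard SigLIP pairwise sigmoid loss are replaced by the Jaccard similarity between the label sets of the two samples in each pair. All names are hypothetical, and the defaults `t=10.0` and `b=-10.0` follow the initial temperature and bias used in the SigLIP paper rather than anything stated here.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two label sets; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def _unit(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _bce_with_logits(x, y):
    """Stable form of -[y*log(sigmoid(x)) + (1 - y)*log(1 - sigmoid(x))]."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def jaccard_sigmoid_loss(ecg_emb, txt_emb, label_sets, t=10.0, b=-10.0):
    """Pairwise sigmoid loss with soft Jaccard targets (assumed formulation)."""
    ecg = [_unit(v) for v in ecg_emb]
    txt = [_unit(v) for v in txt_emb]
    total = 0.0
    for i, e in enumerate(ecg):
        for j, s in enumerate(txt):
            logit = t * sum(p * q for p, q in zip(e, s)) + b
            # Soft target: label overlap between ECG i and the text of sample j.
            target = jaccard(label_sets[i], label_sets[j])
            total += _bce_with_logits(logit, target)
    return total / len(ecg)  # normalize by batch size, as SigLIP does


# Toy batch: 2-D embeddings and hypothetical finding names.
emb = [[1.0, 0.0], [0.0, 1.0]]
loss_disjoint = jaccard_sigmoid_loss(emb, emb, [{"AF"}, {"LVH"}])
loss_overlap = jaccard_sigmoid_loss(emb, emb, [{"AF", "LVH"}, {"AF"}])
```

When the label sets in a batch are pairwise disjoint, every off-diagonal target is 0 and the sketch reduces to the standard SigLIP loss. When findings co-occur, pairs that share labels are no longer pushed fully apart, which is the behavior the multi-label setting calls for.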
