ProtoECGNet: Case-Based Interpretable Deep Learning for Multi-Label ECG Classification with Contrastive Learning

Sahil Sethi, David Chen, Thomas Statchen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones

TL;DR

ProtoECGNet introduces a self-explaining, prototype-based framework for multi-label ECG classification by deploying three specialized prototype branches that mirror clinical reasoning: rhythm (1D global prototypes), morphology (2D time-localized prototypes), and global abnormalities (2D global prototypes). A novel contrastive prototype loss, along with clustering, separation, and orthogonality terms, shapes the latent prototype space to reflect realistic label co-occurrence while maintaining discriminability. On PTB-XL’s 71-label benchmark, ProtoECGNet achieves competitive macro- and weighted-AUROC scores and provides faithful, case-based explanations validated by clinician ratings of prototype representativeness and clarity. The work demonstrates that prototype learning can scale to complex time-series, multi-label medical tasks and offers a practical path toward trustworthy AI-assisted clinical decision support through grounded, interpretable reasoning.

Abstract

Deep learning-based electrocardiogram (ECG) classification has shown impressive performance, but clinical adoption has been slowed by the lack of transparent and faithful explanations. Post hoc methods such as saliency maps may fail to reflect a model's true decision process. Prototype-based reasoning offers a more transparent alternative by grounding decisions in similarity to learned representations of real ECG segments, enabling faithful, case-based explanations. We introduce ProtoECGNet, a prototype-based deep learning model for interpretable, multi-label ECG classification. ProtoECGNet employs a structured, multi-branch architecture that reflects clinical interpretation workflows: it integrates a 1D CNN with global prototypes for rhythm classification, a 2D CNN with time-localized prototypes for morphology-based reasoning, and a 2D CNN with global prototypes for diffuse abnormalities. Each branch is trained with a prototype loss designed for multi-label learning, combining clustering, separation, diversity, and a novel contrastive loss that encourages appropriate separation between prototypes of unrelated classes while allowing clustering for frequently co-occurring diagnoses. We evaluate ProtoECGNet on all 71 diagnostic labels from the PTB-XL dataset, demonstrating competitive performance relative to state-of-the-art black-box models while providing structured, case-based explanations. To assess prototype quality, we conduct a structured clinician review of the final model's projected prototypes, finding that they are rated as representative and clear. ProtoECGNet shows that prototype learning can be effectively scaled to complex, multi-label time-series classification, offering a practical path toward transparent and trustworthy deep learning models for clinical decision support.
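
To make the idea behind the contrastive prototype term concrete, the following is a minimal sketch, not the paper's formulation. It assumes cosine similarity between prototype vectors and a Jaccard-style label co-occurrence weight; the function names (`jaccard_cooccurrence`, `contrastive_prototype_loss`) and the exact form of the penalty are hypothetical choices made for illustration. Prototype pairs from frequently co-occurring classes are pulled together, while pairs from unrelated classes are penalized for being similar.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def jaccard_cooccurrence(labels, n_classes):
    """Assumed co-occurrence measure: Jaccard index between label columns.

    labels is a list of multi-hot lists, one per training record.
    """
    cooc = [[0.0] * n_classes for _ in range(n_classes)]
    for i in range(n_classes):
        for j in range(n_classes):
            both = sum(1 for y in labels if y[i] and y[j])
            either = sum(1 for y in labels if y[i] or y[j])
            cooc[i][j] = both / either if either else 0.0
    return cooc


def contrastive_prototype_loss(prototypes, proto_class, cooc):
    """Illustrative loss over prototype pairs that belong to different classes.

    w * (1 - sim) rewards similarity for co-occurring classes;
    (1 - w) * max(0, sim) penalizes similarity for unrelated classes.
    """
    total, pairs = 0.0, 0
    for a in range(len(prototypes)):
        for b in range(a + 1, len(prototypes)):
            class_a, class_b = proto_class[a], proto_class[b]
            if class_a == class_b:
                continue  # same-class pairs are left to the other loss terms
            sim = cosine(prototypes[a], prototypes[b])
            w = cooc[class_a][class_b]
            total += w * (1.0 - sim) + (1.0 - w) * max(0.0, sim)
            pairs += 1
    return total / pairs if pairs else 0.0


# Toy data: classes 0 and 1 always co-occur, class 2 never overlaps with them.
toy_labels = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
toy_cooc = jaccard_cooccurrence(toy_labels, 3)
aligned = contrastive_prototype_loss([[1, 0], [1, 0], [0, 1]], [0, 1, 2], toy_cooc)
misaligned = contrastive_prototype_loss([[1, 0], [0, 1], [1, 0]], [0, 1, 2], toy_cooc)
print(aligned, misaligned)
```

In this toy setup the loss is lower when the prototypes of the two co-occurring classes sit close together and the unrelated class sits apart, which is the qualitative behavior the abstract describes.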

Paper Structure

This paper contains 41 sections, 10 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Illustration of interpretability approaches for ECG classification. (A) Black-box models such as convolutional neural networks (CNNs) can achieve strong performance on diagnostic tasks, but provide no inherent explanation for their predictions. (B) Post hoc explainability methods, such as saliency maps, attempt to highlight input regions deemed important by the model after a prediction is made. However, these visualizations are not part of the model’s decision process and often fail to provide a meaningful explanation—simply indicating "where" the model looked does not explain "why" it made a decision. (C) Prototype-based models offer a self-explaining alternative: predictions are made by comparing a test input to a set of learned prototype vectors, each anchored to a real ECG segment. This enables case-based explanations that reflect the model’s actual classification logic. ASMI = anteroseptal myocardial infarction.
  • Figure 2: Multi-branch approach. See the appendix methods figure (fig:app_methods) for detailed architectural information.
  • Figure 3: Case-based explanation for atrial flutter (AFLT) predicted by the fusion classifier. The model predicts AFLT for test ECG 449 based on high similarity to prototype 61, which was projected onto training ECG 10895. The top row displays the full 12-lead ECGs for both examples, with rhythm strips (lead II) highlighted in blue to guide interpretation. The bottom row provides a zoomed-in view of these rhythm strips.
  • Figure 4: Case-based explanation for anteroseptal myocardial infarction (ASMI) predicted by the fusion classifier. The model predicts ASMI for test ECG 3908, citing strong similarity to prototype 80, which was projected onto a latent patch from training ECG 17381. The top row shows the full 12-lead ECGs for both test and training examples, with the activated region highlighted in blue (5–5.9 seconds for the test ECG and 8.9–9.8 seconds for the prototype). The bottom row zooms into these regions to show all 12 leads. The model appears to have identified a match based on ST-segment elevations in anterior leads (e.g., V2–V4), with a high similarity score of 9.0872.
  • Figure 5: Case-based explanation for an electrolyte disturbance (EL) predicted by the fusion classifier. The model predicts EL for test ECG 12126, citing strong similarity to an EL prototype that was projected onto training ECG 12650. Since this diagnosis uses 2D global prototypes, full 12-lead ECGs are shown for both the test and training examples—along with their similarity score.
  • ...and 2 more figures
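
The case-based explanations in the figure captions above follow a common pattern: a test ECG is embedded, compared against learned prototypes, and each predicted label is justified by the best-matching prototype and the training ECG it was projected onto. The sketch below illustrates that pattern under stated assumptions. The `Prototype` class, the `explain` function, cosine similarity, the 0.5 threshold, and all IDs and vectors are hypothetical and are not taken from the paper or its code.

```python
import math
from dataclasses import dataclass


@dataclass
class Prototype:
    proto_id: int
    label: str           # diagnostic label this prototype belongs to
    source_ecg_id: int   # training ECG the prototype was projected onto
    vector: list         # latent prototype vector


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def explain(embedding, prototypes, threshold=0.5):
    """Return, per label, the best-matching prototype if it clears the threshold.

    Several labels can be returned at once, which mirrors the multi-label setting.
    """
    best = {}
    for proto in prototypes:
        sim = cosine(embedding, proto.vector)
        if proto.label not in best or sim > best[proto.label][1]:
            best[proto.label] = (proto, sim)
    return {
        label: {
            "prototype": proto.proto_id,
            "source_ecg": proto.source_ecg_id,
            "similarity": sim,
        }
        for label, (proto, sim) in best.items()
        if sim >= threshold
    }


# Toy prototypes with made-up IDs and vectors.
toy_prototypes = [
    Prototype(0, "AFLT", 100, [1.0, 0.0, 0.0]),
    Prototype(1, "ASMI", 200, [0.0, 1.0, 0.0]),
    Prototype(2, "AFLT", 101, [0.6, 0.8, 0.0]),
]
single = explain([1.0, 0.1, 0.0], toy_prototypes)
multi = explain([1.0, 1.0, 0.0], toy_prototypes)
print(sorted(single), sorted(multi))
```

The returned dictionary carries the same pieces of information the figures present: the predicted label, the prototype that triggered it, the training ECG behind that prototype, and the similarity score.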