Prototype Learning to Create Refined Interpretable Digital Phenotypes from ECGs

Sahil Sethi, David Chen, Michael C. Burkhart, Nipun Bhandari, Bashar Ramadan, Brett Beaulieu-Jones

TL;DR

This work investigates whether prototypes learned by an interpretable ECG model trained for multi-label classification capture transferable physiologic signatures that relate to real-world clinical phenotypes. By applying ProtoECGNet, trained on PTB-XL, to MIMIC-IV without retraining, the authors link individual prototypical waveform patterns to discharge diagnoses (phecodes) and NLP-derived concepts, demonstrating that prototypes yield stronger, more specific associations than broader class predictions. The study shows robust predictive performance for both cardiac and non-cardiac conditions (e.g., AUCs around 0.89–0.91 for AF and CHF) and reveals that intra-class heterogeneity in prototypes correlates with association strength, supporting the view of prototypes as clinically meaningful intermediate phenotypes. These findings highlight the potential of interpretable, prototype-based models to augment digital phenotyping from physiologic time-series data and to provide transferable, mechanistic insights beyond the original training task.

Abstract

Prototype-based neural networks offer interpretable predictions by comparing inputs to learned, representative signal patterns anchored in training data. While such models have shown promise in the classification of physiological data, it remains unclear whether their prototypes capture an underlying structure that aligns with broader clinical phenotypes. We use a prototype-based deep learning model trained for multi-label ECG classification on the PTB-XL dataset and, without modification, perform inference on the MIMIC-IV clinical database. We assess whether individual prototypes, trained solely for classification, are associated with hospital discharge diagnoses in the form of phecodes in this external population. Individual prototypes demonstrate significantly stronger and more specific associations with clinical outcomes than the classifier's class predictions, NLP-extracted concepts, or broader prototype classes across all phecode categories. Prototype classes with mixed significance patterns exhibit significantly greater intra-class distances ($p < 0.0001$), indicating that the model learned to differentiate clinically meaningful variations within diagnostic categories. The prototypes achieve strong predictive performance across diverse conditions, with AUCs ranging from 0.89 for atrial fibrillation to 0.91 for heart failure, while also showing substantial signal for non-cardiac conditions such as sepsis and renal disease. These findings suggest that prototype-based models can support interpretable digital phenotyping from physiologic time-series data, providing transferable intermediate phenotypes that capture clinically meaningful physiologic signatures beyond their original training objectives.
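
The association step described in the abstract can be illustrated with a minimal sketch: assign each ECG to its most similar prototype, then compute the odds ratio between that assignment and a binary phecode. Everything here is an assumption for illustration and not the authors' pipeline: the function names `top_prototype` and `odds_ratio` are hypothetical, cosine similarity stands in for whatever similarity the model uses, the 0.5 continuity correction (Haldane-Anscombe) is one common choice, and the data are toy values.

```python
import numpy as np

def top_prototype(embeddings, prototypes):
    """Index of the most similar prototype for each embedding (cosine similarity, assumed)."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sims = e @ p.T                      # shape: (n_ecgs, n_prototypes)
    return sims.argmax(axis=1)

def odds_ratio(exposed, outcome, correction=0.5):
    """Odds ratio from a 2x2 table; `correction` is added to every cell to avoid division by zero."""
    exposed = np.asarray(exposed, dtype=bool)
    outcome = np.asarray(outcome, dtype=bool)
    a = np.sum(exposed & outcome) + correction     # prototype assigned, phecode present
    b = np.sum(exposed & ~outcome) + correction    # prototype assigned, phecode absent
    c = np.sum(~exposed & outcome) + correction    # other prototype, phecode present
    d = np.sum(~exposed & ~outcome) + correction   # other prototype, phecode absent
    return (a * d) / (b * c)

# Toy data (hypothetical): two prototypes, six ECG embeddings, one binary phecode.
prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
embeddings = np.array([[0.9, 0.1], [0.8, 0.3], [0.2, 0.9],
                       [0.1, 1.0], [0.7, 0.2], [0.3, 0.8]])
has_phecode = np.array([1, 1, 0, 0, 1, 0])

assigned = top_prototype(embeddings, prototypes)
or_proto0 = odds_ratio(assigned == 0, has_phecode)
print(assigned, or_proto0)
```

In a real analysis this odds ratio would be computed per prototype and per phecode, with a significance test and multiple-comparison correction on top; the sketch omits both.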

Paper Structure

This paper contains 18 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Overview of study approach. 1.) Prototype training: ProtoECGNet (Sethi et al., 2025) was trained on the PTB-XL dataset for multi-label ECG classification, with three branches (1D rhythm, 2D local morphology, 2D global) that each learned representative waveform prototypes. 2.) Inference: The pretrained model was applied without retraining to MIMIC-IV ECGs, computing similarity scores to identify the most representative prototypes for each recording. 3.) Phenotyping: Prototype activations and branch-level classes were associated with hospital discharge diagnoses (phecodes) and compared with NLP-extracted concepts, enabling statistical testing of whether prototypes capture clinically meaningful, transferable physiologic signatures.
  • Figure 2: Comparison of predicted label distributions between PTB-XL and MIMIC-IV-ECG. A.) Frequencies (percentage of samples with predicted probability $\geq$ 0.5) of the 15 most prevalent predicted labels in MIMIC-IV-ECG, compared across both datasets. B.) Percentage difference in predicted label prevalence between MIMIC-IV-ECG and PTB-XL. Positive differences indicate higher predicted prevalence in MIMIC-IV-ECG; negative differences indicate lower prevalence compared to PTB-XL. C.) Scatterplot comparing label frequencies in PTB-XL and MIMIC-IV-ECG, illustrating broad agreement in prevalence patterns despite dataset differences (e.g., ICU setting in MIMIC). Labels with an absolute percentage difference greater than 1.0% are annotated. The dashed line indicates equal prevalence across datasets. D.) Distribution of predicted probabilities for the 15 most prevalent labels in MIMIC-IV-ECG, shown for both PTB-XL and MIMIC-IV-ECG among samples where that label was predicted with probability $\geq$ 0.5.
  • Figure 3: Principal Component Analysis of the prototype vector embeddings for each class within the 3 branches. A.) The 5 prototypes for the 1D rhythm branch, B.) the 7 prototypes for each class in the 2D global branch, and C.) centroids of the 18 prototypes per class in the 2D morphology branch. In panel C, only centroids are shown to reduce over-plotting. See Sethi et al. (2025) for the full list of class abbreviation definitions.
  • Figure 4: (Left.) Comparison of odds ratios for each set of available labels. Fusion Labels: ProtoECGNet final predictions; NLP Concept: diagnosis-related concepts extracted from computer-generated ECG reports; Prototype Class: branch-level class predictions within ProtoECGNet; Prototype ID: the branch-level individual prototype most similar to a particular ECG. All pairwise comparisons are significant (*** = $p$-value $<$ 0.001). Prototype ID provides the cleanest groupings for associations to phenotypes.
  • Figure 5: Comparison of odds ratio magnitude based on each set of available labels broken down across the 15 largest Phecode categories.
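
Two quantities mentioned above, the intra-class distance between prototypes and the PCA projection in Figure 3, can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' code: the helper names `mean_intra_class_distance` and `pca_2d` are hypothetical, Euclidean distance is assumed as the metric, and the prototype vectors are toy values.

```python
import numpy as np

def mean_intra_class_distance(vectors):
    """Mean Euclidean distance over all unordered pairs of prototype vectors in one class."""
    v = np.asarray(vectors, dtype=float)
    diffs = v[:, None, :] - v[None, :, :]          # pairwise differences
    dists = np.linalg.norm(diffs, axis=-1)         # (n, n) distance matrix
    upper = np.triu_indices(len(v), k=1)           # each pair counted once
    return dists[upper].mean()

def pca_2d(vectors):
    """Project vectors onto their first two principal components via SVD."""
    v = np.asarray(vectors, dtype=float)
    centered = v - v.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Toy prototype vectors (hypothetical): one tight class and one spread-out class.
tight_class = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
spread_class = 3.0 * tight_class

d_tight = mean_intra_class_distance(tight_class)
d_spread = mean_intra_class_distance(spread_class)
projected = pca_2d(np.vstack([tight_class, spread_class]))
print(d_tight, d_spread, projected.shape)
```

A class whose prototypes sit far apart (larger mean distance) is the kind of heterogeneous class the abstract associates with mixed significance patterns; comparing such distances across groups would still require a statistical test, which is not shown here.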