Benchmarking Self-Supervised Models for Cardiac Ultrasound View Classification

Youssef Megahed, Salma I. Megahed, Robin Ducharme, Inok Lee, Adrian D. C. Chan, Mark C. Walker, Steven Hawken

TL;DR

This work benchmarks two self-supervised pretraining strategies for cardiac ultrasound view classification on the CACTUS dataset: ultrasound-domain masked autoencoding (USF-MAE) and contrastive learning (MoCo v3). Under identical fine-tuning settings and stratified 5-fold cross-validation, USF-MAE achieves higher accuracy, F1-score, recall, and ROC-AUC than MoCo v3, with statistically significant gains in F1 (p=0.0048). The results suggest that ultrasound-specific MAE pretraining yields more transferable and discriminative representations for view discrimination, supporting the value of domain-aligned SSL in medical imaging. Publicly available pretrained weights enable downstream tasks such as congenital heart defect detection in real fetal echocardiography, highlighting the potential clinical impact of ultrasound foundation models.
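The stratified 5-fold protocol mentioned above can be illustrated with a minimal sketch. It is not the authors' code: the function name `stratified_kfold`, the seed, and the round-robin assignment are assumptions made for illustration, and only the six view labels come from the text. The idea is that each fold receives roughly the same share of every view class, so no fold is dominated by a single view.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_indices, test_indices) for k stratified folds.

    Hypothetical helper: samples of each class are shuffled and dealt
    round-robin to the folds, so class proportions stay roughly equal.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0  # rotates the starting fold so leftovers spread evenly
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)

    # Each fold serves once as the held-out test set.
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, test_idx


# Toy labels using the six view names from the text (10 samples per view).
views = ["A4C", "PL", "PSAV", "PSMV", "Random", "SC"]
toy_labels = views * 10

for fold_id, (train_idx, test_idx) in enumerate(stratified_kfold(toy_labels)):
    print(fold_id, len(train_idx), len(test_idx))
```

In a comparison like the one described here, the same index splits would be reused for both models so that per-fold scores can be paired.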

Abstract

Reliable interpretation of cardiac ultrasound images is essential for accurate clinical diagnosis and assessment. Self-supervised learning has shown promise in medical imaging by leveraging large unlabelled datasets to learn meaningful representations. In this study, we compare two self-supervised learning frameworks, USF-MAE (developed by our team) and MoCo v3, on the recently introduced CACTUS dataset (37,736 images) for automated classification of simulated cardiac views (A4C, PL, PSAV, PSMV, Random, and SC). The CACTUS dataset provides expert-annotated cardiac ultrasound images with diverse views. To ensure a fair comparison, both models follow an identical training protocol, with a learning rate of 0.0001 and a weight decay of 0.01, and both are evaluated with 5-fold cross-validation to assess generalization across multiple random splits. For each fold, we record ROC-AUC, accuracy, F1-score, and recall. USF-MAE outperforms MoCo v3 on every metric. The mean test AUC is 99.99% (±0.01%, 95% CI) for USF-MAE versus 99.97% (±0.01%) for MoCo v3, and the mean test accuracy is 99.33% (±0.18%) versus 98.99% (±0.28%). Similar trends are observed for the F1-score and recall, with improvements statistically significant across folds (paired t-test, p=0.0048 < 0.01). This proof-of-concept analysis suggests that, on this dataset, USF-MAE learns more discriminative features for cardiac view classification than MoCo v3. The gains across multiple metrics highlight the potential of USF-MAE for improving automated cardiac ultrasound classification.
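The fold-level statistics named in the abstract (a 95% confidence interval on the mean and a paired t-test across five folds) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the abstract does not say how the interval was computed, so a Student's t interval is assumed; the function names are hypothetical; and the per-fold scores in the example are made-up placeholders, not results from the paper. With five folds there are 4 degrees of freedom, for which the t-distribution CDF has a closed form, so only the standard library is needed.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 4 degrees of freedom.
T_CRIT_95_DF4 = 2.776


def ci95_half_width(fold_scores):
    """Half-width of a t-based 95% CI on the mean of 5 fold scores."""
    if len(fold_scores) != 5:
        raise ValueError("this sketch is specialised to 5 folds")
    return T_CRIT_95_DF4 * statistics.stdev(fold_scores) / math.sqrt(5)


def t_cdf_df4(t):
    """Closed-form CDF of Student's t distribution with 4 degrees of freedom."""
    u = 1.0 + t * t / 4.0
    return 0.5 + 0.375 * (t / math.sqrt(u)) * (1.0 - (t * t / u) / 12.0)


def paired_t_test_5fold(scores_a, scores_b):
    """Two-sided paired t-test on matched per-fold scores (5 folds)."""
    if len(scores_a) != 5 or len(scores_b) != 5:
        raise ValueError("this sketch is specialised to 5 folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all fold differences are identical; t is undefined")
    t_stat = statistics.mean(diffs) / (sd / math.sqrt(5))
    p_value = 2.0 * (1.0 - t_cdf_df4(abs(t_stat)))
    return t_stat, p_value


# Placeholder per-fold F1 scores, invented for illustration only.
model_a_f1 = [0.95, 0.96, 0.94, 0.97, 0.95]
model_b_f1 = [0.93, 0.95, 0.93, 0.94, 0.94]

t_stat, p_value = paired_t_test_5fold(model_a_f1, model_b_f1)
print("mean A:", statistics.mean(model_a_f1), "+/-", ci95_half_width(model_a_f1))
print("t:", t_stat, "p:", p_value)
```

The test is paired because both models are scored on the same folds, so the fold-to-fold difficulty cancels in the per-fold differences. With only five folds the test has few degrees of freedom, which is one reason to read a single p-value as indicative rather than conclusive.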

Paper Structure (11 sections, 3 figures, 3 tables)


Figures (3)

  • Figure 1: A) Original ultrasound frame with overlay annotations. B) Cropped, cleaned, and inpainted image used for analysis.
  • Figure 2: Comparison of MoCo v3 and USF-MAE Pipelines.
  • Figure 3: Performance on cardiac view classification: A) normalized confusion matrix and B) per-class ROC curves showing near-perfect discrimination across all classes.