Leveraging Self-Supervised Learning for Fetal Cardiac Planes Classification using Ultrasound Scan Videos

Joseph Geo Benjamin; Mothilal Asokan; Amna Alhosani; Hussain Alasmawi; Werner Gerhard Diehl; Leanne Bricker; Karthik Nandakumar; Mohammad Yaqub

TL;DR

This paper assesses how self-supervised learning (SSL) on unlabelled fetal ultrasound (US) videos can improve downstream Standard Fetal Cardiac Planes (SFCP) classification when labelled data is scarce. It benchmarks seven dual-encoder SSL methods (spanning reconstruction, contrastive, distillation, and information-theoretic objectives) using a ResNet-50 backbone to pretrain on US videos, then fine-tunes or linearly probes on limited 2D SFCP images. The study finds that dataset variance drives generalisation more than sheer size, and that BarlowTwins provides the most robust transfer: with full fine-tuning on only $1\%$ of the labelled data, it gains $12\%$ in F1-score over ImageNet initialisation and at least $4\%$ over other SSL initialisations. These findings suggest a practical pathway for leveraging private US video data to improve fetal cardiac plane classification without requiring large annotated datasets. The results have meaningful implications for clinical deployment and underscore the value of SSL methods that decorrelate representations in medical video domains.

Abstract

Self-supervised learning (SSL) methods are popular since they can address situations with limited annotated data by directly utilising the underlying data distribution. However, the adoption of such methods remains under-explored in ultrasound (US) imaging, especially for fetal assessment. We investigate the potential of dual-encoder SSL in utilising unlabelled US video data to improve the performance of challenging downstream Standard Fetal Cardiac Planes (SFCP) classification using limited labelled 2D US images. We study 7 SSL approaches based on reconstruction, contrastive loss, distillation, and information theory and evaluate them extensively on a large private US dataset. Our observations and findings are consolidated from more than 500 downstream training experiments under different settings. Our primary observation is that, for SSL training, the variance of the dataset is more crucial than its size because it allows the model to learn generalisable representations, which improve the performance of downstream tasks. Overall, the BarlowTwins method shows robust performance, irrespective of the training settings and data variations, when used as an initialisation for downstream tasks. Notably, with full fine-tuning on 1% of the labelled data, it outperforms ImageNet initialisation by 12% in F1-score and outperforms other SSL initialisations by at least 4% in F1-score, thus making it a promising candidate for transfer learning from US video to image data.
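For readers unfamiliar with the BarlowTwins objective highlighted above, the following is a minimal NumPy sketch of its redundancy-reduction loss as it is commonly formulated, not the authors' implementation. The function name `barlow_twins_loss`, the weight `lambda_offdiag = 5e-3`, and the toy embeddings are illustrative assumptions; in the setting described here, the two inputs would be ResNet-50 embeddings of two augmented views of the same US frames.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Redundancy-reduction loss on two (batch, dim) embedding matrices.

    z_a, z_b: embeddings of two augmented views of the same inputs.
    lambda_offdiag: weight of the decorrelation term (assumed value,
    not taken from this paper).
    """
    n = z_a.shape[0]
    # Standardise every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: matching dimensions should correlate perfectly.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Decorrelation term: different dimensions should not correlate.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag

# Toy check: identical views give a much lower loss than unrelated ones.
rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))
loss_same = barlow_twins_loss(z, z)
loss_unrelated = barlow_twins_loss(z, rng.normal(size=(256, 8)))
```

The on-diagonal term rewards invariance to augmentation, while the off-diagonal term penalises redundant embedding dimensions, which is the "decorrelating" behaviour the TL;DR refers to.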

Paper Structure (11 sections, 4 figures, 1 table)

Introduction
Related Work
Methodology
Data and Preprocessing
Self-Supervision Procedure
Classification Procedure
Results and Discussions
How do SSL pretrained models perform on different data sizes?
What is the effect of Random vs. ImageNet initialisation during SSL training?
Does sampling more frames from Videos help improve SSL training?
Conclusion

Figures (4)

Figure 1: BarlowTwins performs consistently better even for challenging views. $*$ indicates non-SSL initialisations.
Figure 2: Linear probing shows a different trend than full fine-tuning in random vs. ImageNet initialisation for some SSL training.
Figure 3: Results show a trade-off between data variance and data size for SSL training.
Figure 4: Mean & SD obtained by training with 3 different samplings of labelled data and seed values.
