A Self-Supervised Framework for Improved Generalisability in Ultrasound B-mode Image Segmentation

Edward Ellis, Andrew Bulpitt, Nasim Parsa, Michael F Byrne, Sharib Ali

TL;DR

This work tackles the challenge of generalisable US B-mode image segmentation with limited labeled data by proposing a domain-informed self-supervised framework. It introduces a Cross-Patch Jigsaw pretext task augmented by frequency-domain band-stop filtering and a learnable Relation Contrastive Loss (RCL) guided by perceptual loss. Across BUSI, BrEaST, and UDIAT datasets, the approach yields consistent gains over supervised baselines, particularly under data scarcity, and demonstrates improved generalisability to out-of-distribution data. The findings underscore the value of domain-specific SSL augmentations and metric learning for robust US segmentation, with practical implications for broader clinical deployment. The work suggests further extension to other US domains, including abdominal imaging, to broaden clinical impact.

Abstract

Ultrasound (US) imaging is clinically invaluable due to its noninvasive and safe nature. However, interpreting US images is challenging, requires significant expertise and time, and is prone to errors. Deep learning offers assistive solutions such as segmentation. Supervised methods rely on large, high-quality, and consistently labeled datasets, which are challenging to curate. Moreover, these methods tend to underperform on out-of-distribution data, limiting their clinical utility. Self-supervised learning (SSL) has emerged as a promising alternative, leveraging unlabeled data to enhance model performance and generalisability. We introduce a contrastive SSL approach tailored for B-mode US images, incorporating a novel Relation Contrastive Loss (RCL). RCL encourages learning of distinct features by differentiating positive and negative sample pairs through a learnable metric. Additionally, we propose spatial and frequency-based augmentation strategies for representation learning on US images. Our approach significantly outperforms traditional supervised segmentation methods across three public breast US datasets, particularly in data-limited scenarios. Notable improvements on the Dice similarity metric include a 4% increase on 20% and 50% of the BUSI dataset, nearly 6% and 9% improvements on 20% and 50% of the BrEaST dataset, and 6.4% and 3.7% improvements on 20% and 50% of the UDIAT dataset, respectively. Furthermore, we demonstrate superior generalisability on the out-of-distribution UDIAT dataset, with performance boosts of 20.6% and 13.6% over the supervised baseline when using 20% and 50% of the BUSI and BrEaST training data, respectively. Our research highlights that domain-inspired SSL can improve US segmentation, especially under data-limited conditions.
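
The results above are reported with the Dice similarity metric. As a reminder of what that number measures, here is a minimal sketch of the standard Dice coefficient, $2|A \cap B| / (|A| + |B|)$, for two binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the evaluation protocol used by the authors (thresholding, per-image averaging) is not described in this section.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks (hypothetical helper)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    # Number of pixels marked as foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)


# One overlapping pixel, two predicted and one true: 2*1 / (2+1), about 0.667.
example = dice_score([[1, 1], [0, 0]], [[1, 0], [0, 0]])
```

Read this way, a reported gain of a few percentage points corresponds to a higher overlap ratio between predicted and ground-truth lesion masks.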

Paper Structure

This paper contains 27 sections, 13 equations, 4 figures, 7 tables.

Figures (4)

  • Figure 1: Block diagram of our proposed SSL framework, with a domain-inspired, data-engineered pretext task that integrates a perceptual loss ($\mathcal{L}_{perc.}$) and a novel relation contrastive loss ($\mathcal{L}_{RCL}$). A novel data engineering strategy is applied, combining frequency augmentation with a proposed US-specific Cross-patch Jigsaw. A ResNet50 encoder network (He et al., 2016) pre-trained on ImageNet (Deng et al., 2009) is used for our pretext task at both image level ($I_{t_1}$) and patch level ($I^{p}_{t_2}$). An initial representation of the images ($I^{0}_{t_1}$) from the ResNet50 encoder is saved in the memory bank $\mathcal{M}$ (Wu et al., 2018). Projection networks $f(\cdot)$ and $g(\cdot)$ are applied to convert the features to $128$-d vectors. Feature embeddings from the patch images $I^{p}_{t_2}$ are concatenated. $\mathcal{L}_{RCL}$ is computed from the scores of positive (similar) samples, $s^{+}$, and negative (dissimilar) samples, $s^{-}$, with subscript $p$ denoting patch level.
  • Figure 2: Frequency-Based Filtering Augmentation: The first row shows the inverse DFT of the US image with applied filters, and the second row shows the corresponding frequency distributions. Filter settings: Original image (no filters), Filter 1 (band-stop 20-30, X-shaped filter thickness 2), Filter 2 (band-stop 15-40, X-shaped filter thickness 5), Filter 3 (band-stop 12-50, X-shaped filter thickness 8).
  • Figure 3: Cross-patch Jigsaw Task: From left to right: Image1: Cropped frequency augmented image split into patches shown in red. Image2: Random patch selected in blue with focal patches outlined in blue and non-focal patches in red. Image3: Transformed focal and non-focal patches, focal patch area outlined in blue.
  • Figure 4: Qualitative evaluation of the generalisability study on the held-out UDIAT dataset. Five examples were chosen, ranging from small regular-shaped lesions to larger irregular-shaped lesions. Segmentation predictions across all 17 methods are presented, using 100% and 50% of the training samples. Yellow indicates ground truth labels overlaid onto the original image. Green indicates areas of under-segmentation and red indicates areas of over-segmentation relative to the ground truth.
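
The frequency-based filtering in Figure 2 can be illustrated with a short NumPy sketch. It is not the authors' implementation: the caption gives only the band-stop ranges and X-shaped filter thicknesses, so this sketch assumes the band limits are radii in pixels from the centre of the shifted spectrum, that the X-shaped filter zeroes frequencies near the two diagonals through the centre, and that the DC term is kept so mean brightness is preserved. The names `band_stop_x_mask` and `frequency_augment` are hypothetical.

```python
import numpy as np


def band_stop_x_mask(shape, r_low, r_high, x_thickness):
    """Build a 0/1 mask over a centred (fftshift-ed) 2D spectrum."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    dy, dx = yy - cy, xx - cx
    radius = np.sqrt(dy ** 2 + dx ** 2)

    mask = np.ones(shape, dtype=float)
    # Band-stop: suppress an annulus of spatial frequencies (assumed radii).
    mask[(radius >= r_low) & (radius <= r_high)] = 0.0
    # X-shape (assumed geometry): suppress frequencies near both diagonals.
    on_diagonal = (np.abs(dy - dx) < x_thickness) | (np.abs(dy + dx) < x_thickness)
    mask[on_diagonal] = 0.0
    # Assumption: keep the DC component so the image mean is unchanged.
    mask[cy, cx] = 1.0
    return mask


def frequency_augment(image, r_low=20, r_high=30, x_thickness=2):
    """Filter a 2D grayscale image in the frequency domain and invert."""
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    mask = band_stop_x_mask(image.shape, r_low, r_high, x_thickness)
    filtered = np.fft.ifft2(np.fft.ifftshift(spectrum * mask))
    # The mask is point-symmetric about the centre, so the imaginary part
    # is numerical noise only; keep the real part.
    return np.real(filtered)


# Usage on a synthetic image, with the "Filter 1" settings from the caption.
rng = np.random.default_rng(0)
image = rng.random((128, 128))
augmented = frequency_augment(image, r_low=20, r_high=30, x_thickness=2)
```

Widening the band or thickening the X, as in Filters 2 and 3 of the caption, removes more spectral content and so gives a stronger augmentation under these assumptions.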