Multi-organ Self-supervised Contrastive Learning for Breast Lesion Segmentation

Hugo Figueiras, Helena Aidos, Nuno Cruz Garcia

TL;DR

This work investigates self-supervised learning (SSL) with contrastive pre-training for breast lesion segmentation in ultrasound, examining the MoCo, SimCLR, and SimSiam frameworks. The authors pre-train on multi-organ ultrasound data in addition to natural images, then study data-efficient fine-tuning on a breast ultrasound dataset and compare encoder-only with encoder-decoder pre-training. Across the experiments, SSL pre-training generally outperforms fully supervised baselines, and multi-organ pre-training offers the largest gains when more labels are available; the benefits persist but can diminish at very low label fractions or at higher input resolutions. The findings suggest that using diverse organ data during pre-training yields robust, generalizable representations that improve lesion segmentation in data-scarce medical settings, which is relevant to deploying SSL-based pipelines in clinical imaging tasks.

Abstract

Self-supervised learning has proven to be an effective way to learn representations in domains where annotated labels are scarce, such as medical imaging. Contrastive learning is a widely adopted framework for this purpose and has been applied to a range of scenarios. This paper seeks to advance our understanding of the contrastive learning framework by exploring a novel perspective: employing multi-organ datasets for pre-training models tailored to specific organ-related target tasks. More specifically, our target task is breast tumour segmentation in ultrasound images. The pre-training datasets include ultrasound images from other organs, such as the lungs and heart, and large datasets of natural images. Our results show that conventional contrastive learning pre-training improves performance compared to supervised baseline approaches. Furthermore, our pre-trained models achieve comparable performance when fine-tuned with only half of the available labelled data. Our findings also show the advantages of pre-training on diverse organ data for improving performance in the downstream task.
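
To make the contrastive objective mentioned in the abstract concrete, the following is a minimal sketch of the NT-Xent loss popularised by SimCLR, written in plain Python so it runs without dependencies. It is an illustration under stated assumptions, not the authors' implementation: the function names, the toy two-dimensional embeddings, and the temperature value of 0.5 are all hypothetical, and MoCo and SimSiam use different objectives that are not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    The temperature default is an assumption made for this sketch.
    """
    n = len(view_a)
    z = list(view_a) + list(view_b)  # 2n embeddings in total
    total = 0.0
    for i in range(2 * n):
        pos = (i + n) % (2 * n)  # index of the other view of the same image
        # Similarities to every embedding except the anchor itself.
        logits = [cosine(z[i], z[k]) / temperature
                  for k in range(2 * n) if k != i]
        pos_logit = cosine(z[i], z[pos]) / temperature
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - pos_logit
    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    matching = [[0.9, 0.1], [0.1, 0.9]]    # positives close to their anchors
    mismatched = [[0.1, 0.9], [0.9, 0.1]]  # positives swapped between images
    print("matching views:  ", nt_xent_loss(anchors, matching))
    print("mismatched views:", nt_xent_loss(anchors, mismatched))
```

The loss is lower when the two views of the same image are embedded close together and far from the other images in the batch, which is the property an encoder is pushed towards during this style of pre-training.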

Paper Structure (21 sections, 7 equations, 6 figures, 8 tables)


Figures (6)

  • Figure 1: Overview of the implemented method. A model is first pre-trained with a self-supervised learning method (SimCLR, MoCo or SimSiam) on an unlabelled dataset. The pre-trained weights then initialize the model that is fine-tuned on a labelled dataset for the downstream task. In the breast pipeline, only the breast ultrasound dataset is used for pre-training; in the multi-organ pipeline, datasets from other organs complement the breast dataset.
  • Figure 2: Visual representation of different dataset partitions. The BUS partitions are used in the fine-tuning of the models.
  • Figure B.3: Generated masks of the pre-trained ResNet-50 backbones, pre-trained and fine-tuned using $32\times32$ images. The first column shows the masks of benign tumours, and the second column shows the masks of malignant tumours. (a) Ground truth; (b) SimSiam – CIFAR-10; (c) SimSiam – BUS ($\bigcirc$); (d) SimSiam – Multi-organ ($\bigtriangleup$).
  • Figure B.4: Generated masks of the pre-trained ResNet-50 backbones, pre-trained and fine-tuned using $64\times64$ images. The first column shows the masks of benign tumours, and the second column shows the masks of malignant tumours. (a) Ground truth; (b) Supervised baseline; (c) SimSiam – BUS ($\bigcirc$) + mini-ImageNet; (d) SimSiam – Multi-organ ($\bigtriangleup$).
  • Figure B.5: Generated masks of the pre-trained U-Nets, pre-trained and fine-tuned using $32\times32$ images. The first column shows the masks of benign tumours, and the second column shows the masks of malignant tumours. (a) Ground truth; (b) Supervised baseline; (c) MoCo – BUS ($\bigcirc$) + CIFAR-10; (d) MoCo – Multi-organ ($\bigtriangleup$).
  • ...and 1 more figure
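
The weight hand-off described in the Figure 1 caption, together with the encoder-only versus encoder-decoder distinction from the TL;DR, can be sketched as a simple key-matching copy between two weight dictionaries. This is a hedged illustration only: the function `init_from_pretrained`, the layer names, and the string placeholders standing in for weight tensors are all hypothetical, and a real pipeline would apply the same idea to a framework's own state dictionaries.

```python
def init_from_pretrained(model_state, pretrained_state, prefixes=("encoder.",)):
    """Return a copy of model_state with selected pre-trained entries copied in.

    Only keys that exist in both dictionaries and start with one of
    `prefixes` are transferred; every other entry keeps its original
    (for example random) initialisation.
    """
    new_state = dict(model_state)
    transferred = []
    for name, value in pretrained_state.items():
        if name in new_state and name.startswith(tuple(prefixes)):
            new_state[name] = value
            transferred.append(name)
    return new_state, transferred


# Toy stand-ins for real weight tensors (all names are hypothetical).
segmentation_model = {
    "encoder.conv1": "random",
    "decoder.up1": "random",
    "head.out": "random",
}
ssl_checkpoint = {
    "encoder.conv1": "ssl",
    "decoder.up1": "ssl",
    "projector.fc": "ssl",  # no counterpart in the segmentation model, so skipped
}

# Encoder-only transfer: the decoder keeps its fresh initialisation.
encoder_only, moved = init_from_pretrained(segmentation_model, ssl_checkpoint)

# Encoder-decoder transfer: both parts start from pre-trained weights.
encoder_decoder, moved_both = init_from_pretrained(
    segmentation_model, ssl_checkpoint, prefixes=("encoder.", "decoder.")
)

if __name__ == "__main__":
    print(encoder_only, moved)
    print(encoder_decoder, moved_both)
```

Selecting layers by prefix keeps the two pre-training variants in one code path, so switching between them only changes the `prefixes` argument.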