A Semi-Supervised Framework for Breast Ultrasound Segmentation with Training-Free Pseudo-Label Generation and Label Refinement

Ruili Li, Jiayi Ding, Ruiyu Li, Yilun Jin, Shiwen Ge, Yuwen Zeng, Xiaoyong Zhang, Eichi Takaya, Jan Vrba, Noriyasu Homma

TL;DR

This work proposes a semi-supervised framework for breast ultrasound segmentation that combines training-free pseudo-label generation with label refinement. It introduces uncertainty entropy weighted fusion and adaptive uncertainty-guided reverse contrastive learning to improve boundary discrimination, enabling scalable semi-supervised medical image segmentation under limited annotations.

Abstract

Semi-supervised learning (SSL) has emerged as a promising paradigm for breast ultrasound (BUS) image segmentation, but it often suffers from unstable pseudo labels under extremely limited annotations, leading to inaccurate supervision and degraded performance. Recent vision-language models (VLMs) provide a new opportunity for pseudo-label generation, yet their effectiveness on BUS images remains limited because domain-specific prompts are difficult to transfer. To address this issue, we propose a semi-supervised framework with training-free pseudo-label generation and label refinement. By leveraging simple appearance-based descriptions (e.g., dark oval), our method enables cross-domain structural transfer between natural and medical images, allowing VLMs to generate structurally consistent pseudo labels. These pseudo labels are used to warm up a static teacher that captures global structural priors of breast lesions. Combined with an exponential moving average teacher, we further introduce uncertainty entropy weighted fusion and adaptive uncertainty-guided reverse contrastive learning to improve boundary discrimination. Experiments on four BUS datasets demonstrate that our method achieves performance comparable to fully supervised models even with only 2.5% labeled data, significantly outperforming existing SSL approaches. Moreover, the proposed paradigm is readily extensible: for other imaging modalities or diseases, only a global appearance description is required to obtain reliable pseudo supervision, enabling scalable semi-supervised medical image segmentation under limited annotations.
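
The dual-teacher idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names (`ema_update`, `binary_entropy`, `entropy_weighted_fusion`), the decay value, and the choice of `exp(-entropy)` as the per-pixel weight are hypothetical and are not taken from the paper. The sketch shows (1) an exponential moving average update of teacher parameters from student parameters and (2) a per-pixel fusion of two teachers' foreground-probability maps that trusts the lower-entropy teacher more.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Return updated teacher params: decay * teacher + (1 - decay) * student.

    `teacher` and `student` are dicts mapping parameter names to arrays.
    The decay value is a hypothetical default, not the paper's setting.
    """
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def binary_entropy(p, eps=1e-8):
    """Pixel-wise entropy of a foreground-probability map (natural log)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def entropy_weighted_fusion(p_static, p_ema):
    """Fuse two probability maps, weighting each pixel by exp(-entropy).

    Assumption: this weighting is one plausible choice, used only to
    illustrate entropy-based fusion; the paper's UEWF may differ.
    """
    w_static = np.exp(-binary_entropy(p_static))
    w_ema = np.exp(-binary_entropy(p_ema))
    return (w_static * p_static + w_ema * p_ema) / (w_static + w_ema)
```

With this weighting, a pixel where the static teacher predicts 0.99 and the EMA teacher predicts 0.5 is fused to a value closer to 0.99 than the plain average, because the confident prediction has lower entropy and therefore a larger weight.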

Paper Structure (19 sections, 15 equations, 7 figures, 5 tables)


Figures (7)

  • Figure 1: To illustrate how different textual prompts affect zero-shot knowledge transfer, we visualize bounding boxes generated from (c) medical terms ("tumor"), (d) radiological attributes ("high density"), and (e) appearance-based descriptions ("dark oval.dark round.dark lobulated").
  • Figure 2: Overview of the proposed semi-supervised BUS segmentation framework. The pipeline consists of two stages: (1) Appearance-Prompted Pseudo-Label Generation (APPG), where appearance-prompted vision–language models (VLMs) produce initial pseudo labels in a training-free manner; (2) Pseudo-Label Refinement, which includes two steps: static-teacher warm-up training to capture coarse structural priors of breast lesions, and uncertainty-based semi-supervised learning using a dual-teacher framework with Uncertainty–Entropy Weighted Fusion (UEWF) and Adaptive Uncertainty-Guided Reverse Contrastive Learning (AURCL).
  • Figure 3: Structure of the proposed AURCL module. Entropy maps identify top-$k$ high-uncertainty pixels, whose predictions are reversed and aggregated into patch-level features. Contrastive learning then pulls the original and reversed features of these high-entropy patches closer to each other, while pushing them away from stable low-entropy patches.
  • Figure 4: Visual comparison of state-of-the-art methods on the BUSI and UBB datasets. All models are trained with 2.5% labeled data. Red, green, and yellow regions represent ground truth, prediction, and overlapping regions, respectively.
  • Figure 5: Visual comparison of ablation variants on the BUSI and UBB datasets. All models are trained with 2.5% labeled data. Red, green, and yellow regions represent ground truth, prediction, and overlapping regions, respectively. (a) image, (b) ground truth, (c) APPG (step 1), (d) STWP (step 2), (e) Ours.
  • ...and 2 more figures
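
The first step described in the Figure 3 caption (selecting the top-$k$ high-entropy pixels and reversing their predictions) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the input is assumed to be a single-channel foreground-probability map, and "reversing" is assumed to mean mapping $p$ to $1 - p$. The patch-level aggregation and the contrastive loss are omitted because the caption does not specify them.

```python
import numpy as np


def entropy_map(p, eps=1e-8):
    """Pixel-wise binary entropy of a foreground-probability map."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def topk_uncertain_mask(p, k):
    """Boolean mask marking the k highest-entropy pixels (1 <= k <= p.size)."""
    h = entropy_map(p)
    # argpartition places the k largest entropies in the last k positions.
    flat_idx = np.argpartition(h.ravel(), -k)[-k:]
    mask = np.zeros(h.size, dtype=bool)
    mask[flat_idx] = True
    return mask.reshape(h.shape)


def reverse_uncertain(p, mask):
    """Flip predictions (p -> 1 - p) at masked pixels only (assumed meaning)."""
    return np.where(mask, 1.0 - p, p)


# Usage: the two pixels nearest 0.5 are the most uncertain and get reversed.
probs = np.array([[0.95, 0.55],
                  [0.10, 0.48]])
mask = topk_uncertain_mask(probs, k=2)
reversed_probs = reverse_uncertain(probs, mask)
```

In this toy example the pixels with probabilities 0.55 and 0.48 are selected, since binary entropy peaks at 0.5, while the confident pixels (0.95 and 0.10) are left unchanged.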