Flip Learning: Weakly Supervised Erase to Segment Nodules in Breast Ultrasound

Yuhao Huang, Ao Chang, Haoran Dou, Xing Tao, Xinrui Zhou, Yan Cao, Ruobing Huang, Alejandro F Frangi, Lingyun Bao, Xin Yang, Dong Ni

TL;DR

Flip Learning reframes breast nodule segmentation in BUS and ABUS as a weakly supervised erasing task solved with multi-agent reinforcement learning. The environment is encoded with superpixels/supervoxels, and learning is guided by three rewards (a classification score reward, CSR, and two intensity distribution rewards, IDR1 and IDR2) alongside a progressive curriculum, enabling precise, boundary-aware segmentation from box-level annotations. Empirical results on large in-house BUS and ABUS datasets show state-of-the-art performance among weakly supervised methods and results competitive with fully supervised approaches, with strong generalization across centers and robustness to box perturbations. This work also opens avenues for generative data synthesis and zero-shot or detector-assisted segmentation in medical ultrasound, with broad implications for efficient, scalable clinical image analysis.

Abstract

Accurate segmentation of nodules in both 2D breast ultrasound (BUS) and 3D automated breast ultrasound (ABUS) is crucial for clinical diagnosis and treatment planning. Therefore, developing an automated system for nodule segmentation can enhance user independence and expedite clinical analysis. Unlike fully-supervised learning, weakly-supervised segmentation (WSS) can streamline the laborious and intricate annotation process. However, current WSS methods face challenges in achieving precise nodule segmentation, as many of them depend on inaccurate activation maps or inefficient pseudo-mask generation algorithms. In this study, we introduce a novel multi-agent reinforcement learning-based WSS framework called Flip Learning, which relies solely on 2D/3D boxes for accurate segmentation. Specifically, multiple agents are employed to erase the target from the box to facilitate classification tag flipping, with the erased region serving as the predicted segmentation mask. The key contributions of this research are as follows: (1) Adoption of a superpixel/supervoxel-based approach to encode the standardized environment, capturing boundary priors and expediting the learning process. (2) Introduction of three meticulously designed rewards, comprising a classification score reward and two intensity distribution rewards, to steer the agents' erasing process precisely, thereby avoiding both under- and over-segmentation. (3) Implementation of a progressive curriculum learning strategy to enable agents to interact with the environment in a progressively challenging manner, thereby enhancing learning efficiency. Extensively validated on large in-house BUS and ABUS datasets, our Flip Learning method outperforms state-of-the-art WSS methods and foundation models, and achieves performance comparable to that of fully-supervised learning algorithms.
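The erase-to-flip idea in the abstract can be illustrated with a toy sketch. This is not the authors' method: a greedy search stands in for the multi-agent reinforcement learning, a regular grid stands in for real superpixels, and `nodule_score` is a hypothetical stand-in for the pre-trained classifier. Only a classification-score-style reward is modeled; the two intensity distribution rewards and the curriculum are omitted. All function names and parameters are assumptions made for illustration.

```python
def make_box(size=8):
    """Toy box crop: bright background (0.8) with a dark 4x4 'nodule' (0.2)."""
    img = [[0.8] * size for _ in range(size)]
    for r in range(2, 6):
        for c in range(2, 6):
            img[r][c] = 0.2
    return img


def grid_superpixels(size, cell):
    """Assumed stand-in for a superpixel algorithm: non-overlapping square blocks."""
    blocks = []
    for r0 in range(0, size, cell):
        for c0 in range(0, size, cell):
            blocks.append([(r, c)
                           for r in range(r0, min(r0 + cell, size))
                           for c in range(c0, min(c0 + cell, size))])
    return blocks


def nodule_score(img, dark=0.5):
    """Hypothetical classifier: fraction of pixels darker than `dark`."""
    pixels = [v for row in img for v in row]
    return sum(v < dark for v in pixels) / len(pixels)


def erase(img, block, fill):
    """Return a copy of `img` with every pixel of `block` set to `fill`."""
    out = [row[:] for row in img]
    for r, c in block:
        out[r][c] = fill
    return out


def flip_by_erasing(img, blocks, flip_threshold=0.01):
    """Greedily erase blocks until the toy classifier's tag flips.

    The reward for erasing a block is the drop in classifier score.
    Returns the erased pixels (the predicted mask) and the final score.
    """
    fill = img[0][0]  # assumption: the corner pixel is background tissue
    erased, current = [], img
    score = nodule_score(current)
    while score > flip_threshold:
        best = None
        for i, block in enumerate(blocks):
            if i in erased:
                continue
            candidate = erase(current, block, fill)
            reward = score - nodule_score(candidate)
            if best is None or reward > best[0]:
                best = (reward, i, candidate)
        if best is None or best[0] <= 0:
            break  # no remaining block lowers the score
        _, index, current = best
        erased.append(index)
        score = nodule_score(current)
    mask = sorted(p for i in erased for p in blocks[i])
    return mask, score


mask, score = flip_by_erasing(make_box(), grid_superpixels(8, 2))
print(len(mask), score)
```

In this toy setting the erased blocks coincide with the dark region, which mirrors the claim that the erased region serves as the segmentation mask. A score-only reward gives no penalty for erasing too much in less tidy images, which is the kind of failure the intensity distribution rewards are described as preventing.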

Paper Structure

This paper contains 25 sections, 8 equations, 13 figures, 20 tables.

Figures (13)

  • Figure 1: (a)-(f): 2D BUS images with different shapes, sizes and echo characteristics. (g)-(j): Axial views of ABUS images. (k)-(n): 3D visualization of ABUS images and their masks (in the zoomed-in red circles, transparent: ground truths, red: predictions). Note that nodules in 2D breast images are labeled without any information about their detailed type, whereas the ABUS images are annotated with type tags (i.e., red, green, and blue boxes represent BI-RADS 2-4, respectively).
  • Figure 2: Motivation of Flip Learning: erasing the nodule from the original image (red region) and inpainting the erased region can flip its classification tag.
  • Figure 3: Overview of the proposed framework. Here, we use the BUS image as an example to better illustrate the learning process. Purple and green boundaries in the environment block represent the superpixels and annotated masks, respectively. The upper-left star shows the environment encoded with the current curriculum setting. Yellow boundaries in the erasing process show the changes in the erased region.
  • Figure 4: Introduction to the pre-trained classifier.
  • Figure 5: Illustration of eraser source generation.
  • ...and 8 more figures