Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models

Hedda Cohen Indelman, Elay Dahan, Angeles M. Perez-Agosto, Carmit Shiran, Doron Shaked, Nati Daniel

TL;DR

The paper tackles semantic segmentation in ultrasound under data scarcity and domain gaps by introducing a two-stage refinement that preserves a zero-shot paradigm. A coarse segmentor trained on a small subset provides a mask from which positive points are selected via k-medoids and negative points are optimized for background context, enabling a zero-shot foundation model (SonoSAM) to generate refined pathology masks without fine-tuning. Across a musculoskeletal ultrasound dataset focusing on tendon pathology, the method consistently improves Dice similarity over a baseline, with larger gains in smaller data regimes, and ablation studies highlight the importance of SonoSAM and negative prompt refinement. The approach reduces the need for large labeled ultrasound datasets and is applicable to other ultrasound tasks, though it incurs higher latency due to its two-stage nature.
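
The Dice similarity mentioned above is the standard overlap measure between a predicted mask A and a reference mask B, 2|A∩B| / (|A| + |B|). Below is a minimal sketch in plain Python, assuming binary masks stored as nested lists. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are illustrative assumptions, not details taken from the paper.

```python
def dice_score(pred, target):
    """Dice similarity between two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    intersection = 0
    pred_sum = 0
    target_sum = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                intersection += 1
            if p:
                pred_sum += 1
            if t:
                target_sum += 1
    denominator = pred_sum + target_sum
    if denominator == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / denominator


if __name__ == "__main__":
    coarse = [[1, 1], [0, 0]]
    reference = [[1, 0], [0, 0]]
    print(dice_score(coarse, reference))
```

In the usage example, one pixel overlaps and the masks contain two and one foreground pixels respectively, so the score is 2·1 / (2 + 1) = 2/3.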

Abstract

Despite the remarkable success of deep learning in medical imaging analysis, medical image segmentation remains challenging due to the scarcity of high-quality labeled images for supervision. Further, the significant domain gap between natural and medical images in general, and ultrasound images in particular, hinders fine-tuning models trained on natural images to the task at hand. In this work, we address the performance degradation of segmentation models in low-data regimes and propose a prompt-less segmentation method harnessing the ability of segmentation foundation models to segment abstract shapes. We do that via our novel prompt point generation algorithm, which uses coarse semantic segmentation masks as input and a zero-shot promptable foundation model as an optimization target. We demonstrate our method on a findings (pathologic anomalies) segmentation task in ultrasound images. Our method's advantages are brought to light in low-data regime experiments of varying degrees on a small-scale musculoskeletal ultrasound image dataset, yielding a larger performance gain as the training set size decreases.

Paper Structure (20 sections, 2 equations, 9 figures, 3 tables, 1 algorithm)

Figures (9)

  • Figure 1: A high-level illustration of our semantic segmentation refinement method with zero-shot foundation models. A pre-trained segmentation model predicts a semantic segmentation for each class of an input image. In this example, classes comprise anatomies and pathologies in an ultrasound image, and the coarse segmentor output depicts the predicted semantic segmentation of a pathology. A prompt selection model selects positive and negative points. A foundation segmentation model, prompted with the selected points for the input image, then predicts a zero-shot semantic segmentation mask of the pathology. Positive prompt points are depicted in red, and negative prompt points are depicted in blue. The pathology semantic segmentation prediction is highlighted in red. For illustration purposes, the muscle is highlighted in purple, the tendon in yellow, and the bone in green. The freeze symbol indicates that gradients are not propagated to the model weights.
  • Figure 2: An illustration of our positive (foreground) points selection module, depicted in red. A threshold is applied to the coarse segmentation prediction. A $k$-medoids clustering algorithm is applied to select $k$ positive pathology points.
  • Figure 3: An illustration of our negative (background) points selection module. In addition to the positive selected points (Sec. \ref{positive_selection}), negative points are selected randomly from the modified ground-truth tendon mask. The points are flipped to initialize the settings of the complementary tendon segmentation problem. Our points optimization model optimizes prompt points selection w.r.t. the complementary tendon zero-shot segmentation problem (Sec. \ref{negative_selection}). Finally, prompt points are again flipped to account for positive and negative prompt points towards the pathology segmentation.
  • Figure 4: 100% of train set.
  • Figure 5: 35% of train set.
  • ...and 4 more figures
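
The positive point selection described in the Figure 2 caption (threshold the coarse prediction, then pick $k$ medoids) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function name `select_positive_points`, the default threshold of 0.5, the Euclidean distance, the simple alternating assign-and-update variant of k-medoids, and the `(row, col)` coordinate order are all assumptions, since the text here does not specify them.

```python
import math
import random


def select_positive_points(prob_mask, k=3, threshold=0.5, n_iter=20, seed=0):
    """Pick up to k foreground prompt points as medoids of thresholded pixels.

    prob_mask: nested lists of per-pixel foreground scores in [0, 1].
    Returns a list of (row, col) tuples. Hypothetical helper for illustration.
    """
    # Step 1: threshold the coarse prediction to get foreground coordinates.
    points = [
        (r, c)
        for r, row in enumerate(prob_mask)
        for c, value in enumerate(row)
        if value >= threshold
    ]
    if not points:
        return []
    k = min(k, len(points))

    # Step 2: alternating k-medoids (an assumed variant), seeded for repeatability.
    rng = random.Random(seed)
    medoids = rng.sample(points, k)

    for _ in range(n_iter):
        # Assign every foreground pixel to its nearest medoid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, medoids[i]))
            clusters[nearest].append(p)

        # Move each medoid to the member with the smallest total distance
        # to the rest of its cluster (quadratic per cluster; fine for a demo).
        new_medoids = []
        for index, members in enumerate(clusters):
            if not members:
                new_medoids.append(medoids[index])
                continue
            best = min(
                members,
                key=lambda m: sum(math.dist(m, q) for q in members),
            )
            new_medoids.append(best)

        if new_medoids == medoids:
            break
        medoids = new_medoids

    return medoids


if __name__ == "__main__":
    # Toy coarse prediction with two separate high-score regions.
    mask = [[0.0] * 10 for _ in range(10)]
    for r, c in [(0, 0), (0, 1), (1, 0), (1, 1), (8, 8), (8, 9), (9, 8), (9, 9)]:
        mask[r][c] = 0.9
    print(select_positive_points(mask, k=2))
```

Unlike k-means centroids, medoids are always actual foreground pixels, so every selected prompt point lies inside the thresholded coarse mask. The negative point optimization of Figure 3 depends on querying the foundation model and is not sketched here.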