UltraSam: A Foundation Model for Ultrasound using Large Open-Access Segmentation Datasets

Adrien Meyer, Aditya Murali, Farahdiba Zarin, Didier Mutter, Nicolas Padoy

TL;DR

Ultrasound (US) image analysis is held back by limited annotated data and poor cross-domain generalization. The authors assemble US-43d, the largest public ultrasound segmentation dataset to date, and fully fine-tune SAM on it to create UltraSam, a versatile foundation model that supports prompt-based segmentation and a new prompted-classification capability. UltraSam outperforms existing medical SAMs and US foundation models on downstream tasks across multiple public datasets, and an UltraSam-initialized Vision Transformer outperforms ImageNet-, SAM-, and MedSAM-initialized baselines in segmentation and classification. Prompted classification lets a user point at a structure and have the model label it, which adds a user-guided step to medical image analysis workflows. The code and pretrained models are released as open source so that the community can extend the datasets and models for ultrasound analysis.

Abstract

Purpose: Automated ultrasound image analysis is challenging due to anatomical complexity and limited annotated data. To tackle this, we take a data-centric approach, assembling the largest public ultrasound segmentation dataset and training a versatile visual foundation model tailored for ultrasound.

Methods: We compile US-43d, a large-scale collection of 43 open-access ultrasound datasets with over 280,000 images and segmentation masks for more than 50 anatomical structures. We then introduce UltraSam, an adaptation of the Segment Anything Model (SAM) that is trained on US-43d and supports both point- and box-prompts. Finally, we introduce a new use case for SAM-style models by using UltraSam as a model initialization that can be fine-tuned for various downstream analysis tasks, demonstrating UltraSam's foundational capabilities.

Results: UltraSam achieves vastly improved performance over existing SAM-style models for prompt-based segmentation on three diverse public datasets. Moreover, an UltraSam-initialized Vision Transformer surpasses ImageNet-, SAM-, and MedSAM-initialized models in various downstream segmentation and classification tasks, highlighting UltraSam's effectiveness as a foundation model.

Conclusion: We compile US-43d, a large-scale unified ultrasound dataset, and introduce UltraSam, a powerful multi-purpose SAM-style model for ultrasound images. We release our code and pretrained models at https://github.com/CAMMA-public/UltraSam and invite the community to further this effort by contributing high-quality datasets.
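
The abstract says UltraSam accepts point and box prompts but does not describe how such prompts are produced. As a minimal sketch, assuming prompts are derived from a ground-truth segmentation mask (a common convention for SAM-style models, not something this text confirms for UltraSam), the two prompt types could be computed as follows. The names `box_prompt`, `point_prompt`, and `example_mask` are hypothetical and are not part of the released code.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: rows of 0/1 values, indexed as mask[y][x]


def _foreground(mask: Mask) -> List[Tuple[int, int]]:
    """Collect (x, y) coordinates of all non-zero pixels, in row-major order."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask has no foreground pixels")
    return coords


def box_prompt(mask: Mask) -> Tuple[int, int, int, int]:
    """Tight bounding box (x_min, y_min, x_max, y_max) around the foreground."""
    coords = _foreground(mask)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


def point_prompt(mask: Mask) -> Tuple[int, int]:
    """Foreground pixel (x, y) closest to the foreground centroid.

    Snapping to a foreground pixel keeps the point inside the structure
    even when the shape is concave and the centroid falls outside it.
    """
    coords = _foreground(mask)
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


# Toy 4x5 mask with a 2x3 rectangular structure.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

print(box_prompt(example_mask))
print(point_prompt(example_mask))
```

A box prompt constrains the extent of the structure, while a single point only indicates its location, so the two give the model different amounts of guidance.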

Paper Structure

This paper contains 3 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: UltraSam overview. a) US-43d: a large-scale open US segmentation dataset. b) Fine-tuning SAM on US-43d enables strong zero-shot, prompt-based segmentation. c) UltraSam's pretrained feature extractor provides a robust foundation for downstream tasks. d) We propose prompted classification to enhance structure classification using a user-specified prompt.
  • Figure 2: Overview of US-43d, grouped by clinical applications. PFO refers to patent foramen ovale, and GIST refers to gastrointestinal stromal tumor.