Generalist Models in Medical Image Segmentation: A Survey and Performance Comparison with Task-Specific Approaches

Andrea Moglia, Matteo Leccardi, Matteo Cavicchioli, Alice Maccarini, Marco Marcon, Luca Mainardi, Pietro Cerveri

TL;DR

This survey analyzes the rise of generalist segmentation models in medical imaging, tracing the Segment Anything Model (SAM) lineage and its medical adaptations, notably SAM 2, while benchmarking against task-specific 3D segmentation methods. It proposes an extensible taxonomy that unifies architecture, fusion strategy, prompts, and adaptation methods, and it evaluates performance across datasets and organ targets to map transferability and gaps. The authors emphasize regulatory, privacy, budget, and trustworthy-AI considerations, and offer future directions including synthetic data, more affordable architectures, and lessons from large language models to guide clinical translation. Overall, the work clarifies where generalist approaches currently excel, where task-specific methods still dominate, and how to navigate the path to clinically deployable, scalable segmentation solutions.

Abstract

Following the successful paradigm shift of large language models, which leverage pre-training on a massive corpus of data and fine-tuning on different downstream tasks, generalist models have made their foray into computer vision. The introduction of the Segment Anything Model (SAM) set a milestone in the segmentation of natural images, inspiring the design of a multitude of architectures for medical image segmentation. In this survey we offer a comprehensive and in-depth investigation of generalist models for medical image segmentation. We start with an introduction to the fundamental concepts underpinning their development. Then, we provide a taxonomy covering the different variants of SAM (zero-shot, few-shot, fine-tuning, and adapters), the recent SAM 2, other innovative models trained on images alone, and models trained on both text and images. We thoroughly analyze their performance at the level of both primary research and best-in-literature, followed by a rigorous comparison with state-of-the-art task-specific models. We emphasize the need to address challenges in terms of compliance with regulatory frameworks, privacy and security laws, budget, and trustworthy artificial intelligence (AI). Finally, we share our perspective on future directions concerning synthetic data, early fusion, lessons learnt from generalist models in natural language processing, agentic AI and physical AI, and clinical translation.

Paper Structure

This paper contains 61 sections, 26 figures.

Figures (26)

  • Figure 1: Timeline of key developments in generalist and task-specific models for medical image segmentation. Dates refer to the publication date of the primary work, e.g., on arXiv.
  • Figure 2: Outline of the survey.
  • Figure 3: Architecture of LoRA. Image adapted from hu2022lora.
  • Figure 4: Proposed taxonomy for the generalist models for medical image segmentation.
  • Figure 5: Architecture of SAM. Image adapted from SAM4MIS.
  • ...and 21 more figures