Beyond Pixel-Wise Supervision for Medical Image Segmentation: From Traditional Models to Foundation Models

Yuyan Shi, Jialu Ma, Jin Yang, Shasha Wang, Yichi Zhang

TL;DR

The paper surveys annotation-efficient medical image segmentation, tracing the evolution from pixel-wise supervision to weakly supervised approaches and the emergence of foundation models such as SAM. It covers five weak-label regimes (image-level labels, bounding boxes, scribbles, points, and partial/mixed supervision), describing representative methods, pseudo-label strategies, and loss formulations. It then examines the shift to foundation models, covering SAM adaptations (e.g., Med-SA, 3DSAM-adapter) that enable prompt-based segmentation in medical imaging, along with 3D extensions and their limitations. The discussion identifies evaluation, domain knowledge integration, and external data usage as critical challenges and offers guidance for future foundation-model-driven progress in clinical contexts.

Abstract

Medical image segmentation plays an important role in many image-guided clinical approaches. However, existing segmentation algorithms mostly rely on fully annotated images with pixel-wise annotations for training, which are both labor-intensive and expertise-demanding to produce, especially in the medical imaging domain where only experts can provide reliable and accurate annotations. To alleviate this challenge, there has been a growing focus on developing segmentation methods that can train deep models with weak annotations, such as image-level labels, bounding boxes, scribbles, and points. The emergence of vision foundation models, notably the Segment Anything Model (SAM), has introduced new capabilities for segmentation: large-scale pre-training enables promptable segmentation driven by weak annotations. Combining foundation models with traditional learning methods has attracted growing interest in the research community and shown potential for real-world applications. In this paper, we present a comprehensive survey of recent progress on annotation-efficient learning for medical image segmentation utilizing weak annotations before and in the era of foundation models. Furthermore, we analyze and discuss several challenges of existing approaches, which we believe will provide valuable guidance for shaping the trajectory of foundation models to further advance the field of medical image segmentation.

Paper Structure

This paper contains 11 sections, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Examples of different types of weak annotation compared with full pixel-wise annotation for medical image segmentation tasks.
  • Figure 2: The overall workflow of weakly supervised medical image segmentation with image-level annotations.
  • Figure 3: The overall workflow of weakly supervised medical image segmentation with sparse annotations such as points, bounding boxes, and scribbles.
  • Figure 4: The overall workflow of weakly supervised medical image segmentation with partially supervised datasets.