S4M: 4-points to Segment Anything

Adrien Meyer, Lorenzo Arboit, Giuseppe Massimiani, Shih-Min Yin, Didier Mutter, Nicolas Padoy

TL;DR

S4M addresses the annotation bottleneck in medical segmentation by replacing iterative corrections with geometry-aware prompting. It introduces 4-point prompts with role-specific embeddings (extreme or major/minor) and a Canvas auxiliary task that trains the model to reason about shape from prompts alone. Across eight ultrasound and endoscopy datasets, S4M achieves a consistent +3.42 mIoU gain over a strong SAM baseline and enables faster annotation via major/minor prompts. The approach preserves compatibility with existing prompting modes, enhances robustness to complex shape boundaries, and better aligns with clinical measurement practices, paving the way for scalable medical imaging datasets.

Abstract

Purpose: The Segment Anything Model (SAM) promises to ease the annotation bottleneck in medical segmentation, but overlapping anatomy and blurred boundaries make its point prompts ambiguous, leading to cycles of manual refinement to achieve precise masks. Better prompting strategies are needed. Methods: We propose a structured prompting strategy using 4 points as a compact instance-level shape description. We study two 4-point variants: extreme points and the proposed major/minor axis endpoints, inspired by ultrasound measurement practice. SAM cannot fully exploit such structured prompts because it treats all points identically and lacks geometry-aware reasoning. To address this, we introduce S4M (4-points to Segment Anything), which augments SAM to interpret 4 points as relational cues rather than isolated clicks. S4M expands the prompt space with role-specific embeddings and adds an auxiliary "Canvas" pretext task that sketches coarse masks directly from prompts, fostering geometry-aware reasoning. Results: Across eight datasets in ultrasound and surgical endoscopy, S4M improves segmentation by +3.42 mIoU over a strong SAM baseline at equal prompt budget. An annotation study with three clinicians further shows that major/minor prompts enable faster annotation. Conclusion: S4M increases performance, reduces annotation effort, and aligns prompting with clinical practice, enabling more scalable dataset development in medical imaging.
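To make the two 4-point variants concrete, the following is a minimal sketch of how such prompts could be derived from a binary ground-truth mask. It is not the authors' implementation: the function name `four_point_prompt`, the `mode` argument, and the returned dictionary keys are hypothetical, and the assumption that major/minor endpoints are the foreground pixels with extreme projections onto the PCA axes (rather than, for example, the longest perpendicular chord) is ours, based on the Figure 3 caption.

```python
import math


def four_point_prompt(mask, mode="major_minor"):
    """Return two endpoint pairs, as (x, y) pixels, for a binary mask.

    mode="extreme":     endpoints along the image x and y axes.
    mode="major_minor": endpoints along the PCA axes of the foreground
                        (an assumed reading of the Figure 3 caption).
    """
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask has no foreground pixels")

    n = len(pts)
    cx = sum(x for x, _ in pts) / n
    cy = sum(y for _, y in pts) / n

    if mode == "major_minor":
        # Orientation of the principal axis of the 2x2 scatter matrix.
        sxx = sum((x - cx) ** 2 for x, _ in pts)
        syy = sum((y - cy) ** 2 for _, y in pts)
        sxy = sum((x - cx) * (y - cy) for x, y in pts)
        theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    elif mode == "extreme":
        theta = 0.0  # use the image axes directly
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    ux, uy = math.cos(theta), math.sin(theta)

    def along(p):   # projection onto the first axis
        return (p[0] - cx) * ux + (p[1] - cy) * uy

    def across(p):  # projection onto the perpendicular axis
        return -(p[0] - cx) * uy + (p[1] - cy) * ux

    return {
        "axis_1": (min(pts, key=along), max(pts, key=along)),
        "axis_2": (min(pts, key=across), max(pts, key=across)),
    }
```

For an axis-aligned shape both modes return the same four points. They differ once the object is rotated, which is where instance-oriented major/minor prompts would be expected to carry more shape information than image-axis extremes.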

Paper Structure

This paper contains 6 sections, 3 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Prompt placement is key for efficient interactive segmentation. In standard prompting (a), the initial click (blue star) is ambiguous, leading to iterative positive (green) and negative (red) refinement points. Our strategies instead directly place prompts at Extreme-points (b.ii) or the proposed Major/Minor-points (b.iii, instance-oriented) to capture object shape, achieving higher mIoU with fewer interactions (b.iv), averaged over 8 datasets (6 supervised, 2 zero-shot).
  • Figure 2: Overview of S4M. Built upon the standard SAM encoder/decoder (a), S4M introduces (b) role-aware 4-point prompts (extreme or major/minor) encoded with semantic and positional embeddings, and (c) an auxiliary Canvas task, a training-only decoder with its own Canvas token, supervised by the convex hull of the mask to strengthen 4-point representation learning. This example uses extreme points.
  • Figure 3: 4-point generation. Our method emulates the measurements clinicians use in ultrasound, producing shape-aware 4-point prompts that correspond to the embedded measurement visible on the image (zoom in for best view). For extreme points, the same procedure applies except that the image axis replaces the PCA axis.
  • Figure 4: Evolution of performance with different prompt budgets. Per-dataset mean IoU comparison for surgical endoscopy (top row) and ultrasound (bottom row) under supervised (first three columns) and zero-shot (ZS, last column) settings. Across all datasets, S4M with extreme points (orange dotted) and with major/minor points (pink dash-dotted) outperforms the SAM$^{+}$ baselines prompted with region-based points (blue solid) and with a bounding box (gray dashed).
  • Figure 5: S4M results on Endoscapes (left, extreme) and OTU (right, major/minor).
  • ...and 3 more figures
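The Figure 2 caption states that the Canvas task is supervised by the convex hull of the mask. As a rough illustration of what such a target might look like, the sketch below computes the hull of the foreground pixels with Andrew's monotone chain algorithm and rasterises it by testing pixel centres. The names `convex_hull` and `canvas_target` are hypothetical, and the rasterisation rule and the handling of degenerate masks are assumptions; the paper may well rely on a library routine instead.

```python
def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without collinear points."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def canvas_target(mask):
    """Binary mask of the convex hull of the foreground (assumed target)."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    hull = convex_hull(pts)
    if len(hull) < 3:
        # Degenerate case (empty, single pixel, or collinear pixels):
        # simplification, return the binarised mask unchanged.
        return [[1 if v else 0 for v in row] for row in mask]

    edges = list(zip(hull, hull[1:] + hull[:1]))

    def inside(p):
        # A point is in the hull if it lies on the same side of every edge.
        return all(_cross(a, b, p) >= 0 for a, b in edges)

    return [
        [1 if inside((x, y)) else 0 for x in range(len(row))]
        for y, row in enumerate(mask)
    ]
```

On an L-shaped mask the hull fills in the concave corner, so the target is coarser than the true mask. This matches the idea of a "sketch" that can be inferred from the prompts alone.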