Zero-shot capability of SAM-family models for bone segmentation in CT scans

Caroline Magg, Hoel Kervadec, Clara I. Sánchez

TL;DR

This work uses non-iterative, "optimal" prompting strategies composed of bounding boxes, points, and their combinations to test the zero-shot capability of SAM-family models for bone CT segmentation on three different skeletal regions.

Abstract

The Segment Anything Model (SAM) and similar models form a family of promptable foundation models (FMs) for image and video segmentation. The object of interest is identified using prompts, such as bounding boxes or points. With these FMs becoming part of medical image segmentation, extensive evaluation studies are required to assess their strengths and weaknesses in clinical settings. Since performance depends strongly on the chosen prompting strategy, it is important to investigate different prompting techniques and to define guidelines that ensure effective use in medical image segmentation. Currently, no dedicated evaluation studies exist for bone segmentation in CT scans, leaving a gap in understanding the performance for this task. Thus, we use non-iterative, "optimal" prompting strategies composed of bounding boxes, points, and their combinations to test the zero-shot capability of SAM-family models for bone CT segmentation on three different skeletal regions. Our results show that the best settings depend on the model type and size, the dataset characteristics, and the objective to optimize. Overall, SAM and SAM2 prompted with a bounding box in combination with the center point for all the components of an object yield the best results across all tested settings. As the results depend on multiple factors, we provide a guideline for informed decision-making in 2D prompting with non-interactive, "optimal" prompts.

Paper Structure

This paper contains 44 sections, 23 figures, 3 tables.

Figures (23)

  • Figure 1: Timeline overview of relevant publications: models used for this study (highlighted with a bounding box) and related evaluation studies as mentioned in the related-work table. For pre-prints (not accepted peer-reviewed publications), the first timestamp is used for the visualization.
  • Figure 2: Comparison of the SAM and SAM2 architectures.
  • Figure 2: Average prediction time per slice (sec.): The table on the left sorts the inference time averaged over all prompting strategies in ascending order. The line plot on the right shows the time per slice (sec.) for the different prompting strategies for each model.
  • Figure 3: Dataset overview: three private datasets containing 80 CT scans from three skeletal regions, i.e., shoulder (D1), wrist (D2), and knee (D3), are used. The knee dataset comes with two different sets for the tibia segmentation, i.e., cortical (D3a) and full (D3b) tibia segmentation.
  • Figure 4: Prompt primitives: (a) bounding box, (b) (EDT) center, (c) centroid, (d) positive random points inside the object, (e) negative random points outside the object. The largest component's prompt is blue (i.e., one component (1C)), while the others are white, resulting in the setting with up to 5 components (5C), when all prompts are used.
  • ...and 18 more figures
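
The prompt primitives named in the Figure 4 caption (bounding box, EDT center, centroid, positive and negative random points) can all be derived from a ground-truth mask, which is what makes such prompts "optimal" rather than interactive. The following is a minimal sketch of that idea and not the authors' implementation: it assumes a single-component binary mask given as a list of rows, a brute-force Euclidean distance transform that is only practical for small masks, and (x, y) point order with an (x_min, y_min, x_max, y_max) box. All function names are hypothetical, and handling of multiple components (the 1C and 5C settings) is left out.

```python
import math
import random


def foreground_pixels(mask):
    """Return (row, col) coordinates of all non-zero pixels in a 2D mask."""
    return [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]


def bounding_box(mask):
    """Tight box (x_min, y_min, x_max, y_max) around the foreground."""
    pix = foreground_pixels(mask)
    rows = [r for r, _ in pix]
    cols = [c for _, c in pix]
    return (min(cols), min(rows), max(cols), max(rows))


def centroid(mask):
    """Mean foreground coordinate as (x, y); can lie outside a concave object."""
    pix = foreground_pixels(mask)
    n = len(pix)
    return (sum(c for _, c in pix) / n, sum(r for r, _ in pix) / n)


def edt_center(mask):
    """Foreground pixel farthest from any background pixel, as (x, y).

    Brute-force Euclidean distance transform; assumes the mask contains
    at least one background pixel.
    """
    background = [(r, c) for r, row in enumerate(mask)
                  for c, v in enumerate(row) if not v]
    best, best_dist = None, -1.0
    for r, c in foreground_pixels(mask):
        dist = min(math.hypot(r - br, c - bc) for br, bc in background)
        if dist > best_dist:
            best, best_dist = (c, r), dist
    return best


def random_points(mask, n, positive=True, seed=0):
    """Sample up to n (x, y) points inside (positive) or outside (negative) the object."""
    rng = random.Random(seed)
    pool = [(c, r) for r, row in enumerate(mask)
            for c, v in enumerate(row) if bool(v) == positive]
    return rng.sample(pool, min(n, len(pool)))


# Toy mask: a 5x5 square (rows 1..5, columns 2..6) on a 7x9 background.
mask = [[0] * 9 for _ in range(7)]
for r in range(1, 6):
    for c in range(2, 7):
        mask[r][c] = 1

print(bounding_box(mask))   # (2, 1, 6, 5)
print(centroid(mask))       # (4.0, 3.0)
print(edt_center(mask))     # (4, 3)
print(random_points(mask, 3, positive=True))
print(random_points(mask, 2, positive=False))
```

For a convex object like this square, the centroid and the EDT center coincide. They differ for concave or ring-shaped objects, where the centroid can fall on background while the EDT center always lies inside the object.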