Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations

Pranav Kulkarni, Adway Kanhere, Dharmam Savani, Andrew Chan, Devina Chatterjee, Paul H. Yi, Vishwa S. Parekh

TL;DR

This work explores the potential of SAM for crowd-sourcing "sparse" annotations from non-experts to generate "dense" segmentation masks for training 3D nnU-Net models, a state-of-the-art DL segmentation model.

Abstract

Curating annotations for medical image segmentation is a labor-intensive and time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility. Recently, foundation models like the Segment Anything Model (SAM) have revolutionized semantic segmentation with exceptional zero-shot generalizability across various domains, including medical imaging, and hold a lot of promise for streamlining the annotation process. However, SAM has yet to be evaluated in a crowd-sourced setting to curate annotations for training 3D DL segmentation models. In this work, we explore the potential of SAM for crowd-sourcing "sparse" annotations from non-experts to generate "dense" segmentation masks for training 3D nnU-Net models, a state-of-the-art DL segmentation model. Our results indicate that while SAM-generated annotations exhibit high mean Dice scores compared to ground-truth annotations, nnU-Net models trained on SAM-generated annotations perform significantly worse than nnU-Net models trained on ground-truth annotations ($p<0.001$, all).
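The Dice score used above to compare SAM-generated and ground-truth annotations can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the flattened-voxel input format, and the convention for a label absent from both masks are assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], truth: Sequence[int], label: int) -> float:
    """Dice = 2|A n B| / (|A| + |B|) for one label over flattened voxel labels."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must contain the same number of voxels")
    pred_count = sum(1 for p in pred if p == label)
    truth_count = sum(1 for t in truth if t == label)
    overlap = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    if pred_count + truth_count == 0:
        # Assumed convention: a label missing from both masks counts as agreement.
        return 1.0
    return 2.0 * overlap / (pred_count + truth_count)


def mean_dice(pred: Sequence[int], truth: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of the per-label Dice scores."""
    return sum(dice_score(pred, truth, label) for label in labels) / len(labels)


# Toy example: 8 voxels, two organ labels (1 and 2), background 0.
pred = [0, 1, 1, 1, 0, 2, 2, 0]
truth = [0, 1, 1, 0, 0, 2, 2, 2]
print(dice_score(pred, truth, 1))
print(mean_dice(pred, truth, [1, 2]))
```

A high Dice score on the annotations themselves measures voxel overlap only; as the abstract notes, it does not guarantee that a model trained on those annotations performs as well as one trained on ground truth.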

Paper Structure

This paper contains 17 sections, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Example of SAM on (a) abdominal CT, (b) hand x-ray, and (c) knee MRI. SAM can operate in either "segment anything" mode (column 2), where SAM automatically segments all potential objects of interest in an image, or "prompting" mode, where SAM can segment an object of interest using an interactive prompt like bounding boxes (column 3) or points (column 4).
  • Figure 2: Pipeline for crowd-sourcing sparse annotations from non-expert annotators for the purpose of training 3D DL segmentation models using SAM-generated annotations. Given an unannotated medical imaging dataset, sparse annotations for objects of interest (e.g., organs or tumors) are first crowd-sourced from non-expert annotators. Segmentation masks for the objects of interest are then generated using SAM. Finally, the SAM-generated annotations are used to train a 3D DL segmentation model (e.g., U-Net).
  • Figure 3: Illustration of the OpenLabeling tool used for crowd-sourcing bounding box annotations for the BTCV training set across the five organs of interest.
  • Figure 4: An example of crowd-sourced SAM-generated annotations for a CT scan from the BTCV training set in the axial, coronal, and sagittal views. The SAM-generated annotations are filled in while the ground-truth annotations are outlined in blue.
  • Figure 5: An example of (a) GT-nnU-Net and (b) SAM-nnU-Net segmentations for a CT scan from the BTCV test set in the axial, coronal, and sagittal views. The models are trained on n=11 fully annotated volumes from the BTCV training set. The predicted segmentations are filled in while the ground-truth annotations are outlined in blue.
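
The pipeline in Figure 2 (sparse bounding boxes, then per-slice masks, then a dense 3D label volume) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data layout, the function names, and in particular `box_fill_segmenter` are hypothetical. The stand-in segmenter simply fills the box, where a real pipeline would call SAM with the slice and the box prompt.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixels
Slice2D = List[List[float]]      # one axial slice of image intensities
Mask2D = List[List[int]]         # rows of 0/1 values


def box_fill_segmenter(image_slice: Slice2D, box: Box) -> Mask2D:
    """Hypothetical stand-in for a SAM box-prompt call: marks every pixel in the box."""
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(len(row))]
        for y, row in enumerate(image_slice)
    ]


def build_dense_labels(
    volume: List[Slice2D],
    boxes: Dict[int, Dict[int, Box]],
    segment: Callable[[Slice2D, Box], Mask2D] = box_fill_segmenter,
) -> List[Mask2D]:
    """Turn per-slice, per-organ boxes into a dense 3D label volume.

    `boxes` maps slice index -> {organ label -> box}. Where masks overlap,
    the organ processed last overwrites earlier ones (an assumed tie-break).
    """
    labels = [[[0] * len(row) for row in image_slice] for image_slice in volume]
    for z, per_organ in boxes.items():
        for organ_label, box in per_organ.items():
            mask = segment(volume[z], box)
            for y, row in enumerate(mask):
                for x, inside in enumerate(row):
                    if inside:
                        labels[z][y][x] = organ_label
    return labels


# Toy volume: 2 slices of 4x4 pixels, one crowd-sourced box for organ 1 on slice 0.
toy_volume = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
toy_boxes = {0: {1: (0, 0, 1, 1)}}
dense = build_dense_labels(toy_volume, toy_boxes)
print(dense[0])
```

Passing the segmenter as a parameter keeps the stacking logic independent of the model, so the stand-in could be swapped for an actual SAM call without changing how the 3D label volume is assembled.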