ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image

Hallee E. Wong, Marianne Rakic, John Guttag, Adrian V. Dalca

TL;DR

ScribblePrompt addresses the challenge of segmenting unseen biomedical images by enabling fast, interactive segmentation that supports scribbles, clicks, and bounding boxes. It combines a problem formulation that learns from iterative user interactions, a robust prompt-simulation and data-augmentation strategy, and two efficient network variants (UNet-based and SAM-adapter) to generalize to new tasks without retraining. The approach achieves higher Dice scores and lower HD95 than baselines on 12 unseen evaluation sets, runs at practical speeds (including CPU-only inference), and was strongly preferred by participants in a user study. The work contributes a scalable training paradigm with synthetic labels, versatile interaction prompts, and open-source code, with the potential to substantially reduce manual annotation workload in biomedical imaging.

Abstract

Biomedical image segmentation is a crucial part of both scientific research and clinical care. With enough labelled data, deep learning models can be trained to accurately automate specific biomedical image segmentation tasks. However, manually segmenting images to create training data is highly labor intensive and requires domain expertise. We present ScribblePrompt, a flexible neural-network-based interactive segmentation tool for biomedical imaging that enables human annotators to segment previously unseen structures using scribbles, clicks, and bounding boxes. Through rigorous quantitative experiments, we demonstrate that given comparable amounts of interaction, ScribblePrompt produces more accurate segmentations than previous methods on datasets unseen during training. In a user study with domain experts, ScribblePrompt reduced annotation time by 28% while improving Dice by 15% compared to the next best method. ScribblePrompt's success rests on a set of careful design decisions. These include a training strategy that incorporates both a highly diverse set of images and tasks, novel algorithms for simulated user interactions and labels, and a network that enables fast inference. We showcase ScribblePrompt in an interactive demo, provide code, and release a dataset of scribble annotations at https://scribbleprompt.csail.mit.edu

Paper Structure

This paper contains 39 sections, 1 equation, 38 figures, 8 tables.

Figures (38)

  • Figure 1: ScribblePrompt enables rapid iterative interactive segmentation of unseen tasks using bounding boxes, clicks, and scribbles. We show predictions from ScribblePrompt with iterative interaction steps on examples from datasets unseen during training. At each step, we visualize positive scribble and click inputs in green, negative scribble and click inputs in red, bounding box inputs in yellow, and the predicted segmentation in blue. Scribble thickness is enlarged for visual clarity. See Supplementary Material for more examples.
  • Figure 1: Line scribbles. Given an input mask $z$, we draw random lines by sampling two end points from $\{(u,v)| z_{uv}=1\}$. We use a random deformation field to warp the line scribbles and then multiply by the binary input mask $z$ to correct parts of the scribble that were warped outside the mask. We can simulate positive scribbles by applying the algorithm to the ground truth label $y$ (top) and negative scribbles by applying the algorithm to the background $1-y$ (bottom).
  • Figure 2: Training. We simulate $k$ consecutive steps of interactive segmentation. Given an image segmentation pair $(x^t,y^t)$, we first simulate a set of initial interactions $u_1$, which may contain bounding boxes, clicks, and/or scribbles. We predict segmentation $\hat{y}_1^t := f_\theta(x^t, u_1, \hat{y}_0^t)$ where the initial prediction $\hat{y}^t_0$ is set to zeros. In the second step, we simulate corrections using the error region $\varepsilon_1^t$ between the previous prediction $\hat{y}_1^t$ and ground truth $y^t$, and add them to the set of initial interactions $u_1$ to get $u_2$. We predict segmentation $\hat{y}_2^t := f_\theta(x^t, u_2, \hat{y}_1^t)$ and repeat to produce a series of predictions, $\hat{y}_1^t, \dots, \hat{y}_k^t$. We learn $\theta$ to minimize $\sum_{i=1}^k \mathcal{L}_{seg}(y^t, \hat{y}_i^t)$, the sum of losses between the target segmentation $y^t$ and iterative predictions $\hat{y}_1^t, \dots, \hat{y}_k^t$.
  • Figure 2: Centerline scribbles. Given an input mask, we apply a thinning algorithm (Zhang and Suen, 1984) to get a 1-pixel wide skeleton. We break up the skeleton using a random mask and use a random deformation field to warp the broken skeleton. Lastly, we multiply the scribble mask by the input binary mask to remove parts of the scribble that were warped outside the input mask. We can simulate positive scribbles by applying the algorithm to the label $y$ (top) and negative scribbles by applying the algorithm to the background $1-y$ (bottom).
  • Figure 3: Simulated scribbles and clicks. Positive interactions (green) are simulated on the segmentation label $y^t$ (blue), while negative interactions (red) are simulated on the background $1 - y^t$. Scribble thickness is enlarged for visual clarity.
  • ...and 33 more figures
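
The line-scribble and training captions above describe two procedures that are easier to follow side by side in code. The following is a minimal NumPy sketch, not the authors' implementation. The function names (`line_scribble`, `soft_dice_loss`, `simulate_steps`, `toy_predict`) are hypothetical, and the sketch makes several assumptions: the random deformation warp is omitted, corrections are drawn as line scribbles with positive ones placed in false-negative regions and negative ones in false-positive regions, a soft Dice loss stands in for $\mathcal{L}_{seg}$, and `toy_predict` is a trivial placeholder for the network $f_\theta$.

```python
import numpy as np

def line_scribble(mask, rng, n_lines=1):
    """Draw straight lines between random pairs of points sampled inside `mask`.

    Assumption: the random deformation warp from the caption is omitted here.
    """
    mask = mask.astype(bool)
    coords = np.argwhere(mask)  # all (row, col) positions where mask == 1
    scribble = np.zeros(mask.shape, dtype=bool)
    if len(coords) == 0:
        return scribble
    for _ in range(n_lines):
        (r0, c0), (r1, c1) = coords[rng.integers(len(coords), size=2)]
        n = int(max(abs(r1 - r0), abs(c1 - c0))) + 1
        rows = np.rint(np.linspace(r0, r1, n)).astype(int)
        cols = np.rint(np.linspace(c0, c1, n)).astype(int)
        scribble[rows, cols] = True
    # Multiply by the mask to drop line pixels that fall outside it.
    return scribble & mask

def soft_dice_loss(y, y_hat, eps=1e-6):
    """Soft Dice loss, used here as an assumed stand-in for L_seg."""
    inter = (y * y_hat).sum()
    return 1.0 - (2.0 * inter + eps) / (y.sum() + y_hat.sum() + eps)

def simulate_steps(predict, x, y, rng, k=3):
    """Simulate k interaction steps and sum the per-step losses."""
    u_pos = line_scribble(y, rng)    # positive scribbles on the label y
    u_neg = line_scribble(~y, rng)   # negative scribbles on the background 1 - y
    y_hat = np.zeros(y.shape, dtype=float)  # initial prediction is all zeros
    preds, total_loss = [], 0.0
    for _ in range(k):
        y_hat = predict(x, u_pos, u_neg, y_hat)
        preds.append(y_hat)
        total_loss += soft_dice_loss(y, y_hat)
        # Error region between the thresholded prediction and the ground truth.
        pred_bin = y_hat > 0.5
        u_pos |= line_scribble(y & ~pred_bin, rng)   # assumed: fix false negatives
        u_neg |= line_scribble(~y & pred_bin, rng)   # assumed: fix false positives
    return preds, u_pos, u_neg, total_loss

def toy_predict(x, u_pos, u_neg, y_prev):
    """Hypothetical placeholder for f_theta: paints scribbles onto the previous prediction."""
    return np.clip(y_prev + u_pos.astype(float) - u_neg.astype(float), 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((32, 32))
y = np.zeros((32, 32), dtype=bool)
y[8:24, 8:24] = True
preds, u_pos, u_neg, total_loss = simulate_steps(toy_predict, x, y, rng, k=3)
```

In an actual training setup, `predict` would be a differentiable network and `total_loss` would be backpropagated to update $\theta$; the sketch only shows how interactions accumulate across steps and how the per-step losses are summed.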