BoxCell: Leveraging SAM for Cell Segmentation with Box Supervision

Aayush Kumar Tyagi, Vaibhav Mishra, Prathosh A. P., Mausam

TL;DR

This work proposes BoxCell, a cell segmentation framework that uses SAM's ability to interpret bounding boxes as prompts at both train and test time. BoxCell significantly outperforms existing box-supervised image segmentation models, with Dice gains of 6 to 10 points.
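
The Dice gains above are differences in the Dice coefficient, 2|A ∩ B| / (|A| + |B|), between a predicted mask A and the ground-truth mask B, expressed in percentage points. The snippet below is a minimal NumPy sketch of that standard formula; the function name is our own, and the paper's exact evaluation protocol (for example, per-image or dataset-level averaging) is not stated in this section.

```python
import numpy as np


def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    total = pred.sum() + gt.sum()
    if total == 0:
        # Both masks empty: treat as perfect agreement (a common convention)
        return 1.0
    intersection = np.logical_and(pred, gt).sum()
    return float(2.0 * intersection / total)


# Toy example: ground truth covers 4 pixels, prediction covers 6, overlap is 4
gt = np.zeros((4, 4), dtype=bool)
gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True
print(f"Dice = {100 * dice_score(pred, gt):.1f} points")  # Dice = 80.0 points
```

Here 2 × 4 / (4 + 6) = 0.8, i.e. 80 Dice points; a 6-point gain would move such a score from 74 to 80.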

Abstract

Cell segmentation in histopathological images is vital for the diagnosis and treatment of several diseases. Annotating data is tedious and requires medical expertise, which makes fully supervised learning difficult to apply. Instead, we study a weakly supervised setting in which only bounding-box supervision is available, and we use the Segment Anything Model (SAM) without any fine-tuning, i.e., directly as a pre-trained model. We propose BoxCell, a cell segmentation framework that exploits SAM's ability to interpret bounding boxes as prompts at both train and test time. At train time, gold bounding boxes are given to SAM to produce pseudo-masks, which are used to train a standalone segmenter. At test time, BoxCell generates two segmentation masks: (1) one produced by this standalone segmenter, and (2) one produced by SAM when prompted with the bounding boxes output by a trained object detector. Since the two masks have complementary strengths, we reconcile them using a novel integer programming formulation with intensity and spatial constraints. Experiments on three publicly available cell segmentation datasets, namely CoNSeP, MoNuSeg, and TNBC, show that BoxCell significantly outperforms existing box-supervised image segmentation models, with Dice gains of 6 to 10 points.
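
The train-time and test-time flows described in the abstract can be sketched with plain callables. Everything here is a hypothetical stand-in: `box_prompt_fn` represents SAM queried with a single box (with the `segment_anything` package it could wrap `SamPredictor.predict(box=..., multimask_output=False)`), `detector_fn` represents the trained object detector, and `segmenter_fn` represents the standalone segmenter. The integer programming reconciliation is omitted because its formulation is not given in this section.

```python
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); assumed half-open pixel box
BoxPromptFn = Callable[[np.ndarray, Box], np.ndarray]  # hypothetical stand-in for SAM + box prompt


def masks_from_boxes(image: np.ndarray, boxes: Sequence[Box], box_prompt_fn: BoxPromptFn) -> np.ndarray:
    """Union of the per-box masks returned by a box-promptable segmenter."""
    out = np.zeros(image.shape[:2], dtype=bool)
    for box in boxes:
        out |= box_prompt_fn(image, box).astype(bool)
    return out


def make_pseudo_masks(images, gold_boxes, box_prompt_fn: BoxPromptFn) -> List[np.ndarray]:
    """Train time: gold boxes -> pseudo-masks, used as targets for the standalone segmenter."""
    return [masks_from_boxes(img, boxes, box_prompt_fn) for img, boxes in zip(images, gold_boxes)]


def infer_two_masks(image, detector_fn, segmenter_fn, box_prompt_fn: BoxPromptFn):
    """Test time: M_D from detector boxes prompted into SAM, M_S from the trained segmenter."""
    m_d = masks_from_boxes(image, detector_fn(image), box_prompt_fn)
    m_s = segmenter_fn(image).astype(bool)
    return m_d, m_s


def toy_box_prompt(image: np.ndarray, box: Box) -> np.ndarray:
    """Hypothetical toy stand-in for SAM: bright pixels inside the box are foreground."""
    x0, y0, x1, y1 = box
    mask = np.zeros(image.shape[:2], dtype=bool)
    mask[y0:y1, x0:x1] = image[y0:y1, x0:x1] > 0.5
    return mask


# Two bright 2x2 "cells"; the toy detector only finds the first one
image = np.zeros((8, 8))
image[2:4, 2:4] = 1.0
image[5:7, 5:7] = 1.0
pseudo = make_pseudo_masks([image], [[(1, 1, 5, 5), (4, 4, 8, 8)]], toy_box_prompt)
m_d, m_s = infer_two_masks(image, lambda img: [(1, 1, 5, 5)], lambda img: img > 0.5, toy_box_prompt)
print(pseudo[0].sum(), m_d.sum(), m_s.sum())  # 8 4 8
```

Because the toy detector misses the second cell, the detector-driven mask stays empty there while the segmenter's mask does not. This is the kind of complementarity that motivates reconciling the two masks rather than picking one.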

Paper Structure (38 sections, 7 equations, 5 figures, 13 tables)

Figures (5)

  • Figure 1: Inference pipeline for BoxCell, which produces masks $M_{D}$ and $M_{S}$ using ITD and ITS, respectively. These masks are split into a $K \times K$ grid, and GMMs are trained to estimate a probability map $P$. An ILP solver refines $P$ based on intensity and spatial constraints. This figure was created using draw.io.
  • Figure 2: Workflows with SAM in weak supervision. ITD uses a detection model $D(\theta)$, trained on the training data, to predict bounding boxes, which serve as box prompts for SAM during inference. ITS uses the segmentation masks predicted by SAM as pseudo ground truth to train a segmentation model $S(\phi)$; only $S(\phi)$ is called during inference.
  • Figure 3: Qualitative analysis of segmentation masks. Column 1 is the original image; Columns 2-5 show cropped regions (marked by the red box) of the masks generated by three comparison models and by BoxCell. The last column is the ground truth. BoxCell gives the best results, with more accurate masks and better cell boundaries and shapes.
  • Figure 4: Original and stain variation images across datasets.
  • Figure 5: With only ITD, BoxCell cannot predict foreground outside the box prompts. Full BoxCell can, which reduces false negatives and improves mask quality for wrongly sized boxes (A and B). BoxCell also produces finer segmentation masks (C). It is less effective on images with low foreground/background contrast (A, row 2), where its ability to mitigate false positives is limited (B, row 2).
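
The Figure 1 caption describes splitting the masks into a K × K grid and fitting GMMs to estimate a probability map P. The exact mixture model is not given in this section, so the sketch below is one plausible reading and should be treated as an assumption: in each grid cell, one Gaussian is fit to the grayscale intensities of pixels inside $M_{D} \cup M_{S}$ and another to the remaining pixels, and P is the resulting posterior foreground probability. This is a two-component Gaussian mixture whose components are fit from the mask labels rather than by EM. The function names, `k`, and `min_var` are hypothetical, and the ILP refinement step is not reproduced.

```python
import numpy as np


def gaussian_pdf(x: np.ndarray, mean: float, var: float) -> np.ndarray:
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def grid_probability_map(gray, m_d, m_s, k=4, min_var=1e-4):
    """Per-cell foreground probability from a two-Gaussian intensity model (assumed formulation)."""
    h, w = gray.shape
    union = np.asarray(m_d, dtype=bool) | np.asarray(m_s, dtype=bool)
    prob = np.zeros((h, w), dtype=float)
    ys = np.linspace(0, h, k + 1).astype(int)  # row boundaries of the K x K grid
    xs = np.linspace(0, w, k + 1).astype(int)  # column boundaries
    for i in range(k):
        for j in range(k):
            cell = (slice(ys[i], ys[i + 1]), slice(xs[j], xs[j + 1]))
            x = gray[cell].astype(float)
            fg = union[cell]
            if fg.all() or not fg.any():
                # Only one class present: fall back to the mask label itself
                prob[cell] = fg.astype(float)
                continue
            prior_fg = fg.mean()
            # Fit one Gaussian to "foreground" intensities and one to "background" intensities
            mu_f, var_f = x[fg].mean(), max(x[fg].var(), min_var)
            mu_b, var_b = x[~fg].mean(), max(x[~fg].var(), min_var)
            like_f = prior_fg * gaussian_pdf(x, mu_f, var_f)
            like_b = (1.0 - prior_fg) * gaussian_pdf(x, mu_b, var_b)
            prob[cell] = like_f / (like_f + like_b + 1e-12)  # posterior P(fg | intensity)
    return prob


# Toy example: M_D over-segments by one dark column that M_S leaves out
gray = np.zeros((8, 8))
gray[1:3, 1:3] = 1.0
m_d = np.zeros((8, 8), dtype=bool)
m_d[1:3, 1:4] = True
m_s = np.zeros((8, 8), dtype=bool)
m_s[1:3, 1:3] = True
p = grid_probability_map(gray, m_d, m_s, k=2)
print(f"bright pixels: {p[1:3, 1:3].min():.3f}, extra dark column: {p[1:3, 3].max():.3f}")
```

In the toy example, the dark column that only $M_{D}$ includes receives a low probability while the bright pixels stay near 1, which is the kind of intensity cue a downstream reconciliation step with intensity constraints could exploit.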