RP-SAM2: Refining Point Prompts for Stable Surgical Instrument Segmentation

Nuren Zhaksylyk, Ibrahim Almakky, Jay Paranjape, S. Swaroop Vedula, Shameema Sikder, Vishal M. Patel, Mohammad Yaqub

TL;DR

The paper addresses the instability of single-point prompts in SAM2-based surgical instrument segmentation when annotated data is limited. It introduces RP-SAM2, which adds a shift block that makes the prompt image-aware and a compound loss that trains the shift block using multiple candidate points, while keeping the base SAM2 components frozen. On Cataract1k, RP-SAM2 achieves a 2% gain in mDSC and a 21.36% reduction in mHD95 compared to SAM2, with lower variance across random single-point prompts. On CaDIS, pseudo masks generated by RP-SAM2 are better than those generated by SAM2 for fine-tuning SAM2's mask decoder (SAM2-FT). The approach offers practical benefits for semi-automatic labeling in medical imaging, with potential extensions to video segmentation and prompt tracking for dynamic workflows.

Abstract

Accurate surgical instrument segmentation is essential in cataract surgery for tasks such as skill assessment and workflow optimization. However, limited annotated data makes it difficult to develop fully automatic models. Prompt-based methods like SAM2 offer flexibility yet remain highly sensitive to the point prompt placement, often leading to inconsistent segmentations. We address this issue by introducing RP-SAM2, which incorporates a novel shift block and a compound loss function to stabilize point prompts. Our approach reduces annotator reliance on precise point positioning while maintaining robust segmentation capabilities. Experiments on the Cataract1k dataset demonstrate that RP-SAM2 improves segmentation accuracy, with a 2% mDSC gain, a 21.36% reduction in mHD95, and decreased variance across random single-point prompt results compared to SAM2. Additionally, on the CaDIS dataset, pseudo masks generated by RP-SAM2 for fine-tuning SAM2's mask decoder outperformed those generated by SAM2. These results highlight RP-SAM2 as a practical, stable and reliable solution for semi-automatic instrument segmentation in data-constrained medical settings. The code is available at https://github.com/BioMedIA-MBZUAI/RP-SAM2.

Paper Structure

This paper contains 6 sections, 3 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Zero-shot performance of SAM2 on surgical instrument segmentation using single-point prompts. Rows represent instrument classes, with column (i) showing ground truth (GT) masks (blue shade). Columns (ii)-(iv) display single-point prompts (star symbol) and their segmentation masks, while column (v) presents a heatmap of dice scores for different point locations. Results vary significantly depending on prompt placement.
  • Figure 2: Proposed RP-SAM2 architecture. (a) A Shift Block with 12.1M trainable parameters is integrated into SAM2 (with 236.5M frozen parameters) to reposition the user's input point prompt via cross-attention with image embeddings. (b) The algorithm employs grid-based point sampling to compute dice scores at multiple locations on the object and selects candidate point prompt coordinates.
  • Figure 3: (a) Out-of-distribution (OOD) performance on the CaDIS test set, evaluated over 10 single-point prompts. The vertical axis shows the mean dice score across instruments (with shaded standard deviation), and the horizontal axis indicates the percentage of the CaDIS training set used to fine-tune the shift block; the remainder is used to generate pseudo masks for fine-tuning SAM2’s mask decoder (referred to as SAM2-FT). (b) Ablation study of the loss components and the number of points sampled per object when training RP-SAM2 on the Cataract1k dataset.
  • Figure 4: (a) Qualitative comparison with SOTA methods and (b) an example of strong light-reflection artefacts on the instrument. Columns represent SOTA methods. Row (i) shows GT masks for the given instrument, rows (ii)-(iv) show different single-point clicks on the instrument with the corresponding predicted masks, and row (v) shows a heatmap of dice scores for different single-point prompts.
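
The captions for Figures 1 and 2(b) describe scoring single-point prompts by the dice score of the mask each one produces, sampled on a grid over the object. The following is a minimal sketch of that idea, not the authors' implementation. The names `segment_fn`, `make_toy_segmenter`, `stride`, and `top_k` are hypothetical, the toy segmenter only stands in for a promptable model such as SAM2, and picking the top-k points by dice score is an assumed selection rule.

```python
import numpy as np


def dice_score(pred, gt, eps=1e-7):
    """Dice similarity coefficient between two binary masks."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return float((2.0 * inter + eps) / (pred.sum() + gt.sum() + eps))


def grid_points_in_mask(gt, stride):
    """Regular grid of (row, col) points, keeping only those inside the object."""
    ys, xs = np.mgrid[0:gt.shape[0]:stride, 0:gt.shape[1]:stride]
    pts = np.stack([ys.ravel(), xs.ravel()], axis=1)
    keep = gt[pts[:, 0], pts[:, 1]].astype(bool)
    return pts[keep]


def score_prompts(gt, segment_fn, stride=4):
    """Dice score of the mask predicted from each grid point (heatmap values)."""
    pts = grid_points_in_mask(gt, stride)
    scores = np.array([dice_score(segment_fn(tuple(p)), gt) for p in pts])
    return pts, scores


def select_candidates(pts, scores, top_k=3):
    """Assumed rule: keep the top_k highest-scoring points as candidates."""
    order = np.argsort(-scores, kind="stable")[:top_k]
    return pts[order], scores[order]


def make_toy_segmenter(gt, reach=10):
    """Hypothetical stand-in for a promptable segmenter (NOT SAM2).

    It returns the part of the object within `reach` pixels of the click,
    so clicks near the object's edge recover less of the mask.
    """
    yy, xx = np.indices(gt.shape)
    gt = gt.astype(bool)

    def segment(point):
        y, x = point
        window = (np.abs(yy - y) <= reach) & (np.abs(xx - x) <= reach)
        return window & gt

    return segment


# Toy object: a 16x16 square inside a 32x32 image.
gt_mask = np.zeros((32, 32), dtype=bool)
gt_mask[8:24, 8:24] = True

points, dice = score_prompts(gt_mask, make_toy_segmenter(gt_mask), stride=4)
candidates, candidate_dice = select_candidates(points, dice, top_k=3)
print("mean dice over prompts:", dice.mean(), "std:", dice.std())
print("candidate points:", candidates.tolist())
```

The mean and standard deviation over all sampled prompts correspond to the kind of per-prompt variability that the figures visualise as heatmaps and shaded bands. With a real model, `segment_fn` would wrap the model's point-prompt prediction call.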