PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts

Hao Li, Han Liu, Dewei Hu, Jiacheng Wang, Ipek Oguz

TL;DR

PRISM tackles robustness in 3D medical image segmentation by integrating interactive prompts of varying granularity with an iterative learning framework. It combines a hybrid CNN–Vision Transformer encoder, multi‑output segmentation with confidence scores, and a shallow corrective refinement network to progressively improve segmentation across iterations. On four public tumor datasets (colon, pancreas, liver, kidney), PRISM‑plain with minimal prompts and PRISM‑ultra with richer prompts substantially outperform state‑of‑the‑art automatic and interactive baselines, approaching human performance. The approach offers a practical, interactive segmentation solution for clinical settings, and the authors release the code publicly at https://github.com/MedICL-VU/PRISM.

Abstract

In this paper, we present PRISM, a Promptable and Robust Interactive Segmentation Model, aiming for precise segmentation of 3D medical images. PRISM accepts various visual inputs, including points, boxes, and scribbles as sparse prompts, as well as masks as dense prompts. Specifically, PRISM is designed with four principles to achieve robustness: (1) Iterative learning. The model produces segmentations by using visual prompts from previous iterations to achieve progressive improvement. (2) Confidence learning. PRISM employs multiple segmentation heads per input image, each generating a continuous map and a confidence score to optimize predictions. (3) Corrective learning. Following each segmentation iteration, PRISM employs a shallow corrective refinement network to reassign mislabeled voxels. (4) Hybrid design. PRISM integrates hybrid encoders to better capture both the local and global information. Comprehensive validation of PRISM is conducted using four public datasets for tumor segmentation in the colon, pancreas, liver, and kidney, highlighting challenges caused by anatomical variations and ambiguous boundaries in accurate tumor identification. Compared to state-of-the-art methods, both with and without prompt engineering, PRISM significantly improves performance, achieving results that are close to human levels. The code is publicly available at https://github.com/MedICL-VU/PRISM.
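The interplay of iterative prompting and confidence-based selection described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: `toy_model` is a hypothetical stand-in for the network (it thresholds the image instead of running a hybrid encoder), its confidence scores are a crude agreement measure rather than learned values, the simulated user simply clicks a random mislabeled voxel, and all function names are invented for this example.

```python
import numpy as np

def dice(pred, target):
    """Dice overlap between two boolean volumes."""
    inter = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

def select_mask(masks, scores):
    """Selector: keep the candidate with the highest confidence score."""
    best = int(np.argmax(scores))
    return masks[best], best

def simulate_click(pred, target, rng):
    """Simulated user: click one mislabeled voxel and report its true label.
    Returns None when the prediction already matches the target."""
    errors = np.argwhere(pred != target)
    if len(errors) == 0:
        return None
    z, y, x = errors[rng.integers(len(errors))]
    return (int(z), int(y), int(x)), bool(target[z, y, x])

def toy_model(image, clicks, prev_mask, thresholds=(0.3, 0.5, 0.7)):
    """Hypothetical stand-in for a promptable network (assumption).
    Produces one candidate mask per threshold, scored by how well the raw
    candidate agrees with the clicks so far. prev_mask is accepted to mirror
    a dense-prompt interface, but this toy ignores it."""
    masks, scores = [], []
    for t in thresholds:
        mask = image > t
        agree = [mask[p] == label for p, label in clicks]
        scores.append(float(np.mean(agree)) if agree else 0.0)
        # Crude correction: force each clicked voxel to its clicked label.
        for p, label in clicks:
            mask[p] = label
        masks.append(mask)
    return masks, scores

def interactive_loop(image, target, model, n_iters=5, seed=0):
    """Run up to n_iters rounds, feeding earlier prompts back to the model."""
    rng = np.random.default_rng(seed)
    clicks, prev_mask, history = [], None, []
    for _ in range(n_iters):
        masks, scores = model(image, clicks, prev_mask)
        prev_mask, _ = select_mask(masks, scores)  # becomes the dense prompt
        history.append(dice(prev_mask, target))
        click = simulate_click(prev_mask, target, rng)
        if click is None:  # nothing left to correct
            break
        clicks.append(click)  # sparse prompts accumulate across iterations
    return prev_mask, history

# Synthetic volume: a cube "tumor" plus noise (toy data, not from the paper).
rng = np.random.default_rng(0)
target = np.zeros((8, 8, 8), dtype=bool)
target[2:6, 2:6, 2:6] = True
image = np.clip(target + rng.normal(0, 0.3, target.shape), 0, 1)
final_mask, history = interactive_loop(image, target, toy_model)
```

Here `history` holds one Dice value per completed iteration. In the actual model, the corrective refinement network and the learned confidence scores would take the place of the hand-written heuristics in `toy_model`.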

Paper Structure

This paper contains 4 sections, 6 figures, and 3 tables.

Figures (6)

  • Figure 1: (a) PRISM takes an image ($x$) and visual prompts ($v$) to produce a segmentation ($y'$). The user then provides prompts for the next iteration. (b) Interaction between image and visual features in the latent space to produce image ($Z_x$) and visual ($Z_v$) embeddings with self- and cross-attention mechanisms.
  • Figure 2: (a) Details of PRISM. Green highlights the corrective refinement network. (b) Top row shows the multi-mask prediction with labeled confidence scores. The selector would then pick Mask 3 as the dense prompt. Possible visual prompts given this dense prompt are shown in the bottom row.
  • Figure 3: Dice score of proposed PRISM on four tumor datasets, where the mean values (lines) and their 95% confidence intervals (shades) are presented.
  • Figure 4: Qualitative results of PRISM-ultra for colon tumors characterized by irregular shapes and ambiguous boundaries. The orange arrows indicate the major defects, which are corrected in the subsequent iteration. The initial output has noticeable errors that are rapidly corrected in the first few iterations. More qualitative results can be viewed in the supplementary figures.
  • Figure S.1: Qualitative results of four different tumor segmentation tasks. The orange arrows indicate the major defects.
  • ...and 1 more figures
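
The per-iteration curves described for Figure 3 (mean Dice with a 95% confidence band) can be computed along the following lines. This is a minimal sketch under an assumption: the captions do not say how the intervals were obtained, so a normal approximation (mean ± 1.96 × standard error) is used here. The function name and the numbers in `toy_scores` are invented for illustration and are not results from the paper.

```python
import numpy as np

def mean_ci95_per_iteration(scores):
    """scores: array of shape (n_cases, n_iterations) holding Dice values.
    Returns (mean, lower, upper) per iteration using a normal approximation.
    Requires at least two cases so the sample standard deviation is defined."""
    scores = np.asarray(scores, dtype=float)
    n_cases = scores.shape[0]
    mean = scores.mean(axis=0)
    sem = scores.std(axis=0, ddof=1) / np.sqrt(n_cases)  # standard error
    half_width = 1.96 * sem
    return mean, mean - half_width, mean + half_width

# Toy values only: three cases (rows) over three iterations (columns).
toy_scores = np.array([[0.50, 0.70, 0.80],
                       [0.60, 0.70, 0.90],
                       [0.40, 0.70, 0.70]])
mean, lower, upper = mean_ci95_per_iteration(toy_scores)
```

Plotting `mean` as a line and filling between `lower` and `upper` gives the line-plus-shade layout the caption describes. With few cases, a t-distribution or bootstrap interval would be a more careful choice than the fixed 1.96 factor.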