Visual Prompt Selection for In-Context Learning Segmentation
Wei Suo, Lanqing Lai, Mengyang Sun, Hanwang Zhang, Peng Wang, Yanning Zhang
TL;DR
This work analyzes how visual prompts shape performance in in-context learning for segmentation, finding that diverse prompts often guide segmentation more accurately than prompts selected by similarity alone. It introduces Stepwise Context Search (SCS), which builds a compact yet diverse candidate pool from unlabeled data via clustering and selects well-matched demonstrations with an adaptive search module guided by IoU rewards. Empirical results on COCO-20^i, PASCAL-5^i, and iSAID-5^i show that SCS consistently improves segmentation performance and can outperform existing prompt-selection strategies, achieving near state-of-the-art results in several settings. Because only the compact candidate pool needs labels, the approach also reduces annotation costs, and it works as a plug-in enhancement for existing ICL-based segmentation models such as SegGPT.
Abstract
As a fundamental and extensively studied task in computer vision, image segmentation aims to locate and identify different semantic concepts at the pixel level. Recently, inspired by In-Context Learning (ICL), several generalist segmentation frameworks have been proposed, providing a promising paradigm for segmenting specific objects. However, existing works mostly ignore the value of visual prompts or simply apply similarity sorting to select contextual examples. In this paper, we focus on rethinking and improving the example selection strategy. Through comprehensive comparisons, we first demonstrate that ICL-based segmentation models are sensitive to different contexts. Furthermore, empirical evidence indicates that the diversity of contextual prompts plays a crucial role in guiding segmentation. Based on these insights, we propose a new stepwise context search method. Unlike previous works, we construct a small yet rich candidate pool and adaptively search for well-matched contexts. More importantly, this method effectively reduces the annotation cost by compacting the search space. Extensive experiments show that our method is an effective strategy for selecting examples and enhancing segmentation performance.
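To make the two ingredients named in the TL;DR more concrete (a clustered candidate pool and an IoU-based reward), the following is a minimal illustrative sketch and not the authors' implementation. It assumes image features are already available as a NumPy array, uses a plain k-means loop as a stand-in for whatever clustering the method actually applies, and keeps the sample nearest each centroid as that cluster's representative. The function names `build_candidate_pool` and `iou` are hypothetical, and the adaptive search module itself is not modeled here.

```python
import numpy as np


def build_candidate_pool(features, k, n_iters=20, seed=0):
    """Pick up to k diverse sample indices, one per k-means cluster.

    Hypothetical helper: the source only states that the pool is built
    by clustering unlabeled data; plain k-means is an assumption.
    """
    rng = np.random.default_rng(seed)
    features = np.asarray(features, dtype=float)
    # Initialise centroids from k distinct samples.
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iters):
        # Squared distance from every sample to every centroid: shape (n, k).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    # The sample closest to each centroid represents that cluster.
    return sorted(set(dists.argmin(axis=0).tolist()))


def iou(pred_mask, gt_mask):
    """Intersection over union of two binary masks (the reward signal)."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # assumption: two empty masks count as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)


if __name__ == "__main__":
    # Two well-separated toy clusters of 2-D "features".
    feats = np.array([[0.0, 0.0], [0.0, 0.1], [10.0, 10.0], [10.0, 10.1]])
    print("candidate pool indices:", build_candidate_pool(feats, k=2))
    print("IoU:", iou([[1, 1], [0, 0]], [[1, 0], [0, 0]]))
```

In such a setup, only the samples returned by `build_candidate_pool` would need mask annotations, which is one way to read the claim that compacting the search space lowers annotation cost.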
