Taming SAM3 in the Wild: A Concept Bank for Open-Vocabulary Segmentation
Gensheng Pei, Xiruo Jiang, Yazhou Yao, Xiangbo Shu, Fumin Shen, Byeungwoo Jeon
TL;DR
This work tackles prompt-induced failures of open-vocabulary segmentation under distribution drift by introducing ConceptBank, a parameter-free calibration framework that builds a dataset-specific concept bank from target support data. ConceptBank operates in three stages (prototype anchoring, representative support mining, and prototype-consistent concept fusion) to produce target-calibrated embeddings for each class while keeping SAM3 frozen, enabling efficient, plug-in adaptation at inference time. Across natural-scene and remote-sensing benchmarks, ConceptBank yields robust gains over vanilla SAM3 and competitive baselines, reaching average mIoU of 67.1 (natural-scene) and 52.1 (remote sensing), which supports the value of data-centric prompt calibration for drift robustness. The approach offers a practical, gradient-free pathway to deploy open-vocabulary segmentation in varied domains, with potential to generalize to other multi-modal foundation models.
Abstract
The recent introduction of \texttt{SAM3} has revolutionized Open-Vocabulary Segmentation (OVS) through \textit{promptable concept segmentation}, which grounds pixel predictions in flexible concept prompts. However, this reliance on pre-defined concepts makes the model vulnerable: when visual distributions shift (\textit{data drift}) or conditional label distributions evolve (\textit{concept drift}) in the target domain, the alignment between visual evidence and prompts breaks down. In this work, we present \textsc{ConceptBank}, a parameter-free calibration framework to restore this alignment on the fly. Instead of adhering to static prompts, we construct a dataset-specific concept bank from the target statistics. Our approach (\textit{i}) anchors target-domain evidence via class-wise visual prototypes, (\textit{ii}) mines representative supports to suppress outliers under data drift, and (\textit{iii}) fuses candidate concepts to rectify concept drift. We demonstrate that \textsc{ConceptBank} effectively adapts \texttt{SAM3} to distribution drifts, including challenging natural-scene and remote-sensing scenarios, establishing a new baseline for robustness and efficiency in OVS. Code and model are available at https://github.com/pgsmall/ConceptBank.
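As a rough illustration of the three steps named above, the following minimal NumPy sketch builds one calibrated embedding per class from precomputed features. It is not the authors' implementation: the function name `build_concept_bank`, the parameters `top_k` and `temperature`, the top-k cosine selection rule, and the softmax-weighted fusion are all assumptions made for illustration, and SAM3 itself is not called (visual support features and candidate concept embeddings are assumed to be extracted beforehand by a frozen encoder).

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def build_concept_bank(support_feats, support_labels, candidate_embs,
                       top_k=2, temperature=0.1):
    """Hypothetical sketch of a gradient-free concept bank.

    support_feats:  (N, D) visual features of target-domain support samples
    support_labels: (N,)   integer class ids of those samples
    candidate_embs: dict mapping class id -> (M_c, D) candidate concept embeddings
    Returns a dict mapping class id -> (D,) calibrated unit-norm embedding.
    """
    bank = {}
    for c, cands in candidate_embs.items():
        cands = l2_normalize(np.asarray(cands, dtype=float))
        feats = l2_normalize(support_feats[support_labels == c])
        if feats.shape[0] == 0:
            # No target evidence for this class: fall back to the plain average.
            bank[c] = l2_normalize(cands.mean(axis=0))
            continue
        # (i) Prototype anchoring: class-wise mean of the support features.
        proto = l2_normalize(feats.mean(axis=0))
        # (ii) Representative support mining (assumed rule): keep the top_k
        # supports most similar to the prototype, dropping likely outliers.
        sims = feats @ proto
        keep = np.argsort(-sims)[:top_k]
        refined = l2_normalize(feats[keep].mean(axis=0))
        # (iii) Prototype-consistent fusion (assumed rule): softmax-weight each
        # candidate concept by its similarity to the refined prototype.
        logits = cands @ refined / temperature
        weights = np.exp(logits - logits.max())
        weights /= weights.sum()
        bank[c] = l2_normalize(weights @ cands)
    return bank

# Toy usage with 2-D features; the third class-0 support is an outlier.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]])
labels = np.array([0, 0, 0, 1])
cands = {0: np.array([[1.0, 0.0], [0.0, 1.0]]),
         1: np.array([[0.0, 1.0], [1.0, 0.0]])}
bank = build_concept_bank(feats, labels, cands)
```

Because every step is a mean, a sort, or a weighted sum over fixed features, nothing is trained, which is the sense in which such a calibration can be called parameter-free and gradient-free; the resulting per-class embeddings would then stand in for the static prompts at inference time.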
