VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning
Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
TL;DR
VSCode presents a generalist model for multimodal salient and camouflaged object detection by combining a foundation segmentation model with 2D prompts that separately encode domain and task peculiarities. The approach leverages a Swin-based VST backbone, domain-specific prompts inserted in the encoder, and task-specific prompts in both the encoder and decoder, augmented by a prompt discrimination loss to disentangle knowledge and improve generalization. Trained jointly on four SOD tasks and three COD tasks, it achieves state-of-the-art results across 26 datasets and demonstrates zero-shot generalization to unseen tasks by mixing prompts (e.g., RGB-D COD). This work highlights the efficiency and scalability of prompt-based generalist models for complex multimodal segmentation, with practical implications for reducing task-specific model proliferation. The availability of source code further enables adoption and extension to new multimodal detection scenarios.
Abstract
Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks. These tasks involve multiple modalities that share commonalities while each offering unique cues. Existing research often employs intricate task-specific specialist models, potentially leading to redundancy and suboptimal results. We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four SOD tasks and three COD tasks. We utilize VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain-specific and task-specific knowledge on two separate dimensions. A prompt discrimination loss helps disentangle these peculiarities to benefit model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and exhibits zero-shot generalization to unseen tasks, such as RGB-D COD, by combining 2D prompts. Source code is available at https://github.com/Sssssuperior/VSCode.
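
To make the idea of 2D prompts concrete, the following is a minimal sketch, not the authors' implementation. It assumes one learnable prompt vector per domain (modality) and one per task, stored in two independent lookup tables, so an unseen combination such as RGB-D COD can be formed by mixing existing entries. The domain names, the prompt dimension, the helper names (`select_prompts`, `discrimination_penalty`), and the cosine-similarity form of the penalty are all illustrative assumptions; the abstract does not specify how the prompt discrimination loss is computed.

```python
import math
import random

# Hypothetical names and sizes, chosen only for illustration.
DOMAINS = ("rgb", "depth", "thermal", "flow")
TASKS = ("sod", "cod")
PROMPT_DIM = 8


def make_prompt(rng, dim=PROMPT_DIM):
    """Create a random prompt vector (a stand-in for learned parameters)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


rng = random.Random(0)
# Two separate "dimensions" of prompts: one table per axis.
domain_prompts = {d: make_prompt(rng) for d in DOMAINS}
task_prompts = {t: make_prompt(rng) for t in TASKS}


def select_prompts(modalities, task):
    """Pick one domain prompt per input modality plus a single task prompt."""
    return {
        "domain": [domain_prompts[m] for m in modalities],
        "task": task_prompts[task],
    }


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def discrimination_penalty(prompts):
    """Mean absolute pairwise cosine similarity over a prompt table.

    Assumed stand-in for a prompt discrimination loss: a lower value
    means the prompts point in more distinct directions.
    """
    names = sorted(prompts)
    pairs = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    return sum(abs(cosine(prompts[a], prompts[b])) for a, b in pairs) / len(pairs)


# Mixing existing prompts for a combination not seen as a pair: RGB-D COD.
rgbd_cod = select_prompts(("rgb", "depth"), "cod")
penalty = discrimination_penalty(domain_prompts)
```

In this sketch, `rgbd_cod` reuses the depth prompt and the COD prompt without introducing new parameters, which is the property that makes prompt mixing attractive for unseen tasks. In a real model the vectors would be trained tensors inserted into the encoder and decoder rather than random lists.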
