SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection
Huafeng Chen, Pengxu Wei, Guangqian Guo, Shan Gao
TL;DR
The paper tackles weakly-supervised camouflaged object detection (WSCOD) with SAM-COD, a unified framework built on the Segment Anything Model (SAM). It introduces a Prompt Adapter that makes scribble labels usable as SAM prompts, a Response Filter that suppresses extreme outputs, a Semantic Matcher that aligns semantics with COD knowledge, and a Prompt-Adaptive Knowledge Distillation pipeline that transfers SAM’s knowledge to a lighter model. Results on CAMO, COD10K, and NC4K show that SAM-COD achieves state-of-the-art performance among WSCOD methods and even surpasses some fully supervised baselines, with strong transfer to salient object detection (SOD). The approach demonstrates that combining SAM with targeted prompting and distillation yields robust, label-efficient COD and offers practical benefits for real-world detection tasks.
Abstract
Most Camouflaged Object Detection (COD) methods rely heavily on mask annotations, which are time-consuming and labor-intensive to acquire. Existing weakly-supervised COD approaches perform significantly worse than fully-supervised methods and struggle to support all existing types of camouflaged object labels (scribbles, bounding boxes, and points) at once. Even the Segment Anything Model (SAM) handles weakly-supervised COD poorly: it faces incompatibility between scribble labels and its prompts, extreme responses, semantically erroneous responses, and unstable feature representations, which together produce unsatisfactory results in camouflaged scenes. To mitigate these issues, we propose a unified COD framework, termed SAM-COD, that supports arbitrary weakly-supervised labels. SAM-COD employs a prompt adapter to handle scribbles as SAM prompts. We also introduce response filter and semantic matcher modules to improve the quality of the masks that SAM produces under COD prompts. To alleviate the negative impact of inaccurate mask predictions, a new prompt-adaptive knowledge distillation strategy is used to ensure a reliable feature representation. We validate our approach with extensive experiments on three mainstream COD benchmarks. The results demonstrate the superiority of our method over state-of-the-art weakly-supervised and even fully-supervised methods.
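
To make the idea of filtering "extreme responses" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that an extreme response is a candidate mask that is almost empty or that covers almost the whole image, and it discards such masks by foreground ratio. The function names and the thresholds `low` and `high` are hypothetical, chosen only for illustration; the abstract does not specify how the response filter is defined.

```python
from typing import List, Sequence

# A binary mask as rows of 0/1 values (assumed representation for this sketch).
Mask = Sequence[Sequence[int]]


def foreground_ratio(mask: Mask) -> float:
    """Return the fraction of pixels marked as foreground (value 1)."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(sum(row) for row in mask) / total


def filter_extreme_masks(
    masks: List[Mask], low: float = 0.01, high: float = 0.9
) -> List[Mask]:
    """Keep only masks whose foreground ratio lies in [low, high].

    `low` and `high` are hypothetical thresholds, not values from the paper.
    """
    return [m for m in masks if low <= foreground_ratio(m) <= high]


# Toy candidates: an empty mask, a full mask, and a partial mask.
empty = [[0, 0], [0, 0]]
full = [[1, 1], [1, 1]]
partial = [[1, 0], [0, 0]]

# Only the partial mask survives the filter.
kept = filter_extreme_masks([empty, full, partial])
```

In this toy run, only `partial` is kept, since the empty and full masks fall outside the assumed ratio range. In a real pipeline the candidates would be masks returned by SAM for a given prompt, and the surviving masks would be passed on to later stages.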
