SAM-COD: SAM-guided Unified Framework for Weakly-Supervised Camouflaged Object Detection

Huafeng Chen, Pengxu Wei, Guangqian Guo, Shan Gao

TL;DR

The paper tackles weakly-supervised camouflaged object detection (WSCOD) by leveraging the Segment Anything Model (SAM) in a unified framework called SAM-COD. It introduces a Prompt Adapter to render scribbles compatible with SAM, a Response Filter to suppress extreme outputs, a Semantic Matcher to align semantics with COD knowledge, and a Prompt-Adaptive Knowledge Distillation pipeline to transfer SAM’s knowledge to a lighter model. Empirical results on CAMO, COD10K, and NC4K show that SAM-COD achieves state-of-the-art performance among WSCOD methods and even surpasses some fully supervised baselines, with strong transfer to SOD. The approach demonstrates that combining SAM with targeted prompting and distillation yields robust, label-efficient COD and offers practical benefits for real-world detection tasks.

Abstract

Most Camouflaged Object Detection (COD) methods rely heavily on mask annotations, which are time-consuming and labor-intensive to acquire. Existing weakly-supervised COD approaches perform significantly worse than fully-supervised methods and struggle to support all existing types of camouflaged object labels (scribbles, bounding boxes, and points) at once. Even the Segment Anything Model (SAM) handles weakly-supervised COD poorly: it faces prompt incompatibility with scribble labels, extreme responses, semantically erroneous responses, and unstable feature representations, producing unsatisfactory results in camouflaged scenes. To mitigate these issues, we propose a unified COD framework, termed SAM-COD, which supports arbitrary weakly-supervised labels. SAM-COD employs a prompt adapter to handle scribbles as prompts for SAM. Meanwhile, we introduce response filter and semantic matcher modules to improve the quality of the masks obtained by SAM under COD prompts. To alleviate the negative impact of inaccurate mask predictions, a new prompt-adaptive knowledge distillation strategy is used to ensure a reliable feature representation. To validate the effectiveness of our approach, we conduct extensive experiments on three mainstream COD benchmarks. The results demonstrate the superiority of our method over state-of-the-art weakly-supervised and even fully-supervised methods.

Paper Structure

This paper contains 15 sections, 10 equations, 8 figures, 8 tables.

Figures (8)

  • Figure 1: Comparison of COD methods across labels of different granularity. A larger circle denotes a model with more parameters. SAM-COD handles three different label types for camouflaged objects. It achieves the highest performance in the weakly-supervised setting and even outperforms the fully supervised ZoomNet (Pang et al., 2022).
  • Figure 2: Issues arising when SAM is applied to COD: a) prompt compatibility of scribbles: SAM does not support scribble input. b) extreme response: SAM produces extensive background responses (rows 3, 4) and minimal object responses (rows 1, 2). c) semantically erroneous response: SAM produces erroneous responses to non-camouflaged objects (rows 3, 4) and object-biased fine-grained semantic responses (rows 1, 2). d) unstable feature representation: SAM produces varied outcomes (rows 1, 2 vs. rows 3, 4) in similar scenarios. The contours of camouflaged objects are highlighted in blue.
  • Figure 3: The architecture of the proposed SAM-COD framework. The Prompt Adapter converts scribbles into an input prompt that SAM accepts. The Response Filter handles SAM's extreme responses. The Semantic Matcher addresses the response errors that arise from SAM's lack of COD semantics. Prompt-Adaptive Knowledge Distillation transfers knowledge under the WSCOD setting.
  • Figure 4: Density distribution of $S_m$ versus object size. The box and the ellipse mark challenging small and large objects, respectively, on which performance is poor.
  • Figure 5: Visual comparison with some representative state-of-the-art fully-supervised and scribble-supervised models.
  • ...and 3 more figures
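
The captions of Figures 2 and 3 name two ideas that a small example can make concrete: turning a scribble into a prompt type SAM accepts, and discarding extreme responses. The sketch below is only an illustration under stated assumptions, not the authors' implementation. It assumes the scribble is a binary mask, that a scribble can be approximated by a few point prompts sampled along it, and that an "extreme" mask is one whose foreground fraction falls outside a chosen range. The function names `scribble_to_points` and `filter_extreme_masks` and the thresholds `min_frac` and `max_frac` are hypothetical.

```python
import numpy as np


def scribble_to_points(scribble, num_points=5):
    """Pick up to `num_points` (x, y) point prompts from a binary scribble mask.

    Assumption: spreading points over the scribble pixels is an acceptable
    stand-in for the scribble itself. Pixels are visited in row-major order,
    which is a simplification and not the order in which the stroke was drawn.
    """
    ys, xs = np.nonzero(scribble)
    if len(xs) == 0:
        return np.empty((0, 2), dtype=int)
    count = min(num_points, len(xs))
    # Evenly spaced indices over the list of scribble pixels.
    idx = np.linspace(0, len(xs) - 1, num=count).round().astype(int)
    return np.stack([xs[idx], ys[idx]], axis=1)


def filter_extreme_masks(masks, min_frac=0.001, max_frac=0.7):
    """Keep masks whose foreground fraction lies in [min_frac, max_frac].

    Assumption: near-empty masks stand for "minimal object responses" and
    near-full masks for "extensive background responses"; both thresholds
    are hypothetical values chosen for illustration.
    """
    kept = []
    for mask in masks:
        frac = float(np.mean(np.asarray(mask) > 0))
        if min_frac <= frac <= max_frac:
            kept.append(mask)
    return kept


if __name__ == "__main__":
    scribble = np.zeros((10, 10), dtype=np.uint8)
    for i in range(2, 8):
        scribble[i, i] = 1  # a short diagonal stroke
    print(scribble_to_points(scribble, num_points=3))

    empty = np.zeros((10, 10), dtype=np.uint8)
    full = np.ones((10, 10), dtype=np.uint8)
    partial = np.zeros((10, 10), dtype=np.uint8)
    partial[:2, :] = 1  # 20% foreground
    print(len(filter_extreme_masks([empty, full, partial])))
```

In a real pipeline the sampled points would be passed to SAM as positive point prompts, and the filter would run over the candidate masks SAM returns. How SAM-COD actually performs either step is not specified in the text above.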