Adaptive Guidance Learning for Camouflaged Object Detection

Zhennan Chen, Xuying Zhang, Tian-Zhu Xiang, Ying Tai

TL;DR

This work tackles camouflaged object detection (COD), where objects are visually similar to the background and cues are hard to exploit with RGB information alone. It introduces AGLNet, a unified end-to-end framework that adaptively learns and integrates auxiliary cues. An Additional Information Generation (AIG) module produces a cue representation $\mathcal{A} \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times C}$ and a cue map $r^{s}$; a Hierarchical Feature Combination (HFC) module fuses these cues with multi-scale backbone features; and a Recalibration Decoder (RD) refines the predictions. The approach supports multiple cues (boundary, texture, edge, frequency) and significantly outperforms 20 state-of-the-art COD methods on COD10K, CAMO, and NC4K under metrics such as $S_{\alpha}$, $F_{\beta}^{\omega}$, $F_{m}$, $E_{m}$, and $MAE$. This adaptive cue-guidance framework enables robust camouflaged object segmentation in diverse scenes and offers a flexible, extensible solution for COD applications.
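
As a rough illustration of the shapes involved, the following Python sketch computes per-level feature sizes and applies a toy cue-guided modulation. Only the stride-8 resolution of $\mathcal{A}$ comes from the text above; the backbone strides (4, 8, 16, 32), the 352 × 352 input, nearest-neighbour resizing, and the residual rule $f' = f \cdot (1 + r)$ are assumptions, and the helper names are hypothetical. This is a generic guidance pattern, not AGLNet's actual HFC module.

```python
"""Toy sketch of cue-guided feature modulation (hypothetical, not AGLNet's HFC).

Assumptions: backbone strides 4/8/16/32, single-channel features stored as
nested lists, nearest-neighbour resizing, and residual guidance f' = f * (1 + r).
"""

Grid = list[list[float]]


def stride_shapes(h: int, w: int,
                  strides: tuple[int, ...] = (4, 8, 16, 32)) -> dict[int, tuple[int, int]]:
    """Spatial size (rows, cols) of each assumed backbone level for an H x W input."""
    return {s: (h // s, w // s) for s in strides}


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2-D grid to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[i * in_h // out_h][j * in_w // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def modulate(feature: Grid, cue: Grid) -> Grid:
    """Resize the cue map to the feature size, then amplify cue-positive locations."""
    cue = resize_nearest(cue, len(feature), len(feature[0]))
    return [
        [f * (1.0 + c) for f, c in zip(f_row, c_row)]
        for f_row, c_row in zip(feature, cue)
    ]


if __name__ == "__main__":
    # The stride-8 level matches the H/8 x W/8 size of the cue representation A.
    print(stride_shapes(352, 352)[8])        # (44, 44)
    toy_cue = [[0.0, 1.0], [1.0, 0.0]]       # 2 x 2 cue map in [0, 1]
    toy_feat = [[1.0] * 4 for _ in range(4)] # 4 x 4 constant feature map
    print(modulate(toy_feat, toy_cue)[0])    # [1.0, 1.0, 2.0, 2.0]
```

Multiplying by $1 + r$ rather than by $r$ alone keeps the original features wherever the cue is zero, so a weak or wrong cue cannot erase them entirely; this is one common design choice, not necessarily the paper's.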

Abstract

Camouflaged object detection (COD) aims to segment objects visually embedded in their surroundings, which is very challenging because of the high similarity between the objects and the background. To address this, most methods incorporate additional information (e.g., boundary, texture, and frequency cues) to guide feature learning so that camouflaged objects can be better separated from the background. Although progress has been made, these methods are typically tailored to one specific auxiliary cue; they therefore lack adaptability and do not consistently achieve high segmentation performance. To this end, this paper proposes an adaptive guidance learning network, dubbed AGLNet, a unified end-to-end learnable model that explores and adapts different additional cues in CNN models to guide accurate camouflaged feature learning. Specifically, we first design a straightforward additional information generation (AIG) module to learn additional camouflaged object cues, which can be adapted for the exploration of effective camouflaged features. Then we present a hierarchical feature combination (HFC) module that deeply integrates additional cues and image features, guiding camouflaged feature learning in a multi-level fusion manner. Finally, a recalibration decoder (RD) further aggregates and refines the different features for accurate object prediction. Extensive experiments on three widely used COD benchmark datasets demonstrate that the proposed method achieves significant performance improvements under different additional cues and outperforms 20 recent state-of-the-art methods by a large margin. Our code will be made publicly available at https://github.com/ZNan-Chen/AGLNet.
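
Two of the metrics listed in the TL;DR, $MAE$ and the F-measure, have simple standard definitions. The sketch below implements them for nested-list maps, following common salient/camouflaged object detection conventions that are assumed here rather than taken from the paper: predictions in $[0, 1]$, binary ground truth, $\beta^2 = 0.3$, and $F_{m}$ read as the F-measure averaged over 256 uniform thresholds. The structure-aware $S_{\alpha}$, $E_{m}$, and weighted $F_{\beta}^{\omega}$ are more involved and are not reproduced; the function names are hypothetical.

```python
"""Minimal MAE and F-measure for COD-style evaluation (assumed conventions)."""


def mae(pred: list[list[float]], gt: list[list[int]]) -> float:
    """Mean absolute error between a [0, 1] prediction and a binary mask."""
    diffs = [abs(p - g) for p_row, g_row in zip(pred, gt) for p, g in zip(p_row, g_row)]
    return sum(diffs) / len(diffs)


def f_measure(pred, gt, threshold: float = 0.5, beta2: float = 0.3) -> float:
    """F-measure of the prediction binarised at `threshold` (beta^2 = 0.3 assumed)."""
    tp = fp = fn = 0
    for p_row, g_row in zip(pred, gt):
        for p, g in zip(p_row, g_row):
            positive = p >= threshold
            if positive and g:
                tp += 1
            elif positive:
                fp += 1
            elif g:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0


def mean_f_measure(pred, gt, steps: int = 256, beta2: float = 0.3) -> float:
    """Average F-measure over `steps` uniformly spaced thresholds in [0, 1]."""
    thresholds = [t / (steps - 1) for t in range(steps)]
    return sum(f_measure(pred, gt, t, beta2) for t in thresholds) / steps


if __name__ == "__main__":
    pred = [[0.9, 0.2], [0.6, 0.1]]
    gt = [[1, 0], [0, 0]]
    print(mae(pred, gt))             # about 0.25
    print(f_measure(pred, gt, 0.5))  # about 0.565 (precision 0.5, recall 1.0)
    print(mean_f_measure(pred, gt))
```

A single fixed threshold and an average over thresholds can give noticeably different numbers, so reported $F_{m}$ values are only comparable when the same protocol is used.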

Paper Structure (14 sections, 5 equations, 10 figures, 5 tables)

Figures (10)

  • Figure 1: Visual comparisons between FDCOD [zhong2022detecting] and DGNet [ji2022gradient] with different additional cues. To better visualize model behavior, we show the feature maps from the layer preceding the network output. (a) FDCOD effectively exploits frequency-domain cues for camouflaged object detection but does not work with boundary cues. (b) DGNet fails to identify camouflaged objects with frequency-domain cues because the features change only weakly around objects in the frequency domain.
  • Figure 2: Overall architecture of our adaptive guidance learning network (AGLNet) for COD. The input image is processed by a visual backbone and an additional information generation (AIG) module to extract multi-scale image features and learn additional cues, respectively. Both sets of features are deeply integrated to guide the learning of camouflaged features in the hierarchical feature combination (HFC) module, which consists of a combination part and a decoupling part. Finally, the recalibration decoder (RD) iteratively aggregates and refines the fused features together with the backbone and additional features for object prediction.
  • Figure 3: Qualitative comparison of our proposed method and other representative COD methods. Our method provides better performance than all competitors for camouflaged object segmentation in various complex scenes.
  • Figure 4: Visual comparison of the proposed Combination part. (a) input image, (b) ground-truth, (c) baseline, and (d) baseline + Combination.
  • Figure 5: Visual comparison of the Decoupling part. (a) input image, (b) ground-truth, (c) baseline + Combination, (d) baseline + Combination + Decoupling. Red circles highlight the improvements.
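
Figure 1 contrasts boundary cues with frequency-domain cues. Boundary supervision in COD is often derived from the ground-truth mask; the sketch below shows one simple way to do so, marking every pixel whose label differs from at least one in-bounds 4-neighbour. The rule and the function name `boundary_map` are illustrative assumptions, and the paper's own cue targets may be generated differently.

```python
"""Derive a binary boundary map from a binary mask (illustrative sketch)."""

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))  # 4-connectivity


def boundary_map(mask: list[list[int]]) -> list[list[int]]:
    """Return 1 where a pixel's label differs from any in-bounds 4-neighbour.

    Both the inner and the outer pixels along the object contour are marked.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for di, dj in NEIGHBOURS:
                ni, nj = i + di, j + dj
                if 0 <= ni < h and 0 <= nj < w and mask[ni][nj] != mask[i][j]:
                    out[i][j] = 1
                    break
    return out


if __name__ == "__main__":
    square = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 0],
    ]
    for row in boundary_map(square):
        print(row)
    # [0, 1, 1, 0]
    # [1, 1, 1, 1]
    # [1, 1, 1, 1]
    # [0, 1, 1, 0]
```

Marking only foreground pixels (an inner boundary) is an equally plausible alternative; the choice changes the thickness of the target but not the idea.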