Adaptive Guidance Learning for Camouflaged Object Detection
Zhennan Chen, Xuying Zhang, Tian-Zhu Xiang, Ying Tai
TL;DR
This work tackles camouflaged object detection (COD), where objects are visually similar to the background and cues are hard to exploit with RGB information alone. It introduces AGLNet, a unified end-to-end framework that adaptively learns and integrates auxiliary cues via an Additional Information Generation (AIG) module to produce a cue representation $\mathcal{A} \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times C}$ and a cue map $r^{s}$, then fuses these cues with multi-scale backbone features through a Hierarchical Feature Combination (HFC) and refines predictions with a Recalibration Decoder (RD). The approach supports multiple cues (boundary, texture, edge, frequency) and shows significant improvements over 20 state-of-the-art COD methods across COD10K, CAMO, and NC4K on metrics such as $S_{\alpha}$, $F_{\beta}^{\omega}$, $F_{m}$, $E_{m}$, and $MAE$. This adaptive cue-guidance framework enables robust camouflaged object segmentation in diverse scenes and offers a flexible, extensible solution for COD applications.
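Of the metrics listed above, $MAE$ is the simplest to state: it is the per-pixel mean absolute difference between a predicted map and the ground-truth mask, both with values in $[0, 1]$. The snippet below is a minimal sketch of that standard definition, not the evaluation code used in this work. The function name `mean_absolute_error` and the toy 2x2 maps are hypothetical and chosen only for illustration.

```python
from typing import Sequence

Grid = Sequence[Sequence[float]]


def mean_absolute_error(pred: Grid, gt: Grid) -> float:
    """Average per-pixel absolute difference between two H x W maps.

    Both maps are assumed to hold values in [0, 1]; lower is better.
    """
    # The two maps must have identical shapes to be compared pixel by pixel.
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")

    total = 0.0
    count = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            total += abs(p - g)
            count += 1

    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    return total / count


# Hypothetical toy example: a 2x2 prediction against a binary mask.
pred = [[0.9, 0.1], [0.8, 0.0]]
gt = [[1.0, 0.0], [1.0, 0.0]]
mae = mean_absolute_error(pred, gt)
print(f"MAE = {mae:.3f}")
```

On this toy input the absolute errors are 0.1, 0.1, 0.2, and 0.0, so the result is approximately 0.1. A benchmark score would typically average this value over all images in a test set.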
Abstract
Camouflaged object detection (COD) aims to segment objects that are visually embedded in their surroundings, a very challenging task because of the high similarity between the objects and the background. To address it, most methods incorporate additional information (e.g., boundary, texture, and frequency cues) to guide feature learning and better separate camouflaged objects from the background. Although progress has been made, these methods are largely tailored to one specific auxiliary cue each, so they lack adaptability and do not consistently achieve high segmentation performance. To this end, this paper proposes an adaptive guidance learning network, dubbed AGLNet, a unified end-to-end learnable model for exploring and adapting different additional cues in CNN models to guide accurate camouflaged feature learning. Specifically, we first design a straightforward additional information generation (AIG) module to learn additional camouflaged object cues, which can be adapted for the exploration of effective camouflaged features. We then present a hierarchical feature combination (HFC) module that deeply integrates the additional cues with image features to guide camouflaged feature learning in a multi-level fusion manner. Finally, a recalibration decoder (RD) further aggregates and refines the different features for accurate object prediction. Extensive experiments on three widely used COD benchmark datasets demonstrate that the proposed method achieves significant performance improvements under different additional cues and outperforms 20 recent state-of-the-art methods by a large margin. Our code will be made publicly available at: https://github.com/ZNan-Chen/AGLNet.
