GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection
Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo
TL;DR
GLCONet tackles camouflaged object detection by explicitly modeling both global long-range dependencies and local spatial details. It introduces a Global-Local Collaborative Optimization Strategy (COS) comprising a Global Perception Module, a Local Refinement Module, and a Group-wise Hybrid Interaction Module, plus an Adjacent Reverse Decoder that integrates multi-level cues through cross-layer aggregation and reverse optimization. With different backbones, the approach outperforms twenty state-of-the-art methods on the CAMO, COD10K, and NC4K datasets and produces strong qualitative segmentation in challenging scenes. The work also extends the method to polyp segmentation, suggesting that it generalizes to other tasks requiring fine-grained detection of objects that blend into their surroundings.
Abstract
Recently, biological perception has been a powerful tool for handling the camouflaged object detection (COD) task. However, most existing methods depend heavily on local spatial information at diverse scales, obtained from convolutional operations, to optimize initial features. These methods commonly neglect the long-range dependencies between feature pixels from different scale spaces, which can help the model build a global structure of the object and thereby induce a more precise image representation. In this paper, we propose a novel Global-Local Collaborative Optimization Network, called GLCONet. Technically, we first design a collaborative optimization strategy from the perspective of multi-source perception to simultaneously model local details and global long-range relationships, which provides features with abundant discriminative information to boost the accuracy of detecting camouflaged objects. Furthermore, we introduce an adjacent reverse decoder that combines cross-layer aggregation and reverse optimization to integrate complementary information from different levels and generate high-quality representations. Extensive experiments demonstrate that the proposed GLCONet method with different backbones can effectively activate potentially significant pixels in an image, outperforming twenty state-of-the-art methods on three public COD datasets. The source code is available at https://github.com/CSYSI/GLCONet.
