
GLCONet: Learning Multi-source Perception Representation for Camouflaged Object Detection

Yanguang Sun, Hanyu Xuan, Jian Yang, Lei Luo

TL;DR

GLCONet tackles camouflaged object detection by explicitly modeling both global long-range dependencies and local spatial details. It introduces a Global-Local Collaborative Optimization Strategy (COS) comprising a Global Perception Module, Local Refinement Module, and Group-wise Hybrid Interaction Module, plus an Adjacent Reverse Decoder to integrate multi-level cues through cross-layer aggregation and reverse optimization. The approach achieves superior results across CAMO, COD10K, and NC4K datasets, with backbone-agnostic performance and strong qualitative segmentation in challenging scenes. The work also demonstrates expanded applicability to polyp segmentation, indicating broad generalization and practical impact in tasks requiring fine-grained camouflage-aware detection.
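The difference between modeling global long-range dependencies and local spatial details can be illustrated with a toy pure-Python sketch. It is not the paper's Global Perception Module or Local Refinement Module. The global branch is single-head dot-product self-attention with identity query/key/value projections (an assumption; real attention layers learn these projections). The local branch is a fixed 3x3 mean filter standing in for learned convolutions. The names `self_attention`, `local_mean_3x3` and `flatten` are hypothetical. Perturbing the pixel farthest from the top-left corner changes the global output at that corner but leaves the local output untouched.

```python
import math


def self_attention(tokens):
    """Single-head dot-product self-attention with identity Q/K/V projections.

    tokens: list of N feature vectors (lists of C floats), one per pixel.
    Each output vector is a softmax-weighted average of *all* input vectors,
    so every pixel can draw on every other pixel (a global receptive field).
    Assumption: no learned projections, unlike real attention layers.
    """
    scale = 1.0 / math.sqrt(len(tokens[0]))
    out = []
    for q in tokens:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in tokens]
        top = max(scores)  # subtract the max before exp for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) / total
                    for c in range(len(q))])
    return out


def local_mean_3x3(grid):
    """Fixed 3x3 mean filter with zero padding: each output sees only its neighbours."""
    h, w = len(grid), len(grid[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < h and 0 <= x < w:  # out-of-bounds pixels count as zero
                        acc += grid[y][x]
            out[i][j] = acc / 9.0
    return out


def flatten(grid):
    """Turn an HxW single-channel map into H*W one-channel tokens."""
    return [[v] for row in grid for v in row]


base = [[0.1 * (i + j) for j in range(5)] for i in range(5)]
changed = [row[:] for row in base]
changed[4][4] += 1.0  # perturb the pixel farthest from (0, 0)

g0 = self_attention(flatten(base))[0][0]
g1 = self_attention(flatten(changed))[0][0]
l0 = local_mean_3x3(base)[0][0]
l1 = local_mean_3x3(changed)[0][0]
print(g0 != g1)  # True: the global output at (0, 0) reacts to the far pixel
print(l0 == l1)  # True: the local output at (0, 0) does not
```

Stacking local filters widens the receptive field only gradually, whereas a single attention step already connects every pair of pixels; this is the gap the collaborative optimization strategy is described as closing.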

Abstract

Recently, biological perception has become a powerful tool for handling the camouflaged object detection (COD) task. However, most existing methods depend heavily on local spatial information at diverse scales, obtained from convolutional operations, to optimize initial features. These methods commonly neglect the long-range dependencies between feature pixels in different scale spaces, which can help the model build a global structure of the object and thus yield a more precise image representation. In this paper, we propose a novel Global-Local Collaborative Optimization Network, called GLCONet. Technically, we first design a collaborative optimization strategy from the perspective of multi-source perception to simultaneously model local details and global long-range relationships, which provides features with abundant discriminative information and boosts the accuracy of camouflaged object detection. Furthermore, we introduce an adjacent reverse decoder that contains cross-layer aggregation and reverse optimization to integrate complementary information from different levels and generate high-quality representations. Extensive experiments demonstrate that the proposed GLCONet, with different backbones, can effectively activate potentially significant pixels in an image, outperforming twenty state-of-the-art methods on three public COD datasets. The source code is available at: https://github.com/CSYSI/GLCONet.
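The two operations of the adjacent reverse decoder can be sketched in the same toy setting. This sketch assumes, without confirmation from the text above, that cross-layer aggregation means nearest-neighbour upsampling of the deeper (coarser) feature followed by element-wise addition with the adjacent shallower feature. It also assumes that reverse optimization takes the common reverse-attention form, which weights features by 1 - sigmoid(coarse prediction). All function names are hypothetical.

```python
import math


def upsample2x(grid):
    """Nearest-neighbour 2x upsampling of a 2D map (each value becomes a 2x2 block)."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))  # copy so the two output rows are independent
    return out


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def aggregate_adjacent(shallow, deep):
    """Assumed cross-layer aggregation: upsample the deeper map, add it element-wise."""
    up = upsample2x(deep)
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(shallow, up)]


def reverse_attention(feature, logits):
    """Assumed reverse optimization: weight features by 1 - sigmoid(prediction).

    Pixels already predicted as object (large logits) are suppressed, so the
    next stage concentrates on regions the coarser prediction has missed.
    """
    return [[f * (1.0 - sigmoid(p)) for f, p in zip(rf, rp)]
            for rf, rp in zip(feature, logits)]


deep_feat = [[0.5, 0.5], [0.5, 0.5]]          # coarser-level feature (2x2)
deep_logits = [[6.0, -6.0], [-6.0, -6.0]]     # coarser prediction: top-left is confident object
shallow_feat = [[1.0] * 4 for _ in range(4)]  # adjacent finer-level feature (4x4)

fused = aggregate_adjacent(shallow_feat, deep_feat)  # every entry is 1.5
refined = reverse_attention(fused, upsample2x(deep_logits))
print(refined[0][0] < 0.01, refined[3][3] > 1.49)    # True True
```

The top-left 2x2 block, already claimed by the coarse prediction, is almost zeroed out, while the remaining pixels keep about 99.8% of their response for the next decoding step to refine.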

Paper Structure (30 sections, 13 equations, 12 figures, 11 tables)

Figures (12)

  • Figure 1: Visual comparison of local and global perception. Results of our COS versus existing well-designed convolutional modules. "Stage 2" to "Stage 5" denote multi-level features from different modules, and "GT" denotes the ground truth.
  • Figure 2: Overall architecture of our GLCONet method. We use ResNet-50/Swin Transformer/PVT as the encoder and propose a collaborative optimization strategy (COS) that contains a global perception module (GPM), a local refinement module (LRM) and a group-wise hybrid interaction module (GHIM) to simultaneously model long-range dependencies and local details. In addition, we design an adjacent reverse decoder (ARD) to integrate complementary information from different layers through cross-layer aggregation and reverse optimization.
  • Figure 3: Details of the multi-scale transformer block (MTB) and the progressive convolution block (PCB).
  • Figure 4: Details of the group-wise hybrid interaction module.
  • Figure 5: Details of the adjacent reverse decoder.
  • ...and 7 more figures
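Figure 4 is not reproduced here, so the following is only a hypothetical illustration of a group-wise interaction between a global and a local branch, not the paper's GHIM. Channels are split into equal groups. Each group computes its own scalar gate (here the sigmoid of the mean response of both branches in that group, an arbitrary choice) and outputs gate * global + (1 - gate) * local. The names `split_groups` and `group_gated_fusion` are hypothetical.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def split_groups(channels, groups):
    """Split a list of channel maps into `groups` equal, contiguous chunks."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    size = len(channels) // groups
    return [channels[i * size:(i + 1) * size] for i in range(groups)]


def group_gated_fusion(global_feat, local_feat, groups):
    """Hypothetical group-wise fusion of a global and a local branch (not the paper's GHIM).

    global_feat, local_feat: lists of C channel maps (each a flat list of pixel values).
    Each channel group gets its own scalar gate = sigmoid(mean of both branches in
    that group), and outputs gate * global + (1 - gate) * local.
    Returns (fused channel maps, per-group gates).
    """
    fused, gates = [], []
    for g_grp, l_grp in zip(split_groups(global_feat, groups),
                            split_groups(local_feat, groups)):
        values = [v for ch in g_grp + l_grp for v in ch]
        gate = sigmoid(sum(values) / len(values))
        gates.append(gate)
        for g_ch, l_ch in zip(g_grp, l_grp):
            fused.append([gate * a + (1.0 - gate) * b for a, b in zip(g_ch, l_ch)])
    return fused, gates


# 4 channels x 3 pixels per branch, split into 2 groups of 2 channels.
global_feat = [[2.0, 2.0, 2.0] for _ in range(2)] + [[-2.0, -2.0, -2.0] for _ in range(2)]
local_feat = [[0.0, 0.0, 0.0] for _ in range(4)]
fused, gates = group_gated_fusion(global_feat, local_feat, groups=2)
print([round(g, 3) for g in gates])  # [0.731, 0.269]
```

In this toy input the first group leans toward the global branch and the second toward the local one. The point is only that grouping lets different channel subsets weight the two branches differently; a learned module would derive its gates from trained weights rather than a raw mean.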