
SPEGNet: Synergistic Perception-Guided Network for Camouflaged Object Detection

Baber Jan, Saeed Anwar, Aiman H. El-Maleh, Abdul Jabbar Siddiqui, Abdul Bais

TL;DR

Camouflaged object detection (COD) is hindered by intrinsic similarity and edge disruption, which traditional architectures address via accumulating modules and reduced-resolution processing. SPEGNet introduces a synergistic design with Contextual Feature Integration, Edge Feature Extraction, and Progressive Edge-guided Decoder to fuse multi-scale context and boundary information in a single, cohesive framework. It delivers state-of-the-art performance on COD10K, NC4K, and CAMO with real-time inference, and demonstrates strong cross-domain transfer to medical imaging and agriculture without architectural changes. The work highlights a shift from modular accumulation toward principled integration of perception mechanisms, and discusses remaining challenges such as resolution-dependent detection boundaries and annotation quality in COD benchmarks.
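The idea that boundary information can be read off a fused, context-rich feature map can be made concrete with a small sketch. It rests on a loud assumption: SPEGNet's Edge Feature Extraction is a learned module, and this toy replaces it with a fixed forward-difference gradient on one 2-D fused map, squashed into [0, 1] with tanh. The function name `edge_map_from_context` and the `sharpness` parameter are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def edge_map_from_context(feature: Grid, sharpness: float = 4.0) -> Grid:
    """Toy boundary map: tanh-squashed gradient magnitude of a 2-D feature map.

    Assumption: a fixed forward-difference gradient stands in for the learned
    Edge Feature Extraction branch; outputs lie in [0, 1].
    """
    h, w = len(feature), len(feature[0])
    edges = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Forward differences; clamping the index gives zero gradient at the border.
            gx = feature[y][min(x + 1, w - 1)] - feature[y][x]
            gy = feature[min(y + 1, h - 1)][x] - feature[y][x]
            edges[y][x] = math.tanh(sharpness * math.hypot(gx, gy))
    return edges


# A fused map whose right half is "object": the boundary sits between columns 1 and 2.
fused = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
edges = edge_map_from_context(fused)
for row in edges:
    print([round(v, 3) for v in row])
```

Because the gradient is taken on the fused map rather than on raw pixels, an edge appears only where the integrated representation changes. In the real network, a learned branch would play this role.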

Abstract

Camouflaged object detection segments objects characterized by intrinsic similarity and edge disruption. Current detection methods rely on accumulated complex components: each approach independently adds boundary modules, attention mechanisms, or multi-scale processors. This accumulation creates a computational burden without proportional gains. To manage this complexity, these methods process images at reduced resolutions, discarding fine details essential for detecting camouflaged objects. We present SPEGNet, which addresses this fragmentation through a unified design. The architecture integrates multi-scale features via channel calibration and spatial enhancement. Boundaries emerge directly from context-rich representations, maintaining semantic-spatial alignment. Progressive refinement implements scale-adaptive edge modulation with peak influence at intermediate resolutions. This design strikes a balance between boundary precision and regional consistency. SPEGNet achieves an $S_\alpha$ of 0.887 on CAMO, 0.890 on COD10K, and 0.895 on NC4K, with real-time inference speed. Our approach excels across scales, from tiny, intricate objects to large, pattern-similar ones, while handling occlusion and ambiguous boundaries. Code, model weights, and results are available at \href{https://github.com/Baber-Jan/SPEGNet}{https://github.com/Baber-Jan/SPEGNet}.
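The abstract says multi-scale features are integrated via channel calibration and spatial enhancement. Below is a minimal pure-Python sketch of that general pattern, offered as an illustration under stated assumptions rather than SPEGNet's code. Channel calibration is modeled as squeeze-and-excitation without its learned bottleneck MLP: each channel is scaled by the sigmoid of its global average. Spatial enhancement is the sigmoid of the per-location channel mean, and multi-scale fusion is nearest-neighbour upsampling followed by channel concatenation. All function names are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]
Features = List[Grid]  # channels x height x width


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def upsample_nearest(grid: Grid, factor: int) -> Grid:
    """Repeat each value `factor` times along both axes."""
    return [[v for v in row for _ in range(factor)] for row in grid for _ in range(factor)]


def channel_calibrate(feats: Features) -> Features:
    """SE-style calibration without the learned MLP: scale each channel by sigmoid(global average)."""
    out = []
    for ch in feats:
        gap = sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        weight = sigmoid(gap)
        out.append([[weight * v for v in row] for row in ch])
    return out


def spatial_enhance(feats: Features) -> Features:
    """Reweight every location by the sigmoid of its mean across channels."""
    c, h, w = len(feats), len(feats[0]), len(feats[0][0])
    attn = [[sigmoid(sum(feats[k][y][x] for k in range(c)) / c) for x in range(w)] for y in range(h)]
    return [[[feats[k][y][x] * attn[y][x] for x in range(w)] for y in range(h)] for k in range(c)]


def integrate_context(fine: Features, coarse: Features, factor: int) -> Features:
    """Bring coarse channels to the fine resolution, concatenate, then calibrate and enhance."""
    merged = fine + [upsample_nearest(ch, factor) for ch in coarse]
    return spatial_enhance(channel_calibrate(merged))


fine = [[[1.0, 0.0], [0.0, 1.0]]]  # one 2x2 channel with a diagonal response
coarse = [[[2.0]]]  # one 1x1 channel carrying global context
fused = integrate_context(fine, coarse, factor=2)
print(len(fused), len(fused[0]), len(fused[0][0]))  # channels, height, width
```

Calibration runs before enhancement here, so the spatial map is computed from already reweighted channels. Locations where the fine-scale response agrees with the coarse context, such as the diagonal in this toy input, receive the strongest emphasis.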

Paper Structure

This paper contains 51 sections, 1 equation, 10 figures, and 5 tables.

Figures (10)

  • Figure 1: SPEGNet's effectiveness across diverse camouflage challenges. Columns show: (a) original images; (b-d) predictions from FEDER he2023feder, FSPNet huang2023fspnet, and SPEGNet (Ours); (e) ground truth. Rows show: (i) Intrinsic Similarity (IS): a white bird in snow; (ii) Edge Disruption (ED): a grasshopper with ambiguous boundaries; (iii-iv) combined IS+ED with pattern similarity and intricate boundaries.
  • Figure 2: Architecture overview of SPEGNet. Data flows through four key components: Feature Encoding (gray); Contextual Feature Integration (blue), which combines multi-scale features; Edge Feature Extraction (green), which derives boundary information; and the Progressive Edge-guided Decoder (yellow), which generates multi-scale predictions with scale-adaptive edge modulation. A sample input and the corresponding segmentation outputs at different refinement stages are shown at the top of the figure.
  • Figure 3: Qualitative comparison on challenging COD scenarios. Columns show: (a) input image, (b) ground truth, (c-h) predictions from OCENet 9706783, BGNet sun2022bgnet, ZoomNet pang2022zoom, SINetV2 fan2021concealed, FSPNet huang2023fspnet, FEDER he2023feder, and (i) SPEGNet (Ours). Rows demonstrate: (i) small object detection, (ii) large object with pattern similarity, (iii) multiple instances revealing annotation limitations, (iv) occlusion handling, and (v) ambiguous boundaries. SPEGNet consistently outperforms existing methods across all scenarios.
  • Figure 4: Resolution impact across complexity levels. Rows show: (i) contextual complexity, (ii) texture integration, and (iii) perceptual limits. Columns: (a) input, (b-g) competing methods, (h-j) SPEGNet at different resolutions, (k) ground truth.
  • Figure 5: Visual ablation analysis on challenging examples. Columns: (a) input, (b) ground truth, (c) w/o Channel Attention, (d) w/o Edge Guidance, (e) w/ ViT, (f) w/o ASPP, (g) Single-stage Decoder, (h) SPEGNet (Full). Each variant shows specific failure modes, validating component contributions.
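Figure 2 describes a decoder that produces multi-scale predictions with scale-adaptive edge modulation, and the abstract states that edge influence peaks at intermediate resolutions. The sketch below shows one way such a schedule could behave. Every specific here is an assumption rather than a detail from the paper: the Gaussian weight over stage index, the residual form f * (1 + w * e), 1-D signals in place of feature maps, x2 nearest-neighbour upsampling per stage, and the names `edge_weights`, `modulate`, and `progressive_decode`.

```python
import math
from typing import List

Signal = List[float]


def edge_weights(num_stages: int, peak: float, width: float = 1.0) -> List[float]:
    """Hypothetical Gaussian schedule over decoder stages (0 = coarsest), maximal at `peak`."""
    return [math.exp(-((s - peak) ** 2) / (2.0 * width ** 2)) for s in range(num_stages)]


def modulate(feature: Signal, edge: Signal, weight: float) -> Signal:
    """Residual edge modulation f * (1 + weight * e); weight 0 leaves features unchanged."""
    return [f * (1.0 + weight * e) for f, e in zip(feature, edge)]


def progressive_decode(coarse: Signal, edge: Signal, num_stages: int, peak: float) -> List[Signal]:
    """Upsample x2 per stage and apply edge modulation with a stage-dependent weight.

    `edge` is given at the finest resolution and strided down to match each stage.
    """
    if len(edge) != len(coarse) * 2 ** (num_stages - 1):
        raise ValueError("edge must be at the finest stage resolution")
    weights = edge_weights(num_stages, peak)
    pred, outputs = coarse, []
    for s in range(num_stages):
        if s > 0:
            pred = [v for v in pred for _ in range(2)]  # nearest-neighbour x2 upsampling
        stride = len(edge) // len(pred)
        pred = modulate(pred, edge[::stride], weights[s])
        outputs.append(pred)
    return outputs


weights = edge_weights(num_stages=4, peak=1.5)
print([round(w, 3) for w in weights])  # largest at the two intermediate stages
edge = [0.0] * 8 + [1.0] + [0.0] * 7  # a single boundary at the finest resolution
preds = progressive_decode([1.0, 1.0], edge, num_stages=4, peak=1.5)
print([len(p) for p in preds])  # one prediction per refinement stage
```

A bell-shaped schedule is one plausible way to reflect the stated balance between boundary precision and regional consistency. Coarse stages are left mostly to regional cues, while intermediate stages receive the strongest edge signal. The residual form ensures that a stage with near-zero weight passes its features through almost unchanged.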