Toward Realistic Camouflaged Object Detection: Benchmarks and Method

Zhimeng Xin, Tianxu Wu, Shiming Chen, Shuo Ye, Zijing Xie, Yixiong Zou, Xinge You, Yufei Guo

TL;DR

This work targets Realistic Camouflaged Object Detection (RCOD) by reframing the problem from segmentation to bounding-box detection and introducing a camouflage-aware refinement framework (CAFR). CAFR combines Adaptive Gradient Propagation (AGP), which restricts and guides gradient updates across all feature-extractor layers, with Sparse Feature Refinement (SFR), which creates multi-scale pseudo-regions to emphasize sparse, class-specific cues in camouflaged contexts. The authors also contribute three new RCOD benchmarks (COD10K-D, NC4K-D, CAMO-D) with bounding boxes and class labels to facilitate detection-focused evaluation. Experiments show that CAFR consistently improves mAP, AP50, and AP75 across large detectors (e.g., GLIP, Grounding DINO) on the proposed datasets, outperforming baseline detection and segmentation approaches. The work demonstrates the practical impact of task-tailored fine-tuning strategies for RCOD and provides valuable resources for future research.
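The mAP, AP50, and AP75 numbers mentioned above are box-overlap metrics. Under the usual COCO-style definitions (assumed here, since the summary does not restate them), AP50 and AP75 count a predicted box as correct only when its intersection-over-union (IoU) with a ground-truth box reaches 0.5 or 0.75, and mAP averages AP over IoU thresholds from 0.50 to 0.95. The sketch below covers only the IoU test. The helper names `iou` and `is_true_positive` are illustrative. A full AP computation would also rank predictions by confidence, match each ground-truth box at most once, and integrate precision over recall.

```python
from typing import List, Tuple

# Axis-aligned box as (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes have no intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, gts: List[Box], threshold: float) -> bool:
    """Simplified hit test: does the prediction overlap any ground-truth box
    with IoU >= threshold (0.5 for AP50, 0.75 for AP75)?"""
    return any(iou(pred, gt) >= threshold for gt in gts)
```

For a 100 px ground-truth box, a prediction shifted 25 px sideways has IoU 0.6. It therefore counts as a hit for AP50 and a miss for AP75, which is why AP75 is the stricter localization test.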

Abstract

Camouflaged object detection (COD) primarily relies on semantic or instance segmentation methods. While these methods have made significant advances in identifying the contours of camouflaged objects, they can be inefficient and not cost-effective for tasks that only require the location of the object. In such cases, object detection algorithms offer a more efficient solution for Realistic Camouflaged Object Detection (RCOD). However, detecting camouflaged objects remains a formidable challenge because the features of the objects are highly similar to those of their backgrounds. Unlike segmentation methods, which perform pixel-wise comparisons to differentiate foreground from background, object detectors omit this analysis, further aggravating the challenge. To solve this problem, we propose a camouflage-aware feature refinement (CAFR) strategy. Since camouflaged objects do not belong to rare categories, CAFR fully utilizes the clear perception of these objects already present in the prior knowledge of large models to help detectors understand the distinctions between background and foreground. Specifically, CAFR introduces the Adaptive Gradient Propagation (AGP) module, which fine-tunes all feature extractor layers in large detection models to refine class-specific features from camouflaged contexts. We then design the Sparse Feature Refinement (SFR) module, which optimizes the transformer-based feature extractor to focus primarily on capturing class-specific features in camouflaged scenarios. To facilitate the assessment of RCOD tasks, we manually annotate the labels required for detection on three existing segmentation COD datasets, creating a new benchmark for RCOD tasks. Code and datasets are available at: https://github.com/zhimengXin/RCOD.
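The new benchmarks add detection labels (boxes and classes) to datasets that already provide segmentation masks. A short sketch can relate the two label types. It derives the tight box around a mask and measures how much of that box the object actually fills, which is the sparsity effect the Figure 1 caption describes for Pipefish and Katydid. This is not the authors' annotation procedure, since they annotated boxes manually. The function names `mask_to_box` and `foreground_ratio` and the list-of-lists mask format are assumptions made for illustration.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]  # binary mask, indexed as mask[y][x]; nonzero = object


def mask_to_box(mask: Mask) -> Optional[Tuple[int, int, int, int]]:
    """Tight (x1, y1, x2, y2) box around nonzero pixels, with x2/y2 exclusive.
    Returns None if the mask has no foreground pixels."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)] if mask else []
    if not rows or not cols:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def foreground_ratio(mask: Mask, box: Tuple[int, int, int, int]) -> float:
    """Fraction of the box's pixels that belong to the object."""
    x1, y1, x2, y2 = box
    fg = sum(mask[y][x] != 0 for y in range(y1, y2) for x in range(x1, x2))
    return fg / ((x2 - x1) * (y2 - y1))
```

For a 5×5 mask whose foreground is only the main diagonal (a thin, elongated object), `mask_to_box` returns `(0, 0, 5, 5)` and `foreground_ratio` returns 0.2. In other words, 80% of the box is background that a detector must learn to ignore.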

Paper Structure (23 sections, 5 equations, 7 figures, 7 tables)

Introduction
Related Work
Camouflaged Object Detection
Object Detection
Fine-tuning Large Models
Methodology
Task Definition
Why need Camouflage-Aware Feature Refinement in RCOD?
Adaptive Gradient Propagation Module
Sparse Feature Refinement Module
Online Implementation of SFR
Offline Implementation of SFR
Novel Datasets for Benchmark
Experiments on RCOD Detection
Experimental Settings
...and 8 more sections

Figures (7)

Figure 1: Comparison of (a) existing COD and (b) our RCOD tasks, and (c) visualization of the challenge in the RCOD task. For the RCOD predictions, we use the GLIP model [glipv2] trained on our proposed dataset to visualize the detection results. Because the background and foreground are extremely similar, directly applying large models pre-trained mostly on scenes with well-defined object contours still leads to misidentification of camouflaged objects (c). Furthermore, the bounding boxes in the proposed datasets encompass sparse category features; e.g., the bounding boxes for the classes Pipefish and Katydid contain less than half of the class-specific features (b). This further reduces GLIP's ability to detect these classes (c), because it uses the Swin Transformer [swin] as its feature extractor. This extractor focuses on the similarity relationships between pairs of patches, which can cause camouflaged object features to be confused when it assesses the similarities between blue-boxed and white-boxed patches (b).
Figure 2: The architecture of our proposed CAFR approach with the SFR and AGP modules. In the SFR module, each box within an input batch is cropped from the input samples to $W=200$ and $H=200$ and randomly positioned on a square canvas. For illustration, a batch is assumed to contain 16 boxes. $\times 1$, $\times 2$, and $\times 4$ denote obtaining 1, 2, and 4 new samples, respectively. The AGP module confines the backward pass of the detection head to the neck and backbone phases.
Figure 3: SFR offline setting. In the SFR module, each box is cropped to $W=200$ and $H=200$ from the selected 16 bounding boxes and randomly positioned on a square canvas.
Figure 4: Comparison of annotations: existing vs. our annotations. Introducing these three new annotations establishes valuable benchmarks in the field of RCOD research.
Figure 5: Performance of AGP under various parameter settings.
...and 2 more figures
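The Figure 2 caption specifies the SFR inputs: 16 boxes per batch, crops of $W=200$ by $H=200$, a square canvas, and ×1/×2/×4 new samples. It does not give the exact placement rule. The Python sketch below is one hypothetical reading. It splits the batch's boxes evenly across the requested number of canvases and gives each crop a random, non-overlapping 200×200 grid cell. The grid-based placement, the function name `sfr_layouts`, and the returned dictionary format are all assumptions. Only the layout is computed; pasting each resized crop at its offset is left out.

```python
import math
import random
from typing import Dict, List

CELL = 200  # crop size W = H = 200, as stated in the Figure 2 caption


def sfr_layouts(box_ids: List[int], num_samples: int, seed: int = 0) -> List[Dict]:
    """Hypothetical SFR-style layout planner (not the authors' code).

    Splits the boxes of one batch into `num_samples` canvases (x1, x2, x4)
    and assigns every crop a random, non-overlapping cell on a square grid.
    """
    if num_samples <= 0 or len(box_ids) % num_samples != 0:
        raise ValueError("num_samples must evenly divide the number of boxes")
    rng = random.Random(seed)
    ids = list(box_ids)
    rng.shuffle(ids)  # random grouping of boxes into canvases
    per_canvas = len(ids) // num_samples
    grid = math.ceil(math.sqrt(per_canvas))  # cells per side of the square canvas
    layouts = []
    for s in range(num_samples):
        group = ids[s * per_canvas:(s + 1) * per_canvas]
        cells = list(range(grid * grid))
        rng.shuffle(cells)  # random, non-overlapping cell per crop
        placements = []
        for box_id, cell in zip(group, cells):
            row, col = divmod(cell, grid)
            placements.append((box_id, col * CELL, row * CELL))  # (box, x, y) offset
        layouts.append({"canvas_size": grid * CELL, "placements": placements})
    return layouts


if __name__ == "__main__":
    for k in (1, 2, 4):
        print(k, [layout["canvas_size"] for layout in sfr_layouts(list(range(16)), k)])
```

Under this layout, ×1 packs all 16 crops onto one 800 px canvas, ×2 puts 8 crops on each of two 600 px canvases, and ×4 puts 4 crops on each of four 400 px canvases. If each canvas is then resized to the detector's input size, this is one way the three settings could produce pseudo-regions at different scales.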
