YCDa: YCbCr Decoupled Attention for Real-time Realistic Camouflaged Object Detection

PeiHuang Zheng, Yunlong Zhao, Zheng Cui, Yang Li

TL;DR

YCDa is an efficient early-stage feature processing strategy that embeds a chrominance-luminance decoupling and dynamic attention principle into modern real-time detectors, setting new state-of-the-art results for real-time camouflaged object detection across the COD-D datasets.

Abstract

Human vision exhibits remarkable adaptability in perceiving objects under camouflage. When color cues become unreliable, the visual system instinctively shifts its reliance from chrominance (color) to luminance (brightness and texture), enabling more robust perception in visually confusing environments. Drawing inspiration from this biological mechanism, we propose YCDa, an efficient early-stage feature processing strategy that embeds this "chrominance-luminance decoupling and dynamic attention" principle into modern real-time detectors. Specifically, YCDa separates color and luminance information in the input stage and dynamically allocates attention across channels to amplify discriminative cues while suppressing misleading color noise. The strategy is plug-and-play and can be integrated into existing detectors by simply replacing the first downsampling layer. Extensive experiments on multiple baselines demonstrate that YCDa consistently improves performance with negligible overhead as shown in Fig. Notably, YCDa-YOLO12s achieves a 112% improvement in mAP over the baseline on COD10K-D and sets new state-of-the-art results for real-time camouflaged object detection across COD-D datasets.
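
The decoupling step described in the abstract can be illustrated with a plain color space conversion. The sketch below is not the authors' implementation: it assumes full-range BT.601 (JFIF-style) coefficients and float RGB input in [0, 1], and the names `rgb_to_ycbcr` and `split_luma_chroma` are hypothetical helpers introduced here for illustration.

```python
import numpy as np

# Assumption: full-range BT.601 (JFIF) coefficients. The page does not state
# which YCbCr variant YCDa uses, so treat these values as illustrative.
_RGB_TO_YCBCR = np.array([
    [0.299, 0.587, 0.114],         # Y  (luminance)
    [-0.168736, -0.331264, 0.5],   # Cb (blue-difference chrominance)
    [0.5, -0.418688, -0.081312],   # Cr (red-difference chrominance)
])
_OFFSET = np.array([0.0, 0.5, 0.5])  # centers Cb and Cr at 0.5 for [0, 1] input


def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) float RGB image in [0, 1] to YCbCr."""
    return rgb @ _RGB_TO_YCBCR.T + _OFFSET


def split_luma_chroma(ycbcr: np.ndarray):
    """Hypothetical helper: return (Y, CbCr) as separate arrays."""
    return ycbcr[..., :1], ycbcr[..., 1:]


if __name__ == "__main__":
    image = np.random.default_rng(0).random((4, 4, 3))
    luma, chroma = split_luma_chroma(rgb_to_ycbcr(image))
    print(luma.shape, chroma.shape)
```

Once the channels are separated in this way, a detector can weight luminance and chrominance independently, which is the property the attention stage relies on.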

Paper Structure (20 sections, 7 equations, 6 figures, 5 tables)

Figures (6)

  • Figure 1: Improvements over different baseline models. The results are reported on the COD10K-D test set in mean Average Precision (mAP).
  • Figure 2: Comparison of chrominance and luminance channels under different object saliency levels. Each group contains camouflaged objects (the first column) and salient objects (the second column) generated by modifying only color while preserving original features.
  • Figure 3: Overview of YCbCr Decoupled Attention strategy. The input image first undergoes color space transformation, then uses point-wise-free ESSamp for downsampling and preliminary feature processing, followed by the ICA module to allocate attention across different chrominance and luminance information channels. The top-left corner shows the YCDa-Enhanced Detection Network.
  • Figure 4: Comparison of saliency differences, GAP, and VAR across three YCbCr channels. The visualization demonstrates that variance effectively captures discriminative information value variations across different object saliency levels.
  • Figure 5: Architecture of the Information-aware Channel Attention (ICA) module. The module integrates both global average pooling and variance information to perceive channel-wise information differences, enabling more precise attention allocation.
  • ...and 1 more figures
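
The Figure 4 and Figure 5 captions say that ICA perceives channel-wise differences through both global average pooling (GAP) and variance. The sketch below shows one way such statistics could drive a channel gate. It is an assumption-laden illustration, not the ICA architecture: the linear mixing matrices `w_gap` and `w_var`, the bias, the sigmoid gate, and both function names are hypothetical, since this page does not describe the module's layers.

```python
import numpy as np


def channel_stats(x: np.ndarray):
    """Per-channel global average (GAP) and variance of a (C, H, W) feature map."""
    gap = x.mean(axis=(1, 2))
    var = x.var(axis=(1, 2))  # population variance over spatial positions
    return gap, var


def variance_aware_attention(x, w_gap, w_var, bias):
    """Hypothetical gate: mix GAP and variance linearly, then apply a sigmoid.

    x:     (C, H, W) feature map
    w_gap: (C, C) weights applied to the GAP vector (assumed, not from the paper)
    w_var: (C, C) weights applied to the variance vector (assumed)
    bias:  (C,) bias term (assumed)
    """
    gap, var = channel_stats(x)
    logits = w_gap @ gap + w_var @ var + bias
    weights = 1.0 / (1.0 + np.exp(-logits))  # one gate in (0, 1) per channel
    return x * weights[:, None, None], weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    features = rng.standard_normal((3, 8, 8))  # e.g. Y, Cb, Cr feature channels
    out, gate = variance_aware_attention(
        features, rng.standard_normal((3, 3)), rng.standard_normal((3, 3)), np.zeros(3)
    )
    print(out.shape, gate)
```

GAP alone cannot distinguish a flat channel from a highly textured one with the same mean. Adding variance gives the gate a signal for how much spatial information each channel carries, which is consistent with the role the captions assign to it.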