Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection
Xinran Liu, Lin Qi, Yuxuan Song, Qi Wen
TL;DR
This work tackles camouflaged object detection by leveraging depth maps as direct inputs to reveal 3D cues absent in 2D RGB images. It introduces DAF-Net, featuring a three-branch encoder, a Depth-weighted Cross-attention Fusion module, and a lightweight Feature Aggregation Decoder to adaptively fuse RGB and depth information. Through MiDaS-based depth estimation and extensive experiments on CAMO, COD10K, and NC4K, the approach achieves state-of-the-art COD performance and demonstrates the value of depth cues in challenging camouflage scenarios. The study bridges single-image depth estimation (SIDE) and COD, showing depth information can be effectively exploited despite potential noise, and opens avenues for broader multimodal fusion in COD and related tasks.
Abstract
Camouflaged object detection (COD) remains challenging because the targets blend seamlessly into their surroundings. Most existing COD models, however, overlook the fact that visual systems operate in a genuinely 3D environment. The scene depth implicit in a single 2D image provides rich spatial cues that can assist in detecting camouflaged objects. We therefore propose a novel depth-perceptual attention fusion network that takes the depth map as an auxiliary input, enhancing the network's ability to perceive 3D information that the human eye typically struggles to discern from 2D images. The network uses a trident-branch encoder to extract chromatic information, depth information, and the interactions between them. Recognizing that some regions of a depth map may not effectively highlight the camouflaged object, we introduce a depth-weighted cross-attention fusion module that dynamically adjusts the fusion weights of the depth and RGB feature maps. To keep the model simple without compromising effectiveness, we design a straightforward feature aggregation decoder that adaptively fuses the enhanced aggregated features. Experiments demonstrate that the proposed method significantly outperforms other state-of-the-art methods, which further validates the contribution of depth information to camouflaged object detection. The code will be available at https://github.com/xinran-liu00/DAF-Net.
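To make the idea of depth-weighted cross-attention fusion concrete, the following is a minimal NumPy sketch, not the released DAF-Net implementation. It assumes that RGB and depth features have already been flattened into N spatial tokens of C channels each, lets the RGB tokens attend to the depth tokens, and then scales the attended depth context by a per-token gate before adding it to the RGB features. The function name `depth_weighted_cross_attention`, the cosine-similarity gate, and the absence of learned projections are all illustrative assumptions; the actual module presumably uses learned weights.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def depth_weighted_cross_attention(rgb, depth):
    """Hypothetical fusion sketch (assumption, not the authors' module).

    rgb, depth: arrays of shape (N, C), i.e. N spatial tokens with C channels.
    Returns the fused features (N, C) and the per-token depth gate (N,).
    """
    n, c = rgb.shape

    # Cross-attention: RGB tokens act as queries, depth tokens as keys/values.
    attn = softmax(rgb @ depth.T / np.sqrt(c), axis=-1)  # (N, N)
    depth_ctx = attn @ depth                             # (N, C)

    # Assumed gate: cosine agreement between each RGB token and its depth
    # context, squashed to (0, 1). Low agreement down-weights the depth cue.
    num = (rgb * depth_ctx).sum(axis=-1)
    den = np.linalg.norm(rgb, axis=-1) * np.linalg.norm(depth_ctx, axis=-1) + 1e-8
    gate = 1.0 / (1.0 + np.exp(-num / den))              # (N,)

    # Residual fusion: RGB features plus gated depth context.
    fused = rgb + gate[:, None] * depth_ctx
    return fused, gate


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb_tokens = rng.normal(size=(6, 4))
    depth_tokens = rng.normal(size=(6, 4))
    fused, gate = depth_weighted_cross_attention(rgb_tokens, depth_tokens)
    print(fused.shape, gate.shape)
```

The residual form means that an uninformative (all-zero) depth map leaves the RGB features unchanged, which mirrors the stated motivation that some depth regions may not highlight the camouflaged object.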
