Towards Accurate Camouflaged Object Detection with Mixture Convolution and Interactive Fusion

Geng Chen, Xinrui Chen, Bo Dong, Mingchen Zhuge, Yongxiong Wang, Hongbo Bi, Jian Chen, Peng Wang, Yanning Zhang

TL;DR

This work targets camouflaged object detection (COD) by addressing two core challenges: the need for a large receptive field to capture rich context, and effective fusion to integrate multi-level features. The authors introduce MCIF-Net, which combines Dual-Branch Mixture Convolution (DMC) for context expansion with Multi-Level Interactive Fusion (MIF) for attentive feature fusion, achieving state-of-the-art results on COD benchmarks. Key contributions include the DMC module for receptive-field enlargement, the MIF module for interactive cross-level fusion, and extensive experiments plus ablations validating their effectiveness. The approach demonstrates strong generalization and transferability, including a successful extension to polyp segmentation, and offers a practical pathway toward robust COD in challenging natural scenes.

Abstract

Camouflaged object detection (COD), which aims to identify objects that conceal themselves in their surroundings, has recently drawn increasing research effort in the field of computer vision. In practice, the success of deep learning based COD is mainly determined by two key factors: (i) a significantly large receptive field, which provides rich context information, and (ii) an effective fusion strategy, which aggregates the rich multi-level features for accurate COD. Motivated by these observations, in this paper, we propose a novel deep learning based COD approach, which integrates the large receptive field and effective feature fusion into a unified framework. Specifically, we first extract multi-level features from a backbone network. The resulting features are then fed to the proposed dual-branch mixture convolution modules, each of which utilizes multiple asymmetric convolutional layers and two dilated convolutional layers to extract rich context features from a large receptive field. Finally, we fuse the features using specially designed multi-level interactive fusion modules, each of which employs an attention mechanism along with feature interaction for effective feature fusion. Our method detects camouflaged objects with an effective fusion strategy, which aggregates the rich context information from a large receptive field. All of these designs meet the requirements of COD well, allowing the accurate detection of camouflaged objects. Extensive experiments on widely used benchmark datasets demonstrate that our method is capable of accurately detecting camouflaged objects and outperforms the state-of-the-art methods.
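
To make the receptive-field argument concrete, the following minimal sketch computes the receptive field of a stride-1 convolution stack that mixes asymmetric (1×k and k×1) and dilated layers. The kernel sizes and dilation rate are illustrative assumptions, not the settings used in the paper, and the helper names `receptive_field` and `weights_per_channel_pair` are hypothetical.

```python
from typing import List, Tuple

# Each layer is (kernel_height, kernel_width, dilation); stride 1 is assumed.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return the (height, width) receptive field of a stride-1 conv stack."""
    rf_h, rf_w = 1, 1
    for k_h, k_w, dilation in layers:
        # A stride-1 layer with kernel k and dilation d widens the field by d * (k - 1).
        rf_h += dilation * (k_h - 1)
        rf_w += dilation * (k_w - 1)
    return rf_h, rf_w


def weights_per_channel_pair(layers: List[Layer]) -> int:
    """Count kernel weights per (input channel, output channel) pair, ignoring biases."""
    return sum(k_h * k_w for k_h, k_w, _ in layers)


# Assumed example values: a 1x5 / 5x1 asymmetric pair, then a 3x3 conv with dilation 5.
asymmetric_pair: List[Layer] = [(1, 5, 1), (5, 1, 1)]
dilated: List[Layer] = [(3, 3, 5)]
mixed_branch = asymmetric_pair + dilated

# Reference stack for comparison: three plain 3x3 convolutions.
plain_stack: List[Layer] = [(3, 3, 1)] * 3

print(receptive_field(mixed_branch), weights_per_channel_pair(mixed_branch))
print(receptive_field(plain_stack), weights_per_channel_pair(plain_stack))
```

Under these assumed values, the mixed branch covers a 15×15 window with 19 weights per channel pair, whereas three plain 3×3 layers cover 7×7 with 27. This illustrates why combining asymmetric and dilated convolutions is an inexpensive way to gather wider context.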

Paper Structure

This paper contains 21 sections, 20 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: An overview of our MCIF-Net. Based on the features from the backbone, we first utilize multiple dual-branch mixture convolution modules to extract rich context features and then fuse the features with our multi-level interactive fusion modules for the accurate detection of camouflaged objects.
  • Figure 2: Illustration of our dual-branch mixture convolution module. We enlarge the receptive field using asymmetric convolutional and dilated convolutional layers.
  • Figure 3: Qualitative comparison of our MCIF-Net and the baseline methods.
  • Figure 4: The PR curves and F-measure curves of our MCIF-Net and seven state-of-the-art models on challenging COD datasets.
  • Figure 5: Visual results for the ablation studies of MCIF-Net. For clarity, we use "+DMC", "+MIF", "w/SE", and "w/RFB" to denote "Backbone+DMC", "Backbone+MIF", "Backbone+DMC+SE", and "Backbone+RFB+MIF", respectively.
  • ...and 2 more figures
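
The fusion stage described in the abstract and in Figure 1 combines an attention mechanism with interaction between feature levels. The sketch below shows one generic way such a step could look: each level is gated by a channel-attention vector computed from the other level, and the two gated maps are summed. This is an assumption-laden illustration, not the authors' MIF module. It has no learned weights, it assumes both levels were already resized to the same shape, and the names `channel_attention` and `interactive_fuse` are hypothetical.

```python
import math
from typing import List

# A feature map as [channel][flattened spatial position]; same shape for both levels.
Feature = List[List[float]]


def channel_attention(feature: Feature) -> List[float]:
    """Squeeze each channel to its mean, then map the mean to (0, 1) with a sigmoid."""
    return [1.0 / (1.0 + math.exp(-sum(ch) / len(ch))) for ch in feature]


def interactive_fuse(low: Feature, high: Feature) -> Feature:
    """Gate each level by the other level's channel attention, then add the results."""
    w_low = channel_attention(low)
    w_high = channel_attention(high)
    fused: Feature = []
    for c in range(len(low)):
        fused.append([
            w_high[c] * lo + w_low[c] * hi  # cross-level gating (assumed design)
            for lo, hi in zip(low[c], high[c])
        ])
    return fused


# Toy single-channel inputs with zero mean, so both attention weights equal 0.5.
low_level: Feature = [[1.0, -1.0]]
high_level: Feature = [[2.0, -2.0]]
print(interactive_fuse(low_level, high_level))
```

A real module would typically learn the attention weights (for example with small fully connected or convolutional layers) rather than apply a bare sigmoid to channel means. The sketch only shows how information can flow in both directions between the two levels.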