Fast Camouflaged Object Detection via Edge-based Reversible Re-calibration Network

Ge-Peng Ji, Lei Zhu, Mingchen Zhuge, Keren Fu

TL;DR

This work tackles camouflaged object detection by combining edge-aware priors with a reversible calibration mechanism. The proposed ERRNet uses Selective Edge Aggregation to form a robust edge prior and a Reversible Re-calibration Unit to progressively refine predictions by fusing neighbour, global, edge, and semantic priors. Through co-supervised learning and multi-scale calibration, ERRNet achieves state-of-the-art performance on COD benchmarks while maintaining real-time inference speeds, and also demonstrates strong transfer to medical image segmentation tasks. The results suggest ERRNet as a general, efficient framework for detecting objects that are highly similar to their surroundings, with potential for future enhancement using additional cues and modalities.

Abstract

Camouflaged Object Detection (COD) aims to detect objects with similar patterns (e.g., texture, intensity, colour) to their surroundings, and has recently attracted growing research interest. As camouflaged objects often present very ambiguous boundaries, determining object locations as well as their weak boundaries is challenging and is also the key to this task. Inspired by the biological visual perception process when a human observer discovers camouflaged objects, this paper proposes a novel edge-based reversible re-calibration network called ERRNet. Our model is characterized by two innovative designs, namely Selective Edge Aggregation (SEA) and Reversible Re-calibration Unit (RRU), which aim to model the visual perception behaviour and to achieve an effective edge prior and cross-comparison between potential camouflaged regions and the background. More importantly, RRU incorporates diverse priors with more comprehensive information compared with existing COD models. Experimental results show that ERRNet outperforms existing cutting-edge baselines on three COD datasets and five medical image segmentation datasets. In particular, compared with the existing top-1 model SINet, ERRNet significantly improves the performance by $\sim$6% (mean E-measure) at a notably high speed (79.3 FPS), showing that ERRNet could be a general and robust solution for the COD task.
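The abstract reports inference speed in frames per second (FPS). As a rough illustration of how such a figure might be obtained, the sketch below times repeated calls to a forward pass and divides the frame count by the elapsed time. The function `measure_fps`, its `infer` callable, and the `warmup` parameter are hypothetical names introduced here; the paper's actual timing protocol (hardware, input resolution, batch size) is not given in this section.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                inputs: Sequence[object],
                warmup: int = 5) -> float:
    """Return frames per second for `infer` run once per item in `inputs`.

    `infer` is a hypothetical stand-in for a model's forward pass;
    it is not ERRNet's actual API.
    """
    if not inputs:
        raise ValueError("inputs must be non-empty")

    # Warm-up calls are excluded from timing so one-off setup costs
    # (caching, lazy initialisation) do not distort the result.
    for x in inputs[:warmup]:
        infer(x)

    # Note: on a GPU, work is typically launched asynchronously, so a
    # real benchmark would need to synchronise the device before
    # reading the clock. This CPU-only sketch skips that step.
    start = time.perf_counter()
    for x in inputs:
        infer(x)
    elapsed = time.perf_counter() - start

    # Guard against a zero reading for extremely fast callables.
    return len(inputs) / max(elapsed, 1e-12)


if __name__ == "__main__":
    # Dummy workload standing in for a network forward pass.
    frames = list(range(100))
    fps = measure_fps(lambda frame: sum(range(10_000)), frames)
    print(f"{fps:.1f} FPS")
```

The printed number depends entirely on the machine and the dummy workload, so it says nothing about ERRNet itself; only the measurement pattern carries over.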

Paper Structure

This paper contains 15 sections, 8 equations, 7 figures, 5 tables, 1 algorithm.

Figures (7)

  • Figure 1: Inference speed (i.e., FPS) vs. mean E-measure ($E_\phi$) on the COD10K [fan2020Camouflage] dataset. The proposed ERRNet achieves competitive performance and faster inference speed compared to the SOTA camouflaged object detection methods.
  • Figure 2: The overall pipeline of the proposed ERRNet, which contains three main cooperative components: Atrous Spatial Pyramid Pooling (ASPP) for initiating the global prior, Selective Edge Aggregation (SEA) for generating the edge prior, and the Reversible Re-calibration Unit (RRU) for modulating and refining the NGES priors in a cascaded manner. More details are described in $\S$\ref{sec:problem_formulation}.
  • Figure 3: Visualization of each component in the NGES priors, i.e., edge prior in (c), global prior in (d), and neighbour prior in (e) & (f). Specifically, the re-calibration stage treats the intermediate outputs of the network as prior cues to enhance the reliability and stability of the learning process, and thus a more accurate final prediction (g) is obtained. Note that since the semantic prior $E_i$ is directly borrowed from the features of the ResNet-50 backbone, it is not shown here.
  • Figure 4: The effectiveness of NGES priors in RRU. (b) "GT" and (c) "NGES Priors" denote the ground truth and the full ERRNet model, respectively. We observe that: (d) "edge prior" promotes fine-grained delineation of ambiguous weak boundaries; (e) "global prior" helps locate potential camouflaged regions; and (f) "neighbour prior" enhances the stability of prediction when the global prior is less satisfactory or unavoidable noise is introduced. None of the NGES priors is dispensable; each contributes in its own way.
  • Figure 5: Visual comparison of camouflaged object detection maps produced by different methods. (a) Input images, (b) GT, which stands for the ground truths, (c) camouflaged object detection maps produced by our method, (d) SINet [fan2020Camouflage], (e) EGNet [zhao2019EGNet], (f) HTC [chen2019hybrid], (g) CPD [wu2019cascaded], and (h) PFANet [zhao2019pyramid].
  • ...and 2 more figures
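
The Figure 2 caption names Atrous Spatial Pyramid Pooling (ASPP) as the source of the global prior. As a rough illustration of the general ASPP idea only, the sketch below applies one small kernel to a 1D signal at several dilation rates and returns the parallel outputs as separate channels. The function names, the rates `(1, 2, 4)`, the shared kernel, and the zero padding are all assumptions made for illustration; ERRNet's actual ASPP configuration is not described in this section, and a real ASPP module operates on 2D feature maps with learned weights per branch.

```python
from typing import List, Sequence


def atrous_conv1d(signal: Sequence[float],
                  kernel: Sequence[float],
                  rate: int) -> List[float]:
    """Same-length 1D atrous (dilated) convolution with zero padding.

    As in most deep-learning libraries, this is a cross-correlation:
    the kernel is not flipped. `rate` is the spacing between taps.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    if rate < 1:
        raise ValueError("rate must be >= 1")

    half = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + (k - half) * rate  # tap position, spread by `rate`
            if 0 <= j < n:             # taps outside the signal count as zero
                acc += weight * signal[j]
        out.append(acc)
    return out


def aspp_1d(signal: Sequence[float],
            kernel: Sequence[float],
            rates: Sequence[int] = (1, 2, 4)) -> List[List[float]]:
    """Run parallel atrous branches and stack them as channels.

    Each branch sees the same input with a different receptive field;
    a full ASPP module would fuse the channels with a further layer.
    """
    return [atrous_conv1d(signal, kernel, r) for r in rates]


if __name__ == "__main__":
    impulse = [0, 0, 0, 1, 0, 0, 0]
    for rate, branch in zip((1, 2, 4), aspp_1d(impulse, [1, 1, 1])):
        print(rate, branch)
```

Feeding in a single impulse shows how each rate spreads the kernel taps further apart, which is what lets the parallel branches gather context at several scales without downsampling.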