Seamless Detection: Unifying Salient Object Detection and Camouflaged Object Detection

Yi Liu, Chengxin Li, Xiaohui Dong, Lei Li, Dingwen Zhang, Shoukun Xu, Jungong Han

TL;DR

This work introduces a task-agnostic framework that detects salient and camouflaged objects jointly, unifying SOD and COD through a Contrastive Distillation Paradigm (CDP). A lightweight Interval-layer Global Context (IGC) decoder and a shared encoder-decoder enable single-pass inference for both tasks at real-time speed (~67 fps). The model supports supervised training with ground-truth labels and unsupervised training with DINO-generated pseudo labels that are updated via a moving-average strategy. It reports competitive results in the supervised setting and state-of-the-art performance in the unsupervised setting. Ablations confirm the contributions of CDP, the background/foreground semantics, and the pseudo-label updates, while the model remains efficient enough for real-world application.
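The moving-average update of pseudo labels can be pictured with a small sketch. The summary does not give the update rule, so the exponential-moving-average form, the `momentum` value, the 0.5 threshold, and the function names `update_pseudo_mask` and `binarize` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def update_pseudo_mask(prev_mask, pred_map, momentum=0.9):
    """Blend the stored pseudo mask with the current prediction.

    Assumed rule (not from the paper): new = m * old + (1 - m) * pred.
    Both inputs are soft masks with values in [0, 1] and the same shape.
    """
    prev_mask = np.asarray(prev_mask, dtype=np.float64)
    pred_map = np.asarray(pred_map, dtype=np.float64)
    if prev_mask.shape != pred_map.shape:
        raise ValueError("mask and prediction must have the same shape")
    updated = momentum * prev_mask + (1.0 - momentum) * pred_map
    # Keep the result a valid soft mask even if inputs drift slightly.
    return np.clip(updated, 0.0, 1.0)


def binarize(mask, threshold=0.5):
    """Turn a soft mask into a hard 0/1 pseudo label (threshold is assumed)."""
    return (np.asarray(mask) >= threshold).astype(np.uint8)


# Toy example: a 2x2 pseudo mask refined by one epoch's prediction.
prev = np.array([[1.0, 0.0], [0.0, 1.0]])
pred = np.array([[0.0, 0.0], [1.0, 1.0]])
soft = update_pseudo_mask(prev, pred, momentum=0.5)
hard = binarize(soft)
```

A higher `momentum` makes the pseudo mask change more slowly, which trades responsiveness for stability against noisy early predictions.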

Abstract

Achieving joint learning of Salient Object Detection (SOD) and Camouflaged Object Detection (COD) is extremely challenging due to their distinct object characteristics, i.e., saliency and camouflage. The only preliminary research treats them as two contradictory tasks, training models on large-scale labeled data alternately for each task and assessing them independently. However, such task-specific mechanisms fail to meet real-world demands for addressing unknown tasks effectively. To address this issue, in this paper, we pioneer a task-agnostic framework to unify SOD and COD. To this end, inspired by the agreeable nature of binary segmentation for SOD and COD, we propose a Contrastive Distillation Paradigm (CDP) to distil the foreground from the background, facilitating the identification of salient and camouflaged objects amidst their surroundings. To probe into the contribution of our CDP, we design a simple yet effective contextual decoder involving the interval-layer and global context, which achieves an inference speed of 67 fps. Besides the supervised setting, our CDP can be seamlessly integrated into unsupervised settings, eliminating the reliance on extensive human annotations. Experiments on public SOD and COD datasets demonstrate the superiority of our proposed framework in both supervised and unsupervised settings, compared with existing state-of-the-art approaches. Code is available on https://github.com/liuyi1989/Seamless-Detection.
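One way to read "distilling the foreground from the background" is as a loss that pushes foreground and background embeddings apart. The abstract does not define the CDP loss, so the following is only a minimal sketch under assumed choices: cosine similarity as the measure, a hinge at zero, and the hypothetical names `cosine_similarity` and `negative_pair_loss`.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-8):
    """Row-wise cosine similarity between two (N, D) arrays."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return num / den


def negative_pair_loss(fg, bg):
    """Penalize foreground/background embeddings that point the same way.

    Assumed form (not from the paper): mean of max(cos(fg, bg), 0), so the
    loss is zero once each pair is orthogonal or opposed.
    """
    sim = cosine_similarity(fg, bg)
    return float(np.mean(np.maximum(sim, 0.0)))


# Toy embeddings: identical vectors are penalized, orthogonal ones are not.
loss_same = negative_pair_loss([[1.0, 0.0]], [[1.0, 0.0]])
loss_orth = negative_pair_loss([[1.0, 0.0]], [[0.0, 1.0]])
```

In a real training loop such a term would be added to the segmentation loss and computed on pooled encoder and decoder features; that wiring is not shown here.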

Paper Structure

This paper contains 34 sections, 16 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: A simple example of a scene in which saliency and camouflage coexist. A chameleon is salient when it appears in a new scene. However, it then changes its body appearance to conceal itself in the surroundings, which makes it camouflaged. In such a case, it is not reasonable to detect the chameleon using an individual salient or camouflaged object detection model. This observation motivates the design of a task-agnostic model that unifies salient and camouflaged object detection.
  • Figure 2: Motivation statement. The previous UJSC li2021uncertainty is task-specific: salient and camouflaged images must be fed into the SOD network and the COD network, respectively, otherwise it generates poor results. Our task-agnostic framework resolves this limitation.
  • Figure 3: Overview of the framework. $\textbf{E}_\ast$ is the last layer of different blocks in ResNet-50 he2016deep. ${\cal L}_D$ and ${\cal L}_{NEG}$ denote the training losses of Eq. (\ref{Dloss}) and Eq. (\ref{loss: contr}), respectively. Under the supervised setting, the foreground map inferred by IGC is supervised by ground truth. In addition, the foreground semantics and background semantics, generated by the decoder and encoder of IGC, respectively, are supervised using the contrastive loss within CDP. Under the unsupervised setting, the deep features of DINO caron2021emerging are parsed to generate the pseudo masks during the initial two epochs; these masks are then updated at each epoch. Note that only IGC is run for inference at the test stage, in both the supervised and unsupervised settings.
  • Figure 4: Visualizations for pseudo masks update.
  • Figure 5: Visual comparison for SOD and COD in the supervised setting.
  • ...and 2 more figures