Learning Camouflaged Object Detection from Noisy Pseudo Label

Jin Zhang, Ruiheng Zhang, Yanjiao Shi, Zhe Cao, Nian Liu, Fahad Shahbaz Khan

TL;DR

This work addresses the high labeling cost of camouflaged object detection (COD) by introducing Weakly Semi-Supervised COD (WSSCOD), which combines a small set of pixel-level annotations with box prompts that are used to generate high-quality pseudo labels. It employs two networks: ANet (box and image branches) produces pseudo labels, while PNet (image only) learns from the real labels and these pseudo labels under the Noise Correction Loss $\mathcal{L}_{NC}$, which favors correct pixels in the early learning phase and corrects the gradients dominated by noisy pixels in the memorization phase. The key contributions are $\mathcal{L}_{NC}$, a robust loss that mitigates noisy pixels in pseudo labels, and evidence that 20% of the fully labeled data plus box prompts can achieve performance comparable to fully supervised methods, with further gains as more box-only data are added. The practical impact is a substantial reduction in labeling effort for COD, which supports scalable deployment and broader research into camouflaged object segmentation.

Abstract

Existing Camouflaged Object Detection (COD) methods rely heavily on large-scale pixel-annotated training sets, which are both time-consuming and labor-intensive to produce. Although weakly supervised methods offer higher annotation efficiency, their performance lags far behind because the visual demarcation between foreground and background is unclear in camouflaged images. In this paper, we explore the potential of using boxes as prompts in camouflaged scenes and introduce the first weakly semi-supervised COD method, aiming for budget-efficient and high-precision camouflaged object segmentation with an extremely limited number of fully labeled images. Critically, learning from such a limited set inevitably generates pseudo labels with many noisy pixels. To address this, we propose a noise correction loss that facilitates the model's learning of correct pixels in the early learning stage and corrects the error risk gradients dominated by noisy pixels in the memorization stage, ultimately achieving accurate segmentation of camouflaged objects from noisy labels. When using only 20% of the fully labeled data, our method shows superior performance over state-of-the-art methods.
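
The data setup described above (a small fully labeled subset plus box prompts for the remaining images) can be illustrated with a minimal sketch. The function names `split_dataset` and `mask_to_box`, the 20% default ratio as a parameter, and the choice of a tight bounding box around the mask are assumptions made for illustration; this summary does not state how the authors partition the data or derive their boxes.

```python
import random


def split_dataset(image_ids, full_ratio=0.2, seed=0):
    """Split image ids into a fully labeled subset and a box-only subset.

    Hypothetical helper: the ratio and the random split are illustrative.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for a fixed seed
    n_full = round(len(ids) * full_ratio)
    return ids[:n_full], ids[n_full:]


def mask_to_box(mask):
    """Return the tight box (x_min, y_min, x_max, y_max) around nonzero pixels.

    `mask` is a list of rows of 0/1 values. Returns None for an empty mask.
    Assumption: a box prompt is the tight bounding box of the object mask.
    """
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for row in mask for x, value in enumerate(row) if value]
    if not rows:
        return None
    return (min(cols), rows[0], max(cols), rows[-1])


if __name__ == "__main__":
    full_ids, box_only_ids = split_dataset(range(100))
    print(len(full_ids), len(box_only_ids))
    print(mask_to_box([[0, 0, 0, 0], [0, 1, 1, 0], [0, 0, 1, 0]]))
```

In this sketch, the fully labeled ids would carry pixel-level masks for training both networks, while the box-only ids would carry only a box that the auxiliary network turns into a pseudo label.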

Paper Structure (15 sections, 6 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Supervision behaviors and outputs: $\mathcal{F}$, $\mathcal{P}$, $\mathcal{S}$, and $\mathcal{B}$ denote training with full, point, scribble, and box annotations. GT is the ground truth. The first row shows labels used for each method on the training image, with red for foreground and blue for background (if needed). The second row presents outputs trained with these labels.
  • Figure 2: Overview of the proposed WSSCOD. Left part: Train the auxiliary network with full and box annotations. Right part: Use images and proposals as input to generate pseudo labels through the auxiliary network, where FP (False Positive) and FN (False Negative) predictions represent noisy pixels. Then, under the supervision of $\mathcal{L}_{NC}$, the primary network is trained using pixel-level annotations and pseudo labels.
  • Figure 3: Overall architecture of our proposed model for PNet and ANet. For ANet, the model consists of a box branch, an image branch, and a decoder. For PNet, the model consists of an image branch and a decoder.
  • Figure 4: Fitting analysis of different setups at different training epochs. We train PNet$_{F1}$, PNet$_{F5}$, PNet$_{F10}$, and PNet$_{F20}$ for 100 epochs and assume that the model completes the early learning phase at epochs [20,40,60,80], changing the value of $q$ from 2 to 1 to indicate the model's transition to the memorization phase. Moreover, we set $q$ to 1 at the 0th and 100th epochs to use only the MAE-form and IoU-form of $\mathcal{L}_{NC}$, respectively, to validate the effectiveness of our strategy. The test metric is the IoU score, and the test dataset is CAMO (Le et al., 2019).
  • Figure 5: Visualization comparison of ours and SOTA methods. Please zoom in to view.
  • ...and 2 more figures
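
The schedule for $q$ described in the Figure 4 caption can be sketched as follows. Only the switch of $q$ from 2 to 1 at a chosen epoch comes from the caption. The loss form in `nc_loss_sketch` (a power-$q$ error normalized by a soft union) is an assumption chosen purely for illustration, since this summary does not give the formula for $\mathcal{L}_{NC}$, and both function names are hypothetical.

```python
def q_schedule(epoch, switch_epoch, q_early=2.0, q_late=1.0):
    """Return q for the given epoch: q_early before the switch, q_late after.

    switch_epoch=0 gives q=1 throughout (MAE-form only in the caption);
    switch_epoch equal to the total number of epochs never switches
    (IoU-form only in the caption).
    """
    return q_early if epoch < switch_epoch else q_late


def nc_loss_sketch(pred, target, q, eps=1e-8):
    """Illustrative loss: sum |p - g|^q / sum (p + g - p * g).

    ASSUMPTION: this form is not taken from the source. `pred` and `target`
    are flat sequences of values in [0, 1].
    """
    numerator = sum(abs(p - g) ** q for p, g in zip(pred, target))
    soft_union = sum(p + g - p * g for p, g in zip(pred, target))
    return numerator / (soft_union + eps)


if __name__ == "__main__":
    pred, pseudo_label = [0.5, 0.5], [1.0, 0.0]
    for epoch in (0, 39, 40, 99):
        q = q_schedule(epoch, switch_epoch=40)
        print(epoch, q, nc_loss_sketch(pred, pseudo_label, q))
```

Under this assumed form, an uncertain prediction contributes less to the loss with $q = 2$ than with $q = 1$, which is one way to see why the two settings treat pixels differently.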