Learning Camouflaged Object Detection from Noisy Pseudo Label
Jin Zhang, Ruiheng Zhang, Yanjiao Shi, Zhe Cao, Nian Liu, Fahad Shahbaz Khan
TL;DR
This work addresses the high labeling cost of camouflaged object detection (COD) by introducing Weakly Semi-Supervised COD (WSSCOD), which combines a minimal set of pixel-level annotations with box prompts that are used to generate high-quality pseudo labels. A dual-network approach is employed: ANet (box and image branches) produces pseudo labels, while PNet (image only) learns from both the real labels and these pseudo labels, guided by the Noise Correction Loss $L_{NC}$, which balances learning across the early learning and memorization phases. The key contributions are $L_{NC}$, a robust loss that mitigates the effect of noisy pixels in pseudo labels, and evidence that using only 20% of fully labeled data plus box prompts can achieve performance comparable to fully supervised methods, with further gains as more box-only data are added. The practical impact is a substantial reduction in labeling effort for COD, enabling scalable deployment and broader research into camouflaged object segmentation.
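The data flow described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `split_wsscod` and `build_pnet_training_set`, the random split, and the string stand-ins for masks are all assumptions made for the example. Only the 20% fully labeled budget and the real-plus-pseudo training set come from the summary.

```python
import random


def split_wsscod(image_ids, full_ratio=0.2, seed=0):
    """Split a dataset into a small fully labeled subset and a box-only remainder.

    Hypothetical helper: the summary only states that 20% of the data
    is fully labeled; the random split here is an assumption.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_full = round(len(ids) * full_ratio)
    return ids[:n_full], ids[n_full:]


def build_pnet_training_set(full_ids, box_ids, real_mask, pseudo_mask):
    """Merge real masks with pseudo masks into one training list for a PNet-like model.

    real_mask(i)   -> pixel-level annotation for image i (fully labeled subset)
    pseudo_mask(i) -> mask predicted by an ANet-like model from image i and its box
    """
    samples = [(i, real_mask(i), "real") for i in full_ids]
    samples += [(i, pseudo_mask(i), "pseudo") for i in box_ids]
    return samples


# Usage with placeholder masks (strings stand in for real arrays).
full_ids, box_ids = split_wsscod(range(100))
train_set = build_pnet_training_set(
    full_ids,
    box_ids,
    real_mask=lambda i: f"gt_mask_{i}",
    pseudo_mask=lambda i: f"anet_mask_{i}",
)
```

In this sketch every image contributes exactly one training sample, and only the source of its mask differs, which is why adding more box-only images enlarges the training set without requiring further pixel-level annotation.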
Abstract
Existing Camouflaged Object Detection (COD) methods rely heavily on large-scale pixel-annotated training sets, which are both time-consuming and labor-intensive to produce. Although weakly supervised methods offer higher annotation efficiency, their performance lags far behind because of the unclear visual boundaries between foreground and background in camouflaged images. In this paper, we explore the potential of using boxes as prompts in camouflaged scenes and introduce the first weakly semi-supervised COD method, aiming for budget-efficient and high-precision camouflaged object segmentation with an extremely limited number of fully labeled images. Critically, learning from such a limited set inevitably generates pseudo labels containing severely noisy pixels. To address this, we propose a noise correction loss that facilitates the model's learning of correct pixels in the early learning stage and corrects the error risk gradients dominated by noisy pixels in the memorization stage, ultimately achieving accurate segmentation of camouflaged objects from noisy labels. When using only 20% of fully labeled data, our method shows superior performance over the state-of-the-art methods.
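To make the phrase "gradients dominated by noisy pixels" concrete, the sketch below uses a generic power-form loss $|p - g|^q$ with a scheduled exponent. This is an assumption chosen for illustration and is not presented as the definition of the noise correction loss, which this section does not state. The names `power_loss`, `power_loss_grad`, `scheduled_q`, and the `switch_epoch` parameter are hypothetical.

```python
def power_loss(pred, target, q):
    """Mean of |p - g|^q over pixels; q=2 behaves like MSE, q=1 like MAE."""
    return sum(abs(p - g) ** q for p, g in zip(pred, target)) / len(pred)


def power_loss_grad(pred, target, q):
    """Per-pixel gradient of power_loss w.r.t. pred: q * |e|^(q-1) * sign(e) / N."""
    n = len(pred)
    grads = []
    for p, g in zip(pred, target):
        e = p - g
        sign = (e > 0) - (e < 0)
        grads.append(q * abs(e) ** (q - 1) * sign / n)
    return grads


def scheduled_q(epoch, switch_epoch, q_early=2.0, q_late=1.0):
    """Assumed schedule: a larger exponent early, a smaller one once memorization starts."""
    return q_early if epoch < switch_epoch else q_late


def noisy_share(grads, noisy_index):
    """Fraction of the total gradient magnitude contributed by one pixel."""
    total = sum(abs(g) for g in grads)
    return abs(grads[noisy_index]) / total


# Four pixels; the last pseudo label is wrong (prediction 0.9, label 0).
pred = [0.9, 0.8, 0.1, 0.9]
pseudo = [1.0, 1.0, 0.0, 0.0]

share_q2 = noisy_share(power_loss_grad(pred, pseudo, 2.0), noisy_index=3)
share_q1 = noisy_share(power_loss_grad(pred, pseudo, 1.0), noisy_index=3)
```

With $q = 2$ the gradient magnitude scales with the error, so the single mislabeled pixel contributes most of the update. With $q = 1$ every pixel contributes equally, which limits the influence of pixels whose pseudo labels disagree with a confident prediction.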
