Just a Hint: Point-Supervised Camouflaged Object Detection

Huafeng Chen, Dian Shao, Guangqian Guo, Shan Gao

TL;DR

This work tackles camouflaged object detection using only point-level supervision. It introduces a point-to-region supervision strategy, an attention-regulating mask, and unsupervised contrastive learning to stabilize representations. The approach is built on a Pyramid Transformer backbone and trained on the newly created P-COD dataset, so training requires only a single point per object. Empirical results show substantial gains over existing weakly supervised COD methods and competitive performance against fully supervised models across COD benchmarks, and the method transfers to scribble supervision and salient object detection. The three components (Hint Area Generator, Attention Regulator, and Representation Optimizer) offer a practical, scalable route to high-quality COD with minimal annotation effort.

Abstract

Camouflaged Object Detection (COD) requires models to quickly and accurately distinguish objects that blend seamlessly into their environment. Because of subtle differences and ambiguous boundaries, COD is remarkably challenging not only for models but also for human annotators, who must invest substantial effort to provide pixel-wise annotations. To alleviate this annotation burden, we propose to accomplish the task with only single-point supervision. Specifically, after the annotator quickly clicks on each object, we first adaptively expand the original point-based annotation into a reasonable hint area. Then, to avoid partial localization around discriminative parts, we propose an attention regulator that spreads model attention over the whole object by partially masking labeled regions. Moreover, to address the unstable feature representation of camouflaged objects under point-only annotation, we perform unsupervised contrastive learning on differently augmented image pairs (e.g., with color changes or translation). On three mainstream COD benchmarks, experimental results show that our model outperforms several weakly supervised methods by a large margin across various metrics.
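
To make the point-to-region step concrete, the sketch below expands a single clicked pixel into a filled disk that could serve as a hint area. It is only an illustration: the function name `expand_point_to_hint_area`, the disk shape, and the fixed `radius` argument are assumptions introduced here. The abstract says the expansion is adaptive but does not give the rule, so the radius is simply passed in.

```python
def expand_point_to_hint_area(height, width, point, radius):
    """Return a binary mask (list of rows) with a filled disk around `point`.

    `point` is (row, col) and `radius` is in pixels. The disk shape and the
    caller-supplied radius are illustrative assumptions, not the paper's
    adaptive expansion rule.
    """
    row0, col0 = point
    mask = [[0] * width for _ in range(height)]
    # Scan only the bounding box of the disk, clipped to the image borders.
    for r in range(max(0, row0 - radius), min(height, row0 + radius + 1)):
        for c in range(max(0, col0 - radius), min(width, col0 + radius + 1)):
            if (r - row0) ** 2 + (c - col0) ** 2 <= radius ** 2:
                mask[r][c] = 1
    return mask


if __name__ == "__main__":
    # One click at the centre of a 7x7 image, expanded with radius 2.
    hint = expand_point_to_hint_area(7, 7, point=(3, 3), radius=2)
    for row in hint:
        print("".join("#" if v else "." for v in row))
```

In a training pipeline, such a mask would replace the single labeled pixel as the supervised region, with all pixels outside it treated as unlabeled rather than as background.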

Paper Structure (13 sections, 7 equations, 8 figures, 12 tables)

Figures (8)

  • Figure 1: Different types of annotation in the camouflaged object detection task. Mask annotation takes about 60 minutes per image. Scribbles take about 10 seconds but suffer from diversity and boundary issues. A point takes just 2 seconds and only needs to mark the most discriminative part of the camouflaged object.
  • Figure 2: Overview of our method. The first row shows hint area generation, from a single point to a hypothetical area. The second row shows the training process of the attention regulator and representation optimizer, from the hint area to an accurate mask.
  • Figure 3: Comparison of prediction accuracy for similar images during training. Although the two images $I_1$ and $I_2$ are very similar, their prediction accuracy trends are opposite, largely because the features learned under weak supervision are not robust enough.
  • Figure 4: Details of our method. It consists of three main components: (a) hint area generator, (b) attention regulator, and (c) representation optimizer. The hint area generator extends the point label from a single point to a small hint region. The attention regulator forces the model to focus on the whole object instead of getting stuck on the most discriminative part. The representation optimizer uses unsupervised contrastive learning to learn a stable feature representation for disentangling camouflaged objects from the background.
  • Figure 5: Visual comparison with representative scribble-supervised and fully-supervised models.
  • ...and 3 more figures
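
The representation optimizer in Figure 4 is described as learning stable features from differently augmented views of the same image. As a rough illustration of that idea, the sketch below computes a cosine-based consistency term between paired feature vectors from two views. This is a simplified stand-in, not the paper's objective: the summary does not specify the contrastive loss, and the names `cosine_similarity` and `consistency_loss` as well as the plain-list feature format are assumptions made here.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def consistency_loss(features_a, features_b):
    """Mean of (1 - cosine similarity) over paired feature vectors.

    `features_a[i]` and `features_b[i]` are assumed to be features of the
    same image under two different augmentations. The loss is 0 when every
    pair points in the same direction and grows as the views disagree.
    """
    if len(features_a) != len(features_b) or not features_a:
        raise ValueError("expected two non-empty lists of equal length")
    total = sum(1.0 - cosine_similarity(u, v)
                for u, v in zip(features_a, features_b))
    return total / len(features_a)
```

Minimizing a term of this kind pushes the two views of an image toward the same feature direction, which is one way to counter the unstable predictions for near-identical images that Figure 3 illustrates.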