Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping
Chunming He, Kai Li, Yachao Zhang, Guoxia Xu, Longxiang Tang, Yulun Zhang, Zhenhua Guo, Xiu Li
TL;DR
This work tackles weakly-supervised Concealed Object Segmentation (COS) by using the Segment Anything Model (SAM) to generate pseudo labels from sparse prompts and by introducing a Multi-scale Feature Grouping (MFG) module that promotes feature coherence across concealed objects. The WS-SAM framework combines multi-augmentation fusion, entropy-based pixel weighting, and image-level selection to produce reliable supervision, while MFG decomposes features into prototypes at multiple granularities and aggregates them with a scheme inspired by the second-order Runge-Kutta (RK2) method. Together, these components address both weak supervision and intrinsic foreground-background similarity, enabling robust single- and multi-object segmentation. Experiments on camouflaged object detection, polyp segmentation, and transparent object detection show state-of-the-art performance and strong robustness.
Abstract
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects that blend well into their surroundings, using sparsely annotated data for model training. It remains a challenging task because (1) concealed objects are hard to distinguish from the background due to their intrinsic similarity and (2) the sparsely annotated training data provide only weak supervision for model learning. In this paper, we propose a new WSCOS method to address these two challenges. To tackle the intrinsic similarity challenge, we design a multi-scale feature grouping module that first groups features at different granularities and then aggregates the grouping results. By grouping similar features together, it encourages segmentation coherence, helping to obtain complete segmentation results for both single- and multiple-object images. For the weak supervision challenge, we utilize the recently proposed vision foundation model, Segment Anything Model (SAM), and use the provided sparse annotations as prompts to generate segmentation masks, which are then used to train the model. To alleviate the impact of low-quality segmentation masks, we further propose a series of strategies: multi-augmentation result ensemble, entropy-based pixel-level weighting, and entropy-based image-level selection. These strategies provide more reliable supervision for training the segmentation model. We verify the effectiveness of our method on various WSCOS tasks, and experiments demonstrate that it achieves state-of-the-art performance on these tasks.
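The three pseudo-label strategies named above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that SAM has already been run on several augmented views of an image, that each output has been mapped back to the original frame as a foreground-probability map, and that binary entropy is the uncertainty measure. The function names, the `1 - entropy` weighting, and the `max_mean_entropy` threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def fuse_masks(prob_maps):
    """Multi-augmentation ensemble: average K probability maps of shape (H, W)."""
    return np.mean(np.stack(prob_maps, axis=0), axis=0)


def binary_entropy(p, eps=1e-8):
    """Per-pixel binary entropy in bits; 0 means certain, 1 means maximally uncertain."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log2(p) + (1.0 - p) * np.log2(1.0 - p))


def pixel_weights(fused):
    """Pixel-level weighting (assumed form): uncertain pixels get weights near 0."""
    return 1.0 - binary_entropy(fused)


def keep_image(fused, max_mean_entropy=0.5):
    """Image-level selection (assumed rule): keep a pseudo label only if its
    mean entropy is at most a hypothetical threshold."""
    return float(binary_entropy(fused).mean()) <= max_mean_entropy


def weighted_bce(pred, fused, weights, eps=1e-8):
    """Weighted binary cross-entropy of a prediction against the fused pseudo label."""
    pred = np.clip(pred, eps, 1.0 - eps)
    loss = -(fused * np.log(pred) + (1.0 - fused) * np.log(1.0 - pred))
    return float((weights * loss).sum() / (weights.sum() + eps))
```

In this sketch, views that agree produce a low-entropy fused map whose pixels keep nearly full weight, while views that disagree push the fused probability toward 0.5, which lowers the pixel weights and can cause the whole image to be dropped from training.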
