Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation
Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo
TL;DR
The paper tackles unsupervised semantic segmentation (USS), where patch-level guidance derived from image-level self-supervised models is often unreliable. It introduces Progressive Proxy Anchor Propagation (PPAP), a two-branch framework that progressively relocates each proxy anchor toward regions densely populated with semantically similar samples to build trustworthy positive sets, and defines an ambiguity zone whose uncertain samples are excluded from the negatives. Training uses a tripartite contrastive loss with these ambiguity-excluded negatives, which gives more robust patch-wise supervision. Experiments on COCO-Stuff, Cityscapes, Potsdam-3, and ImageNet-S show state-of-the-art performance, and ablations validate the contributions of trustworthy positives and ambiguity handling. The improvements hold across multiple backbones and datasets.
Abstract
The labor-intensive labeling required for semantic segmentation has spurred the emergence of Unsupervised Semantic Segmentation. Recent studies utilize patch-wise contrastive learning based on features from image-level self-supervised pretrained models. However, relying solely on similarity-based supervision from image-level pretrained models often leads to unreliable guidance, because their patch-level semantic representations are insufficient. To address this, we propose a Progressive Proxy Anchor Propagation (PPAP) strategy. This method gradually identifies more trustworthy positives for each anchor by relocating its proxy to regions densely populated with semantically similar samples. Specifically, we initially establish a tight boundary to gather a few reliable positive samples around each anchor. Then, considering the distribution of positive samples, we relocate the proxy anchor towards areas with a higher concentration of positives and adjust the positiveness boundary based on the propagation degree of the proxy anchor. Moreover, to account for ambiguous regions where positive and negative samples may coexist near the positiveness boundary, we introduce an instance-wise ambiguous zone. Samples within this zone are excluded from the negative set, which further enhances the reliability of the negative set. State-of-the-art performance on various datasets validates the effectiveness of the proposed method for Unsupervised Semantic Segmentation.
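The propagation steps described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`propagate_proxy`, `ambiguity_excluded_loss`), the parameters (`tau`, `steps`, `margin`, `temperature`), the mean-of-positives update, the rule that tightens the boundary by the cosine distance the proxy moved, and the InfoNCE-style loss are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale a vector to unit length."""
    return x / (np.linalg.norm(x) + eps)

def propagate_proxy(anchor, feats, tau=0.95, steps=3, margin=0.1):
    """Hypothetical sketch of progressive proxy anchor propagation.

    anchor: (D,) unit vector; feats: (N, D) unit vectors.
    Returns the final proxy and boolean masks (positive, ambiguous, negative).
    """
    proxy = anchor.copy()
    boundary = tau  # start with a tight positiveness boundary
    for _ in range(steps):
        pos = feats @ proxy >= boundary
        if not pos.any():
            break
        # Assumption: move the proxy to the normalized mean of current positives.
        new_proxy = l2_normalize(feats[pos].mean(axis=0))
        # Assumption: "propagation degree" = cosine distance the proxy moved.
        shift = max(0.0, 1.0 - float(new_proxy @ proxy))
        proxy = new_proxy
        # Assumption: tighten the boundary by the amount the proxy drifted.
        boundary = min(1.0, boundary + shift)
    sim = feats @ proxy
    pos = sim >= boundary
    # Samples just outside the boundary form the ambiguous zone.
    ambiguous = ~pos & (sim >= boundary - margin)
    neg = ~(pos | ambiguous)
    return proxy, pos, ambiguous, neg

def ambiguity_excluded_loss(anchor, feats, pos, neg, temperature=0.1):
    """InfoNCE-style loss (an assumption) that ignores ambiguous samples.

    Assumes at least one positive sample.
    """
    logits = feats @ anchor / temperature
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    denom = exp[pos | neg].sum()         # ambiguous samples never enter the denominator
    return float(-np.log(exp[pos] / denom).mean())

# Toy 2-D example: patch features placed on the unit circle by angle (radians).
angles = np.array([0.0, 0.1, 0.2, 0.3, 0.6, 1.5, 1.6])
feats = np.stack([np.cos(angles), np.sin(angles)], axis=1)
anchor = np.array([np.cos(0.2), np.sin(0.2)])

proxy, pos, ambiguous, neg = propagate_proxy(anchor, feats)
loss = ambiguity_excluded_loss(anchor, feats, pos, neg)
print(pos, ambiguous, neg, loss)
```

With these toy settings, the four features between 0.0 and 0.3 rad end up as positives, the feature at 0.6 rad falls into the ambiguous zone and is left out of the loss, and the two features near 1.5 rad serve as negatives.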
