Progressive Proxy Anchor Propagation for Unsupervised Semantic Segmentation

Hyun Seok Seong, WonJun Moon, SuBeen Lee, Jae-Pil Heo

TL;DR

The paper tackles unsupervised semantic segmentation (USS) by addressing unreliable patch-level guidance derived from image-level self-supervised models. It introduces Progressive Proxy Anchor Propagation (PPAP), a two-branch framework that progressively relocates proxy anchors toward densely populated, semantically similar regions to build trustworthy positive sets, while defining an ambiguity zone that excludes uncertain negatives. The training objective uses a tri-partite contrastive loss with ambiguity-excluded negatives, enabling robust patch-wise supervision. Extensive experiments across COCO-stuff, Cityscapes, Potsdam-3, and ImageNet-S demonstrate state-of-the-art performance, with ablations validating the contributions of trustworthy positives and ambiguity handling. PPAP offers a practical, scalable approach to improving USS by refining the supervision signal through data distribution-aware proxy anchor propagation and selective negative sampling, with improvements maintained across multiple backbones and datasets.

Abstract

The labor-intensive labeling for semantic segmentation has spurred the emergence of Unsupervised Semantic Segmentation. Recent studies utilize patch-wise contrastive learning based on features from image-level self-supervised pretrained models. However, relying solely on similarity-based supervision from image-level pretrained models often leads to unreliable guidance due to insufficient patch-level semantic representations. To address this, we propose a Progressive Proxy Anchor Propagation (PPAP) strategy. This method gradually identifies more trustworthy positives for each anchor by relocating its proxy to regions densely populated with semantically similar samples. Specifically, we initially establish a tight boundary to gather a few reliable positive samples around each anchor. Then, considering the distribution of positive samples, we relocate the proxy anchor towards areas with a higher concentration of positives and adjust the positiveness boundary based on the propagation degree of the proxy anchor. Moreover, to account for ambiguous regions where positive and negative samples may coexist near the positiveness boundary, we introduce an instance-wise ambiguous zone. Samples within these zones are excluded from the negative set, further enhancing the reliability of the negative set. Our state-of-the-art performances on various datasets validate the effectiveness of the proposed method for Unsupervised Semantic Segmentation.
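The propagation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: cosine similarity as the affinity measure, moving the proxy to the normalized mean of the current positives, loosening the boundary by the amount the proxy moved (a stand-in for "propagation degree"), and a fixed-width ambiguous zone just below the boundary. The names `propagate_proxy`, `tau`, `margin`, and `floor` are hypothetical.

```python
import math

def unit(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def cos_sim(a, b):
    """Cosine similarity of two unit vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))

def propagate_proxy(feats, anchor, tau=0.95, steps=3, margin=0.1, floor=0.5):
    """Return (positives, ambiguous, negatives) index lists for one anchor.

    Hypothetical update rules, chosen for illustration only:
      - the proxy moves to the normalized mean of its current positives;
      - the boundary is loosened by how far the proxy moved, never below `floor`;
      - samples within `margin` below the boundary form the ambiguous zone.
    """
    feats = [unit(f) for f in feats]
    proxy, boundary = feats[anchor], tau  # start with a tight boundary at the anchor
    for _ in range(steps):
        pos = [i for i, f in enumerate(feats) if cos_sim(proxy, f) >= boundary]
        if not pos:
            break
        dim = len(proxy)
        mean = [sum(feats[i][d] for i in pos) / len(pos) for d in range(dim)]
        new_proxy = unit(mean)
        shift = max(0.0, 1.0 - cos_sim(proxy, new_proxy))  # how far the proxy moved
        proxy = new_proxy
        boundary = max(floor, boundary - shift)
    sims = [cos_sim(proxy, f) for f in feats]
    positives = [i for i, s in enumerate(sims) if s >= boundary]
    ambiguous = [i for i, s in enumerate(sims) if boundary - margin <= s < boundary]
    negatives = [i for i, s in enumerate(sims) if s < boundary - margin]
    return positives, ambiguous, negatives

# Toy 2-D features given as angles in degrees; index 0 is the anchor.
angles = [0, 10, 20, 30, 60, 90, 100]
feats = [[math.cos(math.radians(a)), math.sin(math.radians(a))] for a in angles]
print(propagate_proxy(feats, anchor=0, steps=0))  # no propagation
print(propagate_proxy(feats, anchor=0, steps=3))  # with propagation
```

In this toy example, the positive set grows from indices [0, 1] without propagation to [0, 1, 2] after three steps, while the sample at 30 degrees stays in the ambiguous zone and is therefore left out of the negative set.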

Paper Structure (31 sections, 8 equations, 8 figures, 10 tables)

Figures (8)

  • Figure 1: (a) Illustration of how positive and negative sets are determined in HP and in our method. Different colors indicate different potential classes. In HP, the $k$-th nearest neighbor of each sample serves as an instance-wise positiveness criterion, and all samples farther than the $k$-th neighbor are treated as negatives. In contrast, we progressively propagate the proxy anchor so that it is relocated to a region surrounded by semantically similar samples. This enables trustworthy positive collection with numerous samples in dense regions. Additionally, we define an ambiguous region around the positiveness boundary, where semantic boundaries might be unclear. By excluding the samples in the ambiguous region from training, we avoid undesired repulsion between the anchor and samples in the negative set that may actually be positives. (b) The number of positives and their precision with respect to the ground-truth labels for a randomly sampled subset of the dataset. The X-axis represents anchors within the range [0%, $q$%] of the anchor list, sorted in ascending order by the number of identified positives. The bar plot displays the average number of gathered positives, and the line plot shows their precision, as determined by the ground-truth labels of both the anchor and the positives.
  • Figure 2: Overall procedure of Progressive Proxy Anchor Propagation (PPAP). Our backbone consists of two branches: one for acquiring the training guidance, and the other for task-adaptive finetuning. Specifically, the former feature extractor produces the feature $\mathbf{f}$, which PPAP uses to compute training guidance in the form of trustworthy positive and ambiguity-excluded negative sets; its parameters are frozen for stable guidance. The latter branch is finetuned with the training guidance to learn the task-adaptive feature $\mathbf{z}$.
  • Figure 3: Qualitative comparison of PPAP (Ours) with STEGO and HP on the COCO-stuff dataset with a DINO-pretrained ViT-S/8 backbone.
  • Figure 4: Ablation studies of various coefficients on three different datasets. The X-axis denotes the value of each hyperparameter, and the Y-axis shows the performance.
  • Figure 5: Visual comparison of the positives gathered by HP and by PPAP (Ours). In all examples, blue boxes indicate the selected anchor patch and red boxes denote the patches considered positive to that anchor. In (d), yellow dotted circles highlight the regions where false positives (FP) are detected.
  • ...and 3 more figures
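
The Figure 1 caption describes two things that a short sketch may make concrete: an HP-style split, where each anchor's $k$-th nearest neighbor sets its positiveness threshold, and the precision of the gathered positives against ground-truth labels. The code below is an illustrative sketch under assumptions, not code from HP or PPAP: the function names `knn_split` and `positive_precision`, the exclusion of the anchor from its own positive set, and the toy similarities and labels are all hypothetical.

```python
def knn_split(sim_row, anchor, k):
    """Split samples for one anchor using its k-th nearest neighbor as the threshold.

    `sim_row[j]` is the similarity between the anchor and sample j.
    The anchor itself is left out of both sets (an assumption of this sketch).
    """
    others = [j for j in range(len(sim_row)) if j != anchor]
    ranked = sorted(others, key=lambda j: sim_row[j], reverse=True)
    threshold = sim_row[ranked[k - 1]]  # similarity of the k-th nearest neighbor
    positives = [j for j in others if sim_row[j] >= threshold]
    negatives = [j for j in others if sim_row[j] < threshold]
    return positives, negatives

def positive_precision(labels, anchor, positives):
    """Fraction of gathered positives that share the anchor's ground-truth label."""
    if not positives:
        return 0.0
    hits = sum(1 for j in positives if labels[j] == labels[anchor])
    return hits / len(positives)

# Toy data: similarities of five patches to anchor 0, plus made-up labels.
sim_row = [1.0, 0.9, 0.8, 0.3, 0.2]
labels = ["sky", "sky", "tree", "tree", "road"]
positives, negatives = knn_split(sim_row, anchor=0, k=2)
print(positives, negatives, positive_precision(labels, 0, positives))
```

With a fixed `k=2`, the second neighbor is admitted as a positive even though its label differs from the anchor's, so the precision in this toy case is 0.5. This is the kind of per-anchor positive count and precision that Figure 1(b) reports.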