Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation

Zhiwei Yang, Yucong Meng, Kexue Fu, Shuo Wang, Zhijian Song

TL;DR

This work tackles ambiguity in weakly supervised semantic segmentation by proposing UniA, a unified single-stage framework that combines uncertainty inference with Gaussian feature modeling and an affinity diversification module for pseudo-label refinement. By treating feature extraction as a probabilistic process, UniA estimates uncertainty to suppress false activations during CAM generation and uses a Sinkhorn-based, contrastive affinity mechanism to diversify semantics during refinement. Key contributions include a probabilistic distribution over features with channel and spatial attentions, a distribution loss to maintain uncertainty, a mutual complementing refinement strategy, and a contrastive affinity loss to propagate diversity, all within an end-to-end trainable pipeline. Extensive experiments on PASCAL VOC 2012, MS COCO 2014, and medical ACDC demonstrate strong performance improvements, effective reduction of ambiguity-induced errors, and improved training efficiency compared with prior single- and multi-stage approaches.
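The TL;DR mentions a Sinkhorn-based affinity mechanism without spelling it out. As a rough illustration only, the sketch below applies the standard Sinkhorn-Knopp iteration, which alternately rescales the rows and columns of a positive matrix until it is approximately doubly stochastic, to a toy pixel affinity matrix. The helper `affinity_from_features`, the exponentiated cosine-similarity affinity, the `temperature` value, and the fixed iteration count are assumptions for illustration, not details taken from the paper.

```python
import math


def sinkhorn_normalize(affinity, n_iters=50, eps=1e-8):
    """Sinkhorn-Knopp: alternately rescale rows and columns of a nonnegative
    matrix so that it approaches a doubly stochastic matrix."""
    a = [row[:] for row in affinity]  # copy so the input is left untouched
    n_rows, n_cols = len(a), len(a[0])
    for _ in range(n_iters):
        # Row step: make every row sum to (approximately) 1.
        for i in range(n_rows):
            s = sum(a[i]) + eps
            a[i] = [v / s for v in a[i]]
        # Column step: make every column sum to (approximately) 1.
        for j in range(n_cols):
            s = sum(a[i][j] for i in range(n_rows)) + eps
            for i in range(n_rows):
                a[i][j] /= s
    return a


def affinity_from_features(features, temperature=0.1):
    """Hypothetical helper: exponentiated cosine similarity between pixel
    features, so every affinity entry is strictly positive."""
    def unit(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]

    f = [unit(v) for v in features]
    return [[math.exp(sum(p * q for p, q in zip(u, w)) / temperature) for w in f] for u in f]


# Toy example: two similar "pixels" and one dissimilar one.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = sinkhorn_normalize(affinity_from_features(feats))
# Rows and columns of A now each sum to approximately 1.
```

Balancing rows and columns in this way prevents a few dominant pixels from absorbing most of the affinity mass, which is one plausible reason to use Sinkhorn normalization for refinement; how UniA applies it exactly is not stated here.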

Abstract

Weakly supervised semantic segmentation (WSSS) with image-level labels aims to accomplish dense prediction tasks without laborious pixel-level annotations. However, owing to ambiguous contexts and fuzzy regions, WSSS performance, especially in the stages of generating Class Activation Maps (CAMs) and refining pseudo masks, widely suffers from ambiguity, an issue that previous literature has barely noticed. In this work, we propose UniA, a unified single-stage WSSS framework that efficiently tackles this issue from the perspectives of uncertainty inference and affinity diversification, respectively. When activating class objects, we argue that false activation stems from a bias toward ambiguous regions during feature extraction. Therefore, we design a more robust feature representation based on a probabilistic Gaussian distribution and introduce uncertainty estimation to avoid this bias. A dedicated distribution loss supervises the process, effectively capturing the ambiguity and modeling the complex dependencies among features. When refining pseudo labels, we observe that the affinity produced by prevailing refinement methods tends to be similar among ambiguous regions. To this end, we propose an affinity diversification module to promote diversity among semantics. A mutual complementing refinement first rectifies the ambiguous affinity with multiple inferred pseudo labels. More importantly, a contrastive affinity loss further diversifies the relations among unrelated semantics, reliably propagating the diversity into the whole feature representation and helping generate better pseudo masks. Extensive experiments on the PASCAL VOC, MS COCO, and medical ACDC datasets validate the efficiency of UniA in tackling ambiguity and its superiority over recent single-stage and even most multi-stage competitors.
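A minimal sketch of the uncertainty-inference idea, under stated assumptions: each feature element is modeled as a Gaussian with a predicted mean `mu` and log-variance `log_var`, a sample is drawn with the reparameterization trick, uncertainty is read off as the standard deviation, and a soft mask `exp(-sigma)` down-weights uncertain elements. The masking function and the use of a KL divergence to a standard normal as a stand-in for the distribution loss are assumptions; the paper's actual $\mathcal{L}_{dis}$, attention branches $f_{\theta}(\cdot)$ and $g_{\theta}(\cdot)$, and masking rule may differ. Plain Python lists are used so the sketch runs without a deep-learning framework.

```python
import math
import random


def reparameterize(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1).
    In an autodiff framework, gradients would flow through mu and log_var."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def uncertainty(log_var):
    """Per-element uncertainty, taken here as the standard deviation sigma."""
    return [math.exp(0.5 * lv) for lv in log_var]


def soft_ambiguity_mask(sigma):
    """Assumed soft mask in (0, 1]: elements with larger sigma get smaller weights."""
    return [math.exp(-s) for s in sigma]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)) summed over elements.
    A VAE-style stand-in for the distribution loss, not the paper's exact form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


# Toy feature vector: the last element is "ambiguous" (large predicted variance).
mu = [0.8, -0.3, 0.5]
log_var = [-4.0, -4.0, 1.0]
rng = random.Random(0)

z = reparameterize(mu, log_var, rng)
mask = soft_ambiguity_mask(uncertainty(log_var))
masked = [zi * wi for zi, wi in zip(z, mask)]  # uncertainty-informed features
loss_dis = kl_to_standard_normal(mu, log_var)
```

The ambiguous third element receives the smallest mask weight, so it contributes less to the downstream classification features; the KL term keeps the predicted variances from collapsing to zero.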

Paper Structure (16 sections, 17 equations, 8 figures, 9 tables)

Figures (8)

  • Figure 1: Motivation of UniA. (a) Ambiguity induces false estimation, which hinders the performance of WSSS. (b) The proposed UniA effectively suppresses it from the perspective of uncertainty inference and affinity diversification. (c) UniA can precisely activate objects.
  • Figure 2: Overview of the proposed UniA for weakly supervised semantic segmentation. Given an input image, the uncertainty inference network first estimates uncertainty, which helps generate reliable CAM seeds. Then the affinity diversification module promotes differences among ambiguous regions, which further supports reliable refinement. Finally, the pseudo labels are used to train a decoder for segmentation. The whole pipeline is trained end-to-end.
  • Figure 3: Architecture of the uncertainty inference network. (a) Given the extracted features $Z$, our method learns a Gaussian distribution of $Z$ with channel and spatial attention, i.e., $f_{\theta}(\cdot)$ and $g_{\theta}(\cdot)$, for locally perceiving textures and globally building dependencies among semantics. Uncertainty is estimated from the distribution, and a distribution loss $\mathcal{L}_{dis}$ is designed to maintain it. Soft ambiguity masking is applied to incorporate the uncertainty into feature learning. Finally, the uncertainty-informed features are fed to a classification loss, and the CAM is generated. (b) Architecture of the spatial attention $g_{\theta}(\cdot)$.
  • Figure 4: Pseudo labels refined with affinity. Although the raw affinity from attention improves the quality, it inevitably introduces noise and incurs false negatives and false positives due to ambiguity.
  • Figure 5: Illustration of the affinity diversification module. $P_{1}$ is the semantic mask generated from the reweighted CAM. $P_{2}$ denotes the masks from RGB and position information. $P_{3}$ is the mask from affinity. Mutual complementing refinement directly remedies the pseudo labels, and an affinity loss deeply propagates the diversity into the feature representations.
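Figure 5 describes the affinity diversification module only at the caption level. The sketch below illustrates its two ideas under explicit assumptions: `complement_labels` is a hypothetical majority-vote rule that keeps a pixel's label only where at least two of the three pseudo masks $P_{1}$, $P_{2}$, $P_{3}$ agree, and `contrastive_affinity_loss` is a generic margin-based contrastive loss on pixel-pair cosine affinities (pull same-label pairs toward 1, push different-label pairs below `margin`). Neither is claimed to be the paper's exact refinement rule or loss; `IGNORE` and `margin` are illustrative choices.

```python
import math
from collections import Counter

IGNORE = -1  # hypothetical label for pixels left unresolved


def complement_labels(p1, p2, p3):
    """Hypothetical mutual complementing: keep a pixel's label when at least
    two of the three pseudo masks agree, otherwise mark it IGNORE."""
    out = []
    for votes in zip(p1, p2, p3):
        label, count = Counter(votes).most_common(1)[0]
        out.append(label if count >= 2 else IGNORE)
    return out


def cosine(u, v):
    """Cosine similarity, treating a zero vector as having unit norm."""
    nu = math.sqrt(sum(x * x for x in u)) or 1.0
    nv = math.sqrt(sum(x * x for x in v)) or 1.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def contrastive_affinity_loss(embeddings, labels, margin=0.2):
    """Pull same-label pairs toward affinity 1 and push different-label pairs
    below `margin`. Pixels labelled IGNORE contribute no pairs."""
    pos, neg = [], []
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] == IGNORE or labels[j] == IGNORE:
                continue
            a = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                pos.append(1.0 - a)
            else:
                neg.append(max(0.0, a - margin))
    pos_term = sum(pos) / len(pos) if pos else 0.0
    neg_term = sum(neg) / len(neg) if neg else 0.0
    return pos_term + neg_term


# Toy example with four pixels.
labels = complement_labels([0, 0, 1, 1], [0, 1, 1, 1], [0, 2, 1, 0])  # [0, -1, 1, 1]
separated = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.1, 1.0]]
collapsed = [[1.0, 0.0]] * 4
good = contrastive_affinity_loss(separated, labels)  # small
bad = contrastive_affinity_loss(collapsed, labels)   # about 0.8
```

Well-separated embeddings give a loss near zero, while collapsing every embedding onto one vector (the "similar affinity among ambiguities" failure mode the abstract describes) is penalized on every different-label pair.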