Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation

Feilong Tang, Zhongxing Xu, Zhaojun Qu, Wei Feng, Xingjian Jiang, Zongyuan Ge

TL;DR

Weakly supervised semantic segmentation (WSSS) often yields incomplete localization because of a knowledge bias between instance features and contextual prototypes. CPAL (Context Prototype-Aware Learning) builds a dense bank of context prototypes, uses instance prototypes as anchors, and combines soft positive neighbor selection with feature distribution alignment to produce more complete CAMs. Training uses a unified objective: a BCE classification loss plus a self-supervised loss driven by PACAM. Evaluated on PASCAL VOC 2012 and MS COCO 2014, CPAL achieves state-of-the-art or competitive results when plugged into multiple baselines, improving object localization and pseudo-label quality for WSSS.

Abstract

Recent weakly supervised semantic segmentation (WSSS) methods strive to incorporate contextual knowledge to improve the completeness of class activation maps (CAM). In this work, we argue that the knowledge bias between instances and contexts affects the capability of the prototype to sufficiently understand instance semantics. Inspired by prototype learning theory, we propose leveraging prototype awareness to capture diverse and fine-grained feature attributes of instances. The hypothesis is that contextual prototypes might erroneously activate similar and frequently co-occurring object categories due to this knowledge bias. Therefore, we propose to enhance the prototype representation ability by mitigating the bias to better capture spatial coverage in semantic object regions. With this goal, we present a Context Prototype-Aware Learning (CPAL) strategy, which leverages semantic context to enrich instance comprehension. The core of this method is to accurately capture intra-class variations in object features through context-aware prototypes, facilitating the adaptation to the semantic attributes of various instances. We design feature distribution alignment to optimize prototype awareness, aligning instance feature distributions with dense features. In addition, a unified training framework is proposed to combine label-guided classification supervision and prototypes-guided self-supervision. Experimental results on PASCAL VOC 2012 and MS COCO 2014 show that CPAL significantly improves off-the-shelf methods and achieves state-of-the-art performance. The project is available at https://github.com/Barrett-python/CPAL.
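
The unified objective mentioned above (label-guided classification plus prototype-guided self-supervision) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the L1 form of the self-supervised term, the `weight` parameter, and all function names are assumptions introduced here, since the text only states that a BCE loss and a self-supervised loss are combined.

```python
import numpy as np

def bce_with_logits(logits, labels):
    """Mean multi-label binary cross-entropy, computed stably from raw logits."""
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # Stable form: max(x, 0) - x * y + log(1 + exp(-|x|))
    loss = np.maximum(logits, 0) - logits * labels + np.log1p(np.exp(-np.abs(logits)))
    return float(loss.mean())

def self_supervised_l1(cam, pacam):
    """ASSUMED consistency term: mean absolute difference between CAM and PACAM."""
    cam = np.asarray(cam, dtype=np.float64)
    pacam = np.asarray(pacam, dtype=np.float64)
    return float(np.abs(cam - pacam).mean())

def unified_loss(logits, labels, cam, pacam, weight=1.0):
    """BCE classification loss plus a weighted self-supervised term.
    `weight` is a hypothetical balancing factor, not taken from the paper."""
    return bce_with_logits(logits, labels) + weight * self_supervised_l1(cam, pacam)
```

In this sketch the self-supervised term vanishes when the initial CAM already matches the PACAM, so only the classification loss remains.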

Paper Structure (12 sections, 14 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: The main idea promoted throughout the paper is that semantic context prototype awareness underpins the localization of individual objects in WSSS. Our CPAL adaptively perceives diverse attributes (e.g., cat) through attribute hunting (c) rather than relying on a single prototype (a) or plain context prototypes (b). This attribute-specific adaptation not only mitigates errors where (b) mistakenly activates similar categories (e.g., dog) but also ensures accurate activation of the complete object region.
  • Figure 2: Overview of the proposed unified learning framework. (a) shows image label-guided WSSS (from classification to segmentation). The upper branch describes the classification network $\theta$ identifying object regions corresponding to each category to minimize $\mathcal{L}^{BCE}$. We introduce a self-supervised learning paradigm based on context prototype-aware learning that provides a more complete CAM, which supervises the initial CAM by minimizing $\mathcal{L}^{Self}$. The lower branch refines these CAMs (e.g., with DenseCRF krahenbuhl2011efficient) to form pseudo-labels for supervising the semantic segmentation network. (b) outlines our strategy based on context prototype-aware learning. In each mini-batch, instance prototypes $\mathcal{P}_n^I$ are generated from the CAM and the extracted features $f$ and are used to update the support bank. The bank is then used to construct a context prototype set $\mathcal{P}_n^\text{c}$. Feature distribution alignment is applied to the current instance features, adding a shift term $\delta_n$ that guides them toward clusters of dense features in the bank. Next, with $\mathcal{P}_n^I$ serving as anchors, candidate neighbors in $\mathcal{P}_n^\text{c}$ are softly measured against them. Finally, a positiveness value $w_i$ is computed between two specific attributes. This mechanism selects $K$ soft positive neighbors $\tilde{\mathcal{P}}_n^{\text{c}}$ to generate the PACAM.
  • Figure 3: Feature embedding visualizations, using t-SNE van2008visualizing on PASCAL VOC 2012 val images, of (a) our method without feature distribution alignment and (b) our method. Feature distribution alignment improves the compactness of intra-class features.
  • Figure 4: Sensitivity analysis on the PASCAL VOC 2012 train set with respect to (a) the threshold $\tau$ used to generate 0-1 seed masks from heatmaps and (b) the length of the support set. The results show that CPAL is not sensitive to either.
  • Figure 5: Qualitative visualization on the PASCAL VOC train set. (a) PACAM is obtained using various soft positive prototypes to enhance the comprehension of our model. (b) Visual comparison for the ablation of two main components: our model without prototype-aware learning (top-$K$ candidate neighbor set and positiveness prediction) or without the self-supervised loss. (c) Used as a plug-in to AMN lee2022threshold and MCTformer xu2022multi, our method significantly improves the object localization ability of these networks.
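
The pipeline described in the Figure 2 and Figure 4 captions (thresholding a heatmap with $\tau$ into a seed mask, pooling an instance prototype, selecting $K$ soft positive neighbors from a bank, and building an activation map from them) might be sketched as below. This is a simplified illustration under stated assumptions, not the authors' code: masked average pooling, cosine similarity, and softmax weights for the positiveness values are choices made here, all function names are hypothetical, and the feature distribution alignment step is left out because the captions do not define how the shift term $\delta_n$ is computed.

```python
import numpy as np

def seed_mask(cam, tau):
    """Binarize a heatmap of shape (H, W) into a 0-1 seed mask using threshold tau."""
    return (cam >= tau).astype(np.float64)

def instance_prototype(features, mask, eps=1e-8):
    """Masked average pooling (assumed): features (H, W, D), mask (H, W) -> (D,)."""
    weighted = (features * mask[..., None]).sum(axis=(0, 1))
    return weighted / (mask.sum() + eps)

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def soft_positive_neighbors(anchor, bank, k):
    """Select the k bank prototypes most similar to the anchor.
    anchor: (D,), bank: (M, D). Returns (indices, weights); weights sum to 1.
    Softmax over cosine similarity is an ASSUMED form of the positiveness value."""
    sims = l2_normalize(bank) @ l2_normalize(anchor)   # cosine similarities, (M,)
    idx = np.argsort(-sims)[:k]                        # top-k, most similar first
    w = np.exp(sims[idx] - sims[idx].max())
    return idx, w / w.sum()

def prototype_aware_cam(features, prototypes, weights):
    """Weighted sum of non-negative cosine similarity maps: returns (H, W)."""
    f = l2_normalize(features)            # (H, W, D)
    p = l2_normalize(prototypes)          # (K, D)
    sim = np.maximum(f @ p.T, 0.0)        # (H, W, K), negatives clipped to zero
    return sim @ weights

# Toy usage: a 2x2 feature map whose top row belongs to the object.
features = np.array([[[1.0, 0.0], [1.0, 0.0]],
                     [[0.0, 1.0], [0.0, 1.0]]])
cam = np.array([[0.9, 0.8], [0.1, 0.2]])
bank = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

mask = seed_mask(cam, tau=0.5)
anchor = instance_prototype(features, mask)
idx, w = soft_positive_neighbors(anchor, bank, k=2)
pacam = prototype_aware_cam(features, bank[idx], w)
```

In the toy example the anchor points along the first feature axis, so the two selected neighbors are the bank entries closest to that direction, and the resulting map responds more strongly on the top row than on the bottom row.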