Prompt Categories Cluster for Weakly Supervised Semantic Segmentation
Wangyu Wu, Xianglin Qiu, Siqi Song, Xiaowei Huang, Fei Ma, Jimin Xiao
TL;DR
This work tackles weakly supervised semantic segmentation (WSSS) with image-level labels by addressing semantic ambiguity among similar classes and ViT-induced over-smoothing. It introduces Prompt Categories Clustering (PCC), a ViT-based framework that uses GPT-generated category clusters as learnable tokens, concatenated with patch tokens and refined by HV-BiLSTM; training relies on Top-$k$ pooling and multi-label cross-entropy, with CRF and DeepLabv2 used for final refinement. A key novelty is the self-refining prompt mechanism that derives coherent category clusters and a cluster-token integration that enables sharing information across related categories (e.g., animals). The approach yields state-of-the-art results on PASCAL VOC 2012, improving pseudo-label quality and segmentation accuracy, and demonstrates the practical benefit of language-informed clustering in visual scene understanding.
Abstract
Weakly Supervised Semantic Segmentation (WSSS), which leverages image-level labels, has garnered significant attention due to its cost-effectiveness. The previous methods mainly strengthen the inter-class differences to avoid class semantic ambiguity which may lead to erroneous activation. However, they overlook the positive function of some shared information between similar classes. Categories within the same cluster share some similar features. Allowing the model to recognize these features can further relieve the semantic ambiguity between these classes. To effectively identify and utilize this shared information, in this paper, we introduce a novel WSSS framework called Prompt Categories Clustering (PCC). Specifically, we explore the ability of Large Language Models (LLMs) to derive category clusters through prompts. These clusters effectively represent the intrinsic relationships between categories. By integrating this relational information into the training network, our model is able to better learn the hidden connections between categories. Experimental results demonstrate the effectiveness of our approach, showing its ability to enhance performance on the PASCAL VOC 2012 dataset and surpass existing state-of-the-art methods in WSSS.
