Rethinking Saliency-Guided Weakly-Supervised Semantic Segmentation
Beomyoung Kim, Donghyun Kim, Sung Ju Hwang
TL;DR
This work reframes the role of saliency maps in image-level weakly-supervised semantic segmentation (WSSS) by showing that two factors are critical yet underexplored: the quality of the saliency map, and the threshold $\tau$ used to convert activation maps into pseudo labels. Performance varies consistently and substantially across methods when different saliency maps are used, so the lack of standardization hampers fair comparison. To address this, the authors introduce WSSS-BED, a unified framework that provides various saliency maps and activation maps for seven WSSS methods, along with saliency maps from unsupervised salient object detection (SOD) models, to enable controlled, reproducible experiments. Empirically, high-quality saliency maps (e.g., from large SOD datasets like DUTS or COCO-derived masks) can boost WSSS performance toward or beyond state-of-the-art, while class activation maps (CAM) can still be highly competitive with proper $\tau$ tuning, underscoring the importance of threshold design and saliency integration in practice.
Abstract
This paper presents a fresh perspective on the role of saliency maps in weakly-supervised semantic segmentation (WSSS) and offers new insights and research directions based on our empirical findings. We conduct comprehensive experiments and observe that the quality of the saliency map is a critical factor in saliency-guided WSSS approaches. Nonetheless, we find that the saliency maps used in previous works are often arbitrarily chosen, despite their significant impact on WSSS. Additionally, we observe that the choice of the threshold, which has received little attention so far, is non-trivial in WSSS. To facilitate more meaningful and rigorous research on saliency-guided WSSS, we introduce \texttt{WSSS-BED}, a standardized framework for conducting research under unified conditions. \texttt{WSSS-BED} provides various saliency maps and activation maps for seven WSSS methods, as well as saliency maps from unsupervised salient object detection models.
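To make the role of the threshold concrete, the following is a minimal sketch of one common way to turn activation maps into a pseudo label. It is an illustration under stated assumptions, not the procedure of this paper or of any method in \texttt{WSSS-BED}: the function name `cam_to_pseudo_label`, the parameter `sal_thresh`, and the rule that low-saliency pixels are forced to background are all hypothetical choices. The sketch treats $\tau$ as a constant background score that competes with the per-class activations at every pixel.

```python
import numpy as np

def cam_to_pseudo_label(cams, class_ids, tau, saliency=None, sal_thresh=0.5):
    """Convert activation maps into a pseudo label (illustrative sketch).

    cams      : (K, H, W) activations in [0, 1], one map per image-level class.
    class_ids : length-K dataset class indices (0 is reserved for background).
    tau       : background threshold; a pixel is foreground only if some
                class activation is strictly greater than tau.
    saliency  : optional (H, W) map in [0, 1]. ASSUMPTION: pixels with
                saliency below sal_thresh are forced to background.
    """
    cams = np.asarray(cams, dtype=np.float32)
    class_ids = np.asarray(class_ids)

    # Background "activation" is the constant tau, stacked as channel 0.
    bg = np.full((1,) + cams.shape[1:], tau, dtype=np.float32)
    scores = np.concatenate([bg, cams], axis=0)

    # argmax picks background on ties because channel 0 comes first.
    idx = scores.argmax(axis=0)
    labels = np.concatenate([[0], class_ids])[idx]

    if saliency is not None:
        labels = np.where(np.asarray(saliency) >= sal_thresh, labels, 0)
    return labels


# Toy 2x2 image with a single class (index 15 chosen arbitrarily).
toy_cam = np.array([[[0.9, 0.2],
                     [0.4, 0.1]]])
toy_sal = np.array([[1.0, 1.0],
                    [0.0, 0.0]])

label_low_tau = cam_to_pseudo_label(toy_cam, [15], tau=0.3)
label_high_tau = cam_to_pseudo_label(toy_cam, [15], tau=0.5)
label_with_sal = cam_to_pseudo_label(toy_cam, [15], tau=0.3, saliency=toy_sal)
```

In this toy case, raising $\tau$ from 0.3 to 0.5 flips the pixel with activation 0.4 from foreground to background, and the saliency mask has the same effect on that pixel at the lower threshold. This is the kind of sensitivity that makes both the threshold and the saliency source worth controlling when methods are compared.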
