SFC: Shared Feature Calibration in Weakly Supervised Semantic Segmentation

Xinqiao Zhao, Feilong Tang, Xiaoyang Wang, Jimin Xiao

TL;DR

This work identifies long-tailed data as a driver of CAM miscalibration in weakly supervised semantic segmentation due to shared features across head and tail classes. It introduces Shared Feature Calibration (SFC), combining classifier weight CAM and prototype CAM with an Image Bank Re-sampling (IBR) strategy and a Multi-Scaled Distribution-Weighted (MSDW) consistency loss to balance activations and tighten CAM boundaries. The approach achieves new state-of-the-art results on PASCAL VOC 2012 and MS COCO 2014 for WSSS with only image-level labels, and ablations confirm the effectiveness of IBR, the MSDW components, and the distribution-coefficient weighting. The method offers a practical route to high-quality pseudo-labels and improved segmentation performance in long-tailed, weakly supervised settings, with code available for reproducibility.

Abstract

Image-level weakly supervised semantic segmentation has received increasing attention due to its low annotation cost. Existing methods mainly rely on Class Activation Mapping (CAM) to obtain pseudo-labels for training semantic segmentation models. In this work, we are the first to demonstrate that a long-tailed distribution in the training data can cause the CAM calculated through classifier weights to be over-activated for head classes and under-activated for tail classes, because features are shared among head and tail classes. This degrades pseudo-label quality and, in turn, final semantic segmentation performance. To address this issue, we propose a Shared Feature Calibration (SFC) method for CAM generation. Specifically, we leverage the class prototypes that carry positive shared features and propose a Multi-Scaled Distribution-Weighted (MSDW) consistency loss for narrowing the gap between the CAMs generated through classifier weights and class prototypes during training. The MSDW loss counterbalances over-activation and under-activation by calibrating the shared features in head-/tail-class classifier weights. Experimental results show that our SFC significantly improves CAM boundaries and achieves new state-of-the-art performance. The project is available at https://github.com/Barrett-python/SFC.
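To make the two kinds of CAM in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a feature map of shape (D, H, W) and a linear classifier of shape (C, D). The prototype construction (average of features over confident seed pixels, then ReLU of cosine similarity) is one common choice assumed here for illustration, and the plain L1 gap stands in for the MSDW loss, whose distribution weighting is not specified in this text. All function names and the `threshold` parameter are hypothetical.

```python
import numpy as np


def classifier_weight_cam(features, weights):
    """Classifier-weight CAM: ReLU(w_c . f(x, y)), scaled per class to [0, 1].

    features: (D, H, W) feature map; weights: (C, D) classifier weights.
    """
    cam = np.einsum("cd,dhw->chw", weights, features)
    cam = np.maximum(cam, 0.0)
    peak = cam.reshape(cam.shape[0], -1).max(axis=1).reshape(-1, 1, 1)
    return cam / (peak + 1e-5)


def prototype_cam(features, seed_cam, threshold=0.5):
    """Prototype CAM under an assumed construction (not taken from the paper):
    prototype = mean feature over pixels whose seed CAM exceeds `threshold`,
    CAM = ReLU(cosine similarity between each pixel feature and the prototype).
    """
    d, h, w = features.shape
    flat = features.reshape(d, -1)                                  # (D, HW)
    mask = (seed_cam.reshape(seed_cam.shape[0], -1) > threshold).astype(flat.dtype)
    protos = mask @ flat.T / (mask.sum(axis=1, keepdims=True) + 1e-5)  # (C, D)
    protos_n = protos / (np.linalg.norm(protos, axis=1, keepdims=True) + 1e-5)
    flat_n = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-5)
    sim = protos_n @ flat_n                                         # (C, HW)
    return np.maximum(sim, 0.0).reshape(-1, h, w)


def consistency_gap(cam_a, cam_b):
    """Unweighted L1 gap between two CAMs (a stand-in for the MSDW loss)."""
    return float(np.abs(cam_a - cam_b).mean())


# Toy example with random features: 8 channels, 4x4 map, 3 classes.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4, 4))
weights = rng.normal(size=(3, 8))
cam_w = classifier_weight_cam(features, weights)
cam_p = prototype_cam(features, cam_w)
gap = consistency_gap(cam_w, cam_p)
print(cam_w.shape, cam_p.shape, gap)
```

In this sketch the gap is the quantity a consistency loss would drive down during training, so that the classifier-weight CAM moves toward the prototype CAM.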

Paper Structure (22 sections, 16 equations, 4 figures, 7 tables)

This paper contains 22 sections, 16 equations, 4 figures, and 7 tables.

Figures (4)

  • Figure 1: Illustration of how shared features influence CAMs under a long-tailed scenario and the effects of our proposed SFC. (a) shows that PASCAL VOC 2012 (Everingham et al. 2010) is a naturally long-tailed dataset. (b) explains the shared feature components in head-/tail-class classifier weights and prototypes. (c) shows how over-/under-activations arise. (d) shows the CAMs of head-/tail-class examples. Our SFC achieves better results with appropriate activation areas.
  • Figure 2: The overall structure of our proposed SFC. For each training image, two distribution-weighted consistency losses ($\mathcal{L}_{\text{DW}}^{\text{P}}$ and $\mathcal{L}_{\text{DW}}^{\text{W}}$) are calculated: $\mathcal{L}_{\text{DW}}^{\text{P}}$ is calculated between the prototype CAM ($\mathcal{M}_{\text{P}}$) and the classifier weight CAM ($\mathcal{M}_{\text{W}}$) of the original image, and $\mathcal{L}_{\text{DW}}^{\text{W}}$ is calculated between the classifier weight CAMs of the down-scaled and original images. In addition, an image bank that stores the most recently seen images for each class is maintained, and images are uniformly sampled from it to complement the original training batch, increasing how often the consistency loss is optimized for tail classes. Finally, the classifier weight CAM is complemented with the prototype CAM at inference.
  • Figure 3: CAM visualization results on PASCAL VOC 2012, demonstrating Conclusion 2 and Conclusion 3. (a) input images; (b) classifier weight CAMs; (c) prototype CAMs; (d) final CAMs generated through our SFC; (e) ground truth.
  • Figure 4: CAM visualization results under different settings.
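
The image bank described in the Figure 2 caption can be sketched as below, using only the Python standard library. This is an illustrative sketch under stated assumptions, not the authors' code: the class name `ImageBank`, its methods, the `per_class` capacity of one image per class, and sampling with replacement are all hypothetical choices. Because every class holds the same number of slots, sampling uniformly over the bank gives tail classes the same chance of being drawn as head classes.

```python
import random
from collections import deque


class ImageBank:
    """Keeps the most recently seen image(s) for each class (hypothetical sketch)."""

    def __init__(self, num_classes, per_class=1, seed=None):
        # One bounded queue per class; old entries are evicted automatically.
        self.slots = [deque(maxlen=per_class) for _ in range(num_classes)]
        self.rng = random.Random(seed)

    def update(self, image_id, labels):
        """Record an image under every class listed in its image-level labels."""
        for c in labels:
            self.slots[c].append(image_id)

    def sample(self, k):
        """Draw k images uniformly (with replacement) from all stored entries."""
        pool = [img for slot in self.slots for img in slot]
        if not pool:
            return []
        return [self.rng.choice(pool) for _ in range(k)]


# Long-tailed toy stream: class 0 dominates, classes 1 and 2 appear once each.
stream = [("img%03d" % i, [0]) for i in range(8)]
stream += [("img008", [1]), ("img009", [2]), ("img010", [0])]

bank = ImageBank(num_classes=3, per_class=1, seed=0)
for image_id, labels in stream:
    bank.update(image_id, labels)

extra = bank.sample(4)  # images appended to the regular training batch
print([list(s) for s in bank.slots], extra)
```

After the stream is consumed, the bank holds one image per class regardless of how often each class appeared, so the extra samples are balanced across classes even though the stream is not.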