High-fidelity Pseudo-labels for Boosting Weakly-Supervised Segmentation

Arvi Jonnarth, Yushan Zhang, Michael Felsberg

TL;DR

This work reformulates two techniques for improving CAMs, importance sampling and the feature similarity loss, based on binomial posteriors of multiple independent binary problems, resulting in an add-on method that can boost virtually any WSSS method.

Abstract

Image-level weakly-supervised semantic segmentation (WSSS) reduces the usually vast data annotation cost by using surrogate segmentation masks during training. The typical approach involves training an image classification network using global average pooling (GAP) on convolutional feature maps. This enables the estimation of object locations based on class activation maps (CAMs), which identify the importance of image regions. The CAMs are then used to generate pseudo-labels, in the form of segmentation masks, to supervise a segmentation model in the absence of pixel-level ground truth. Our work is based on two techniques for improving CAMs: importance sampling, which is a substitute for GAP, and the feature similarity loss, which utilizes a heuristic that object contours almost always align with color edges in images. However, both are based on the multinomial posterior with softmax and implicitly assume that classes are mutually exclusive, which turns out to be suboptimal in our experiments. Thus, we reformulate both techniques based on binomial posteriors of multiple independent binary problems. This has two benefits: their performance is improved and they become more general, resulting in an add-on method that can boost virtually any WSSS method. This is demonstrated on a wide variety of baselines on the PASCAL VOC dataset, improving the region similarity and contour quality of all implemented state-of-the-art methods. Experiments on the MS COCO dataset further show that our proposed add-on is well-suited for large-scale settings. Our code implementation is available at https://github.com/arvijj/hfpl.
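
The difference between the multinomial posterior and the binomial posteriors can be illustrated with a small sketch. It is not taken from the authors' code: the logits are hypothetical values for a single pixel, and the binary posteriors are assumed to be logistic sigmoids, which is the usual choice for independent binary problems. With softmax, two classes with equally strong evidence must share the probability mass, whereas independent sigmoids let both be confident.

```python
import math


def softmax(logits):
    """Multinomial posterior: class probabilities compete and sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Binomial posterior for one independent binary problem."""
    return 1.0 / (1.0 + math.exp(-z))


def binomial_posteriors(logits):
    """One independent sigmoid per class (assumed form, see lead-in)."""
    return [sigmoid(z) for z in logits]


# Hypothetical logits for one pixel and three classes; the first two
# classes have equally strong evidence, the third has little.
logits = [4.0, 4.0, -2.0]

multinomial = softmax(logits)
binomial = binomial_posteriors(logits)

print("softmax :", [round(p, 3) for p in multinomial])
print("sigmoid :", [round(p, 3) for p in binomial])
```

Under softmax neither of the two strong classes can exceed 0.5, which reflects the implicit mutual-exclusivity assumption mentioned in the abstract. The per-class sigmoids carry no such constraint.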

Paper Structure (21 sections, 20 equations, 7 figures, 8 tables)

Figures (7)

  • Figure 1: Region similarity vs. contour quality. The arrows represent improvements from our proposed approach, which boosts the performance of five state-of-the-art methods. The results are five-run averages, which required reimplementation of all methods.
  • Figure 2: Illustration of how our improved feature similarity loss improves an initial CAM.
  • Figure 3: Illustration of the feature similarity loss (FSL).
  • Figure 4: Illustration of global pooling methods. Top to bottom: max pooling, average pooling, multi-sample importance sampling.
  • Figure 5: Qualitative results for the implemented methods on VOC. "++" indicates that ISL and FSL were used for training.
  • ...and 2 more figures
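
The three global pooling methods named in the Figure 4 caption can be contrasted with a minimal sketch. This is one plausible reading rather than the authors' implementation: the per-pixel scores are hypothetical posteriors for a single class on a flattened feature map, and `importance_sample_pool` is an illustrative helper that draws pixel locations with probability proportional to their score and averages the scores at the sampled locations.

```python
import random


def max_pool(scores):
    """Image-level score from the single strongest pixel."""
    return max(scores)


def avg_pool(scores):
    """Image-level score from the uniform average (GAP)."""
    return sum(scores) / len(scores)


def importance_sample_pool(scores, num_samples, rng):
    """Sample pixels with probability proportional to their score and
    average the scores at the sampled pixels (assumed formulation)."""
    if sum(scores) == 0:
        return 0.0  # no pixel has any mass to sample from
    picks = rng.choices(scores, weights=scores, k=num_samples)
    return sum(picks) / num_samples


# Hypothetical per-pixel posteriors for one class (flattened 2x2 map).
scores = [0.05, 0.10, 0.90, 0.80]
rng = random.Random(0)  # seeded for repeatability

pooled_max = max_pool(scores)
pooled_avg = avg_pool(scores)
pooled_is = importance_sample_pool(scores, num_samples=2000, rng=rng)

print("max:", pooled_max, "avg:", pooled_avg, "importance:", round(pooled_is, 3))
```

In this toy setting the sampled estimate falls between the average and the maximum, because high-scoring pixels are drawn more often but no single pixel dominates.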