P-NOC: adversarial training of CAM generating networks for robust weakly supervised semantic segmentation priors

Lucas David, Helio Pedrini, Zanoni Dias

TL;DR

This work addresses the limitations of CAM-based weakly supervised semantic segmentation (WSSS) by analyzing complementary WSSS techniques and introducing two new ones: P-NOC, an adversarial training framework in which a CAM-generating network and a discriminative classifier are trained jointly, and CCAM-H, which injects weakly supervised saliency hints into a contrastive saliency model. The authors then combine these priors with a refined affinity process and random-walk propagation to produce high-quality pseudo-segmentation masks, reporting competitive results on VOC2012 and MS COCO 2014 without strong supervision. The findings indicate that combining complementary cues with weak saliency information yields robust priors and effective pseudo-masks, narrowing the gap to fully supervised methods. Overall, the approach unites adversarial CAM training, saliency-aware priors, and affinity-based refinement.

Abstract

Weakly Supervised Semantic Segmentation (WSSS) techniques explore individual regularization strategies to refine Class Activation Maps (CAMs). In this work, we first analyze complementary WSSS techniques in the literature, their segmentation properties, and the conditions in which they are most effective. Based on these findings, we devise two new techniques: P-NOC and CCAM-H. In the first, we promote the conjoint training of two adversarial CAM generating networks: a generator, which progressively learns to erase regions containing class-specific features, and a discriminator, which is refined to gradually shift its attention to new class-discriminant features. In the latter, we employ the high-quality pseudo-segmentation priors produced by P-NOC to guide the learning of saliency information in a weakly supervised fashion. Finally, we employ both pseudo-segmentation priors and pseudo-saliency proposals in the random walk procedure, resulting in higher-quality pseudo-semantic segmentation masks and results competitive with the state of the art.

Paper Structure (41 sections, 10 equations, 10 figures, 10 tables, 1 algorithm)

Figures (10)

  • Figure 1: Semantic segmentation priors devised by different WSSS techniques. From top to bottom: (a) CAM, (b) OC-CSE, and (c) Puzzle-CAM.
  • Figure 2: Diagram of our proposed adversarial learning scheme, NOC-CSE. (a) OC-CSE: class-specific regions found by $f$ are erased (CSE) with the supervision of a fixed ordinary classifier ($oc$). After many training iterations, core regions associated with a class are saturated, rendering the $oc$ redundant. (b) NOC-CSE: $f$ learns class-specific regions with the assistance of a not-so-ordinary classifier ($noc$), which is gradually trained to associate secondary, yet discriminant, regions with the class of interest.
  • Figure 3: Detailing of all training objectives in P-NOC. In each training step, a sample $x_i$ is presented to $f$, which is optimized to produce attention maps under the regularization provided by both the Puzzle module and a "not-so-ordinary" classifier $noc$. $f$ is then fixed and $noc$ is refined to shift its attention to secondary discriminative regions currently neglected by both networks.
  • Figure 4: Example of hints (colored regions) used to train C²AM-H. Hints were extracted from segmentation priors produced by a P-NOC model, considering regions where the activation intensity was lower than $\delta_\text{bg}=0.1$ or higher than $\delta_\text{fg}=0.4$.
  • Figure 5: Comparison between the different affinity maps obtained from RS269 trained with P-NOC. From left to right: (a) images and ground-truth segmentation; (b) coarse affinity labels from priors; (c) conventional affinity labels refined with dCRF (ahn2018learning); and (d) (ours) affinity labels, obtained using both C²AM-H and dCRF.
  • ...and 5 more figures
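
The hint extraction described in the Figure 4 caption can be illustrated with a minimal sketch. It assumes the prior is a per-class activation array normalized to [0, 1] and that "activation intensity" means the maximum over classes; the function name `extract_hints` and the label values `BG`, `FG`, and `IGNORE` are hypothetical choices made for this example, not taken from the paper. Only the thresholds $\delta_\text{bg}=0.1$ and $\delta_\text{fg}=0.4$ come from the caption.

```python
import numpy as np

# Hypothetical label encoding for this illustration only.
BG, FG, IGNORE = 0, 1, 255


def extract_hints(prior, delta_bg=0.1, delta_fg=0.4):
    """Turn a segmentation prior into sparse saliency hints.

    prior: array of shape (C, H, W) with activations in [0, 1].
    Returns an (H, W) uint8 map holding BG, FG, or IGNORE per pixel.
    """
    # Assumption: the intensity at a pixel is its strongest class activation.
    intensity = prior.max(axis=0)

    # Start with every pixel unlabeled, then mark only the confident ones.
    hints = np.full(intensity.shape, IGNORE, dtype=np.uint8)
    hints[intensity < delta_bg] = BG  # confidently background
    hints[intensity > delta_fg] = FG  # confidently foreground
    return hints


# Toy prior with a single class and a 2x2 spatial grid.
prior = np.array([[[0.05, 0.20],
                   [0.50, 0.90]]])
hints = extract_hints(prior)
```

Pixels whose intensity falls between the two thresholds stay at `IGNORE`, so in a sketch like this they would contribute no supervision to the saliency model.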