Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation
Tao Chen, XiRuo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao
TL;DR
The paper tackles weakly supervised semantic segmentation by improving CAM-based localization, which traditionally suffers from either under-activation or over-expansion. It introduces Knowledge Transfer with Simulated Inter-Image Erasing (KTSE), a framework that weakens the anchor image's activation by introducing inter-image object information and then transfers knowledge from the anchor to strengthen the resulting less activated localization map. Two modules complement this step, Self-Supervised Regularization (SSR) and Multi-Granularity Alignment (MGA), which stabilize the bidirectional learning and refine object boundaries. The method combines CAM generation with a GPP head, the simulated inter-image erasing (SIE) knowledge-transfer loss, SSR with pseudo-label supervision and inter-class consistency, and MGA with global and local alignments, all trained under a unified objective. Experiments on PASCAL VOC 2012 and COCO show state-of-the-art or competitive performance, with substantial gains in pseudo-label quality and segmentation results across backbones, and ablations validate the contributions of SIE, SSR, and MGA. Code is publicly available.
Abstract
Though adversarial erasing has prevailed in weakly supervised semantic segmentation as a way to activate integral object regions, existing approaches still suffer from the dilemma of under-activation and over-expansion because it is difficult to determine when to stop erasing. In this paper, we propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) approach for weakly supervised semantic segmentation to alleviate this problem. In contrast to existing erasing-based methods, which remove the discriminative part to discover more of the object, we propose a simulated inter-image erasing scenario that weakens the original activation by introducing extra object information. Object knowledge is then transferred from the anchor image to the resulting less activated localization map to strengthen the network's localization ability. Because the adopted bidirectional alignment would also weaken the anchor image activation in the absence of appropriate constraints, we propose a self-supervised regularization module that maintains reliable activation in discriminative regions and improves inter-class object boundary recognition for complex images containing multiple object categories. In addition, we resort to intra-image erasing and propose a multi-granularity alignment module that gently enlarges the object activation to boost object knowledge transfer. Extensive experiments and ablation studies on the PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our proposed approach. Source code and models are available at https://github.com/NUST-Machine-Intelligence-Laboratory/KTSE.
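
The knowledge-transfer idea can be illustrated with a small sketch. It is not the authors' implementation; it only shows the general shape of comparing an anchor activation map with a weakened one. Several things are assumptions made for illustration: that the extra object information comes from placing a second image beside the anchor, that the weakened map is the anchor-sized crop of the joint activation map, and that the transfer loss is a mean absolute difference. The function names `concat_side_by_side`, `crop_anchor_region`, and `transfer_loss` are hypothetical, and the activation values are made up.

```python
# Illustrative sketch only. Activation maps are plain 2D lists of floats.

def concat_side_by_side(anchor, paired):
    """Join two equally tall 2D grids along the width axis.

    Assumption: extra object information is introduced by placing a second
    image next to the anchor image.
    """
    if len(anchor) != len(paired):
        raise ValueError("grids must have the same height")
    return [row_a + row_p for row_a, row_p in zip(anchor, paired)]


def crop_anchor_region(grid, anchor_width):
    """Keep only the columns that belong to the anchor image."""
    return [row[:anchor_width] for row in grid]


def transfer_loss(cam_anchor, cam_weakened):
    """Mean absolute difference between two same-sized activation maps.

    Assumption: an L1-style alignment stands in for the transfer loss.
    """
    diffs = [
        abs(a - w)
        for row_a, row_w in zip(cam_anchor, cam_weakened)
        for a, w in zip(row_a, row_w)
    ]
    return sum(diffs) / len(diffs)


# Made-up activation of the anchor image on its own.
cam_anchor = [[1.0, 0.5],
              [0.0, 0.5]]

# Made-up activation of the joined input: the left half covers the anchor
# and is weaker than cam_anchor, the right half covers the paired image.
cam_joint = [[0.5, 0.5, 0.9, 0.1],
             [0.0, 0.0, 0.2, 0.3]]

cam_weakened = crop_anchor_region(cam_joint, anchor_width=2)
loss = transfer_loss(cam_anchor, cam_weakened)
```

In a training loop, a loss of this kind would be minimized so that the weakened map is pulled toward the anchor map. The regularization described in the abstract exists to keep the anchor map from being dragged down in return.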
