Knowledge Transfer with Simulated Inter-Image Erasing for Weakly Supervised Semantic Segmentation

Tao Chen, XiRuo Jiang, Gensheng Pei, Zeren Sun, Yucheng Wang, Yazhou Yao

TL;DR

The paper tackles weakly supervised semantic segmentation by improving CAM-based localization, which traditionally suffers from under-activation or over-expansion. It introduces Knowledge Transfer with Simulated Inter-Image Erasing (KTSE), a framework that weakens the anchor activation by adding object information from a paired image and then transfers knowledge from the anchor to strengthen the resulting less activated localization map. A Self-Supervised Regularization (SSR) module and a Multi-Granularity Alignment (MGA) module complement this step: SSR stabilizes the bidirectional learning and refines inter-class object boundaries, while MGA gently enlarges the object activation. The method combines CAM generation with a GPP head, the simulated inter-image erasing (SIE) knowledge-transfer loss, SSR with pseudo-label supervision and inter-class consistency, and MGA with global and local alignments, all trained under a unified objective. Experiments on PASCAL VOC 2012 and COCO show state-of-the-art or competitive performance, with gains in pseudo-label quality and segmentation results across backbones, and ablations validate the contributions of SIE, SSR, and MGA. Code is publicly available.

Abstract

Though adversarial erasing has prevailed in weakly supervised semantic segmentation to help activate integral object regions, existing approaches still suffer from the dilemma of under-activation and over-expansion due to the difficulty in determining when to stop erasing. In this paper, we propose a Knowledge Transfer with Simulated Inter-Image Erasing (KTSE) approach for weakly supervised semantic segmentation to alleviate the above problem. In contrast to existing erasing-based methods that remove the discriminative part for more object discovery, we propose a simulated inter-image erasing scenario to weaken the original activation by introducing extra object information. Then, object knowledge is transferred from the anchor image to the consequent less activated localization map to strengthen network localization ability. Considering the adopted bidirectional alignment will also weaken the anchor image activation if appropriate constraints are missing, we propose a self-supervised regularization module to maintain the reliable activation in discriminative regions and improve the inter-class object boundary recognition for complex images with multiple categories of objects. In addition, we resort to intra-image erasing and propose a multi-granularity alignment module to gently enlarge the object activation to boost the object knowledge transfer. Extensive experiments and ablation studies on PASCAL VOC 2012 and COCO datasets demonstrate the superiority of our proposed approach. Source codes and models are available at https://github.com/NUST-Machine-Intelligence-Laboratory/KTSE.
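
To make the core idea concrete, the following is a minimal, illustrative sketch and not the authors' implementation. It computes a standard class activation map (a classifier-weighted sum of feature channels followed by ReLU), simulates the weakening of the anchor activation by blending in features from a paired image, and measures the gap that a knowledge-transfer loss would try to close. The blending rule, the mean-absolute-difference loss, and all names (`cam`, `simulated_inter_image_erasing`, `transfer_loss`, `mix`) are assumptions made for illustration; the abstract does not specify these details.

```python
# Illustrative sketch only. The blending rule and the L1-style loss are
# assumptions, not the formulation used in the KTSE paper.

def cam(features, class_weights):
    """Class activation map for one class.

    features: list of C channel maps, each an H x W list of lists.
    class_weights: list of C classifier weights for that class.
    Returns an H x W map: ReLU of the weighted channel sum.
    """
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            max(0.0, sum(w * ch[i][j] for w, ch in zip(class_weights, features)))
            for j in range(width)
        ]
        for i in range(height)
    ]


def simulated_inter_image_erasing(anchor_feats, paired_feats, mix=0.5):
    """Hypothetical stand-in for SIE: blend paired-image features into the anchor."""
    return [
        [
            [(1.0 - mix) * a + mix * p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(ch_a, ch_p)
        ]
        for ch_a, ch_p in zip(anchor_feats, paired_feats)
    ]


def transfer_loss(anchor_cam, weakened_cam):
    """Mean absolute difference between the two maps (assumed loss form)."""
    diffs = [
        abs(a - b)
        for row_a, row_b in zip(anchor_cam, weakened_cam)
        for a, b in zip(row_a, row_b)
    ]
    return sum(diffs) / len(diffs)


# Toy 2-channel, 2x2 features: channel 0 responds to the anchor object,
# channel 1 to the object in the paired image.
anchor_feats = [[[1.0, 0.5], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
paired_feats = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 1.0]]]
anchor_class_weights = [1.0, 0.0]

anchor_cam = cam(anchor_feats, anchor_class_weights)
mixed_feats = simulated_inter_image_erasing(anchor_feats, paired_feats)
weakened_cam = cam(mixed_feats, anchor_class_weights)
loss = transfer_loss(anchor_cam, weakened_cam)
```

In this toy case the anchor-class activation in the blended features is half as strong as in the anchor alone, so the loss is positive. Training would reduce it by raising the weakened map toward the anchor map, which is the direction of knowledge transfer the abstract describes.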

Paper Structure

This paper contains 15 sections, 10 equations, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: (a) Previous adversarial erasing-based approaches typically suffer from the over-expansion problem, which is hard to constrain. (b) Different from their information removal strategy, we propose to add extra object knowledge from a paired image to weaken the current object activation. The localization ability of the network is then enhanced by improving the consequent less activated attention map through learning from the object knowledge of the anchor branch. (c) Result comparison.
  • Figure 2: The architecture of our proposed approach. We propose a simulated inter-image erasing (SIE) scenario where extra object information is introduced from another paired image. We then strengthen the object localization ability of the network by improving the consequent less activated localization map through learning object knowledge from the anchor image. A self-supervised regularization (SSR) module is also proposed to avoid weakening the anchor activation due to bidirectional alignment and improve the inter-class object boundary recognition for complex images. In addition, we propose a multi-granularity alignment (MGA) module to gently enlarge the object activation to further boost the object knowledge transfer.
  • Figure 3: Example localization maps on the PASCAL VOC 2012 training set. For each (a) image, we show (b) ground truth, localization maps produced by (c) previous work of AEFT (Yoon et al., 2022), (d) our baseline, (e) baseline + SIE, (f) baseline + SIE + SSR, and (g) baseline + SIE + SSR + MGA. Best viewed in color.