Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation

Zhiwei Yang; Kexue Fu; Minghong Duan; Linhao Qu; Shuo Wang; Zhijian Song

TL;DR

SeCo uses a dual-teacher, single-student architecture with tag-guided contrast to guarantee the correctness of the learned knowledge and to sharpen the discrepancy among co-contexts. It streamlines the multi-stage WSSS pipeline into end-to-end training and tackles the co-occurrence problem without external supervision.

Abstract

Weakly supervised semantic segmentation (WSSS) with image-level labels aims to perform segmentation without dense annotations. However, owing to the frequent coupling of co-occurring objects and the limited supervision from image-level labels, the co-occurrence problem is widespread and leads to false activation of objects in WSSS. In this work, we devise a 'Separate and Conquer' scheme, SeCo, to tackle this issue in both the image space and the feature space. In the image space, we propose to 'separate' the co-occurring objects with image decomposition by subdividing images into patches. Importantly, we assign each patch a category tag derived from Class Activation Maps (CAMs), which spatially helps remove the co-context bias and guides the subsequent representation. In the feature space, we propose to 'conquer' the false activation by enhancing semantic representation with multi-granularity knowledge contrast. To this end, a dual-teacher-single-student architecture is designed and tag-guided contrast is conducted, which together guarantee the correctness of knowledge and further facilitate the discrepancy among co-contexts. We streamline the multi-staged WSSS pipeline end-to-end and tackle this issue without external supervision. Extensive experiments validate the efficiency of our method and its superiority over previous single-staged and even multi-staged competitors on PASCAL VOC and MS COCO. Code is available at https://github.com/zwyang6/SeCo.git.
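The 'separate' step described above (subdividing an image into patches and tagging each patch from the CAMs) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `assign_patch_tags`, the tag ids `BACKGROUND` and `UNCERTAIN`, the two thresholds, and the rule that a patch receives a tag only when all of its pixels agree are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical tag ids; non-negative tags are class indices.
BACKGROUND = -1
UNCERTAIN = -2


def assign_patch_tags(cam, patch_size, fg_thresh=0.7, bg_thresh=0.3):
    """Assign one tag per non-overlapping patch from CAM scores.

    cam: array of shape (num_classes, H, W) with scores in [0, 1].
    Threshold values and the agreement rule are illustrative assumptions.
    """
    _, height, width = cam.shape
    score = cam.max(axis=0)      # strongest activation at each pixel
    label = cam.argmax(axis=0)   # class producing that activation

    # Per-pixel tag: confident foreground, confident background, else uncertain.
    pixel_tag = np.full((height, width), UNCERTAIN, dtype=np.int64)
    confident_fg = score >= fg_thresh
    pixel_tag[confident_fg] = label[confident_fg]
    pixel_tag[score <= bg_thresh] = BACKGROUND

    rows, cols = height // patch_size, width // patch_size
    tags = np.full((rows, cols), UNCERTAIN, dtype=np.int64)
    for r in range(rows):
        for c in range(cols):
            patch = pixel_tag[
                r * patch_size:(r + 1) * patch_size,
                c * patch_size:(c + 1) * patch_size,
            ]
            # A patch keeps a single-category or background tag only if
            # every pixel inside it agrees; otherwise it stays uncertain.
            if np.all(patch == patch[0, 0]):
                tags[r, c] = patch[0, 0]
    return tags
```

Under these assumptions, a patch that straddles two co-occurring objects is tagged as uncertain, so a later tag-guided contrast can rely only on patches with a single, confident category.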

Paper Structure (20 sections, 9 equations, 6 figures, 5 tables)


Introduction
Related Works
Learning from Local Semantics
Contrastive Learning & Knowledge Distillation
Methodology
Problem Definition
Framework Overview
Image Decomposition
Representation with Category Knowledge
Representation with Patch Semantics
Training Objectives
Experiments and Results
Experimental Settings
Main Results
Ablation Study and Further Analysis
...and 5 more sections

Figures (6)

Figure 1: (a) Co-occurrence issue. Targets marked by stars (horse and boat) are falsely activated. (b) To solve this issue, we propose a single-staged framework, SeCo, which acts in a 'separate and conquer' manner and efficiently tackles the co-occurrence issue without external supervision. It first separates spatial contexts in the image space and then conquers false activation in the feature space. (c) The proposed SeCo accurately localizes the co-categories.
Figure 2: (a) Architecture of the proposed SeCo for tackling the co-occurrence issue. Specifically, whole images are first sent to the global teacher (G-Teacher) to extract the category knowledge and CAMs. Then three types of category tags, i.e., single-category, background, and uncertain tags, are generated from the CAMs and allocated to patches accordingly. With tags, two views of the patches produced by different augmentations, i.e., weak data augmentation (W.T.) and strong augmentation (S.T.), are sent to the student and local teacher (L-Teacher) branches, respectively. The local teacher stores all the history patches and category tags and generalizes the patch semantic knowledge. Finally, two contrastive losses, $L_{LiG}$ and $L_{LiL}$, are applied to guarantee the decoupling. In addition, CAMs from the global teacher are refined as pseudo labels to train the segmentation network. Since the encoders of the segmentation model and the classification model are shared, our WSSS pipeline can be trained end-to-end. (b) Illustration of the key components in SeCo. More details are given in the Methodology section.
Figure 3: Qualitative segmentation results of AFA [26], ToCo [27], and ours on VOC and COCO. SeCo differentiates co-contexts precisely.
Figure 4: CAMs for co-contexts on VOC from SeCo and competitors [26, 27]. SeCo accurately activates the targets (star).
Figure 5: Category representation of SeCo on PASCAL VOC. Left: category prototypes from the last 4,000 iterations are visualized with t-SNE [t16]. Right: similarity among the category prototypes.
...and 1 more figures
