Learning from the Web: Language Drives Weakly-Supervised Incremental Learning for Semantic Segmentation

Chang Liu, Giulia Rizzoli, Pietro Zanuttigh, Fu Li, Yi Niu

TL;DR

This work introduces a strategy for selecting web images that are similar to previously seen examples in the latent space using a Fourier-based domain discriminator, and proposes an effective caption-driven rehearsal strategy to preserve previously learned classes.

Abstract

Current weakly-supervised incremental learning for semantic segmentation (WILSS) approaches only consider replacing pixel-level annotations with image-level labels, while the training images still come from well-designed datasets. In this work, we argue that widely available web images can also be used to learn new classes. To achieve this, we first introduce a strategy for selecting web images that are similar to previously seen examples in the latent space using a Fourier-based domain discriminator. Then, an effective caption-driven rehearsal strategy is proposed to preserve previously learned classes. To our knowledge, this is the first work to rely solely on web images for both the learning of new concepts and the preservation of already learned ones in WILSS. Experimental results show that the proposed approach can reach state-of-the-art performance without using manually selected and annotated data in the incremental steps.

Paper Structure (28 sections, 11 equations, 9 figures, 7 tables)


Figures (9)

  • Figure 1: General overview of the proposed method.
  • Figure 1: Per-task and per-step mIoU for the 10-1 VOC multi-step overlap incremental setting (WEB+WEB).
  • Figure 2: The proposed method employs a web crawler to gather new knowledge and retrieve past data. The new knowledge (\ref{sec:learn_new}) is acquired by querying class names and subsequently filtered in the Fourier domain. Simultaneously, image-level labels are provided by a captioning model. On the other hand, the preservation of old knowledge (\ref{sec:learn_old}) involves querying captions from previous data and filtering them based on semantic similarity with the regenerated captions.
  • Figure 2: Image-level labels generated from captions for COCO-to-VOC incremental step classes. For each sample we show (from top to bottom) the queried class name, a thumbnail of the image, the generated caption and the final image-level label.
  • Figure 3: Illustration of caption-based filtering approach.
  • ...and 4 more figures
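
The caption-based filtering mentioned in the Figure 2 and Figure 3 captions (keeping a retrieved web image only if its regenerated caption is semantically close to the caption used as the query) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words cosine similarity is a crude stand-in for whatever semantic similarity measure the paper uses, and the function names, the example captions, the image identifiers and the 0.5 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(caption: str) -> Counter:
    """Lowercased word counts; an assumed stand-in for a learned text embedding."""
    return Counter(re.findall(r"[a-z]+", caption.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_by_caption(query_caption, candidates, threshold=0.5):
    """Keep the ids of images whose regenerated caption is close to the query.

    candidates: list of (image_id, regenerated_caption) pairs.
    threshold:  hypothetical cut-off, not a value taken from the paper.
    """
    query = bag_of_words(query_caption)
    return [
        image_id
        for image_id, caption in candidates
        if cosine_similarity(query, bag_of_words(caption)) >= threshold
    ]


# Hypothetical example: one caption from previous data, two retrieved web images.
candidates = [
    ("web_001", "a dog running on the grass"),
    ("web_002", "a plate of pasta on a table"),
]
kept = filter_by_caption("a dog is running on grass", candidates)
print(kept)
```

In this toy example only the first candidate passes the threshold, because its regenerated caption shares most of its words with the query caption. A real pipeline would presumably replace `bag_of_words` with a sentence-embedding model, but the thresholded-similarity structure would stay the same.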