Domain-invariant Prototypes for Semantic Segmentation

Zhengeng Yang, Hongshan Yu, Wei Sun, Li-Cheng, Ajmal Mian

TL;DR

An easy-to-train framework that learns domain-invariant prototypes for domain adaptive semantic segmentation. The paper shows that domain adaptation shares a common characteristic with few-shot learning: both aim to recognize some types of unseen data with knowledge learned from large amounts of seen data.

Abstract

Deep learning has greatly advanced the performance of semantic segmentation; however, its success relies on the availability of large amounts of annotated training data. Hence, many efforts have been devoted to domain adaptive semantic segmentation, which focuses on transferring semantic knowledge from a labeled source domain to an unlabeled target domain. Existing self-training methods typically require multiple rounds of training, while another popular framework, based on adversarial training, is known to be sensitive to hyperparameters. In this paper, we present an easy-to-train framework that learns domain-invariant prototypes for domain adaptive semantic segmentation. In particular, we show that domain adaptation shares a common characteristic with few-shot learning: both aim to recognize some types of unseen data with knowledge learned from large amounts of seen data. Thus, we propose a unified framework for domain adaptation and few-shot learning. The core idea is to use class prototypes extracted from few-shot annotated target images to classify the pixels of both source and target images. Our method involves only one-stage training and does not require training on large-scale unannotated target images. Moreover, our method can be extended to variants of both domain adaptation and few-shot learning. Experiments on adapting GTA5-to-Cityscapes and SYNTHIA-to-Cityscapes show that our method achieves performance competitive with the state of the art.
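The core idea of classifying pixels with class prototypes can be written out as below. This is a sketch in our own notation of the standard prototype-based few-shot segmentation formulation (masked average pooling followed by a nearest-prototype rule); the symbols F, M, C_s and the choice of cosine similarity are assumptions and need not match the paper's own equations.

```latex
% Assumed notation: support features F^s \in \mathbb{R}^{C \times H \times W},
% support mask M^s, and \mathcal{C}_s = classes present in the support image.
% Masked average pooling gives one prototype per class:
p_c = \frac{\sum_{x,y} F^s_{x,y}\, \mathbb{1}\left[M^s_{x,y} = c\right]}
           {\sum_{x,y} \mathbb{1}\left[M^s_{x,y} = c\right]},
\qquad c \in \mathcal{C}_s

% Each query pixel (x, y) takes the class of its most similar prototype:
\hat{y}_{x,y} = \operatorname*{arg\,max}_{c \in \mathcal{C}_s}\;
                \cos\!\left(F^q_{x,y},\, p_c\right)
```

During training, the support features come from few-shot annotated target images and the query features from source images; at test time, the query is a target image.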

Paper Structure (27 sections, 7 equations, 8 figures, 3 tables, 2 algorithms)

Figures (8)

  • Figure 1: The main hypothesis that motivates our method. The impressive domain adaptation ability of humans (we can still recognize the images in the second row as tigers) originates from the fact that we learn domain-invariant class prototypes and then perform object classification through a "similarity comparison" process with respect to these prototypes.
  • Figure 2: The proposed framework compared with two popular frameworks. Self-training and adversarial-training methods generally involve cumbersome training processes. The former usually needs to perform self-training on the entire target dataset for multiple rounds, while the latter needs to optimize multiple adversarial losses that are sensitive to hyperparameters. In contrast, our proposed method learns domain-invariant prototypes with the support of few-shot target annotations. Our method involves only one-stage training and does not need to access large-scale unannotated target images.
  • Figure 3: Relationship between domain adaptation and few-shot learning. Both aim to recognize some unseen data with knowledge learned from a large amount of seen data.
  • Figure 4: Overview of our method. We propose to learn domain-invariant prototypes with a recently popular prototype-based few-shot learning framework. During training, we adopt few-shot annotated target images as the support set and treat all source images as the query set. In each training step, our model first embeds the support image and the query image into semantic features using a Siamese network. Then, a masked average pooling (MAP) operation is applied to the feature maps of the support image to obtain class prototypes. Finally, predictions for the pixels of the query image are obtained by finding the nearest class prototype to each pixel. To extend prototype-based few-shot segmentation to high-resolution images containing an arbitrary number of classes, we propose a support-image-adaptive training strategy in which the classes to be recognized, as well as the number of support samples for each, are fully determined by the current support image. To segment a test image, we only need to perform prototype-based segmentation with all the class prototypes pre-computed from these few-shot annotated target images.
  • Figure 5: Architecture of Feature Refinement Module (FRM).
  • ...and 3 more figures
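The training step and test-time procedure described in the Figure 4 caption can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: feature maps are plain nested lists standing in for the Siamese network's output (the backbone and the Feature Refinement Module are omitted), similarity is cosine similarity with a scale factor of 20 (borrowed from PANet-style prototype segmentation), label 255 is treated as an ignore label, query pixels whose class is absent from the current support image are simply skipped in the loss, and the test-time prototype bank averages per-image prototypes. All function names (`masked_average_pooling`, `nearest_prototype_segmentation`, `prototype_loss`, `build_prototype_bank`) are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

# A feature map is an H x W grid of C-dim vectors; a mask is an H x W grid of class ids.
FeatureMap = List[List[List[float]]]
Mask = List[List[int]]

IGNORE_LABEL = 255  # assumption: Cityscapes-style "void" label, excluded from prototypes


def masked_average_pooling(feat: FeatureMap, mask: Mask) -> Dict[int, List[float]]:
    """Average the feature vectors of all pixels carrying each class label.

    Only classes present in the support mask get a prototype, so the current
    support image decides which classes can be predicted (support-image adaptive).
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for feat_row, mask_row in zip(feat, mask):
        for vec, label in zip(feat_row, mask_row):
            if label == IGNORE_LABEL:
                continue
            if label not in sums:
                sums[label] = [0.0] * len(vec)
                counts[label] = 0
            sums[label] = [s + v for s, v in zip(sums[label], vec)]
            counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def cosine(a: Sequence[float], b: Sequence[float], eps: float = 1e-8) -> float:
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def nearest_prototype_segmentation(feat: FeatureMap,
                                   prototypes: Dict[int, List[float]]) -> Mask:
    """Label every query pixel with the class of its most similar prototype."""
    classes = sorted(prototypes)
    return [[max(classes, key=lambda c: cosine(vec, prototypes[c])) for vec in row]
            for row in feat]


def prototype_loss(feat: FeatureMap, mask: Mask,
                   prototypes: Dict[int, List[float]], scale: float = 20.0) -> float:
    """Mean cross-entropy over query pixels, using scaled cosine similarities as logits."""
    classes = sorted(prototypes)
    total, n = 0.0, 0
    for feat_row, mask_row in zip(feat, mask):
        for vec, label in zip(feat_row, mask_row):
            if label not in prototypes:
                continue  # assumption: classes absent from the support image are ignored
            logits = [scale * cosine(vec, prototypes[c]) for c in classes]
            m = max(logits)  # log-sum-exp with max subtraction for numerical stability
            log_z = m + math.log(sum(math.exp(z - m) for z in logits))
            total += log_z - logits[classes.index(label)]
            n += 1
    return total / n if n else 0.0


def build_prototype_bank(
        supports: Iterable[Tuple[FeatureMap, Mask]]) -> Dict[int, List[float]]:
    """Average per-image prototypes over all few-shot annotated target images."""
    per_class: Dict[int, List[List[float]]] = {}
    for feat, mask in supports:
        for c, proto in masked_average_pooling(feat, mask).items():
            per_class.setdefault(c, []).append(proto)
    return {c: [sum(vals) / len(protos) for vals in zip(*protos)]
            for c, protos in per_class.items()}
```

In a training step, one would compute prototypes from a support (target) image with `masked_average_pooling` and score a source query image with `prototype_loss`; at test time, `build_prototype_bank` pools prototypes from all annotated target images once, and `nearest_prototype_segmentation` labels each target pixel. Averaging per-image prototypes is only one plausible choice; pooling all pixels of a class across images would weight large regions more heavily.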