FCNs in the Wild: Pixel-level Adversarial and Constraint-based Adaptation

Judy Hoffman, Dequan Wang, Fisher Yu, Trevor Darrell

TL;DR

This work tackles pixel-level domain shift in semantic segmentation with an unsupervised adaptation framework for fully convolutional networks (FCNs). The framework combines global domain alignment, performed by domain adversarial training, with category-specific adaptation based on constrained multiple instance learning (MIL). It aligns feature distributions across domains and transfers spatial layout from the source to the target without using target labels. The method is evaluated on synthetic-to-real, cross-season, and cross-city tasks, as well as on the Berkeley Deep Driving Segmentation (BDDS) dataset, where it improves over the baselines across settings. The work also introduces a large-scale dash-cam driving dataset to support future research on real-world domain adaptation.

Abstract

Fully convolutional models for dense prediction have proven successful for a wide range of visual tasks. Such models perform well in a supervised setting, but performance can be surprisingly poor under domain shifts that appear mild to a human observer. For example, training on one city and testing on another in a different geographic region and/or weather condition may result in significantly degraded performance due to pixel-level distribution shift. In this paper, we introduce the first domain-adaptive semantic segmentation method, proposing an unsupervised adversarial approach to pixel prediction problems. Our method consists of both global and category-specific adaptation techniques. Global domain alignment is performed using a novel semantic segmentation network with fully convolutional domain adversarial learning. This initially adapted space then enables category-specific adaptation through a generalization of constrained weak learning, with explicit transfer of the spatial layout from the source to the target domain. Our approach outperforms baselines across different settings on multiple large-scale datasets, including adapting across various real city environments, different synthetic sub-domains, from simulated to real environments, and on a novel large-scale dash-cam dataset.

Paper Structure

This paper contains 15 sections, 5 equations, 4 figures, and 3 tables.

Figures (4)

  • Figure 1: Unsupervised domain adaptation for pixel-level semantic segmentation.
  • Figure 2: Overview of our pixel-level adversarial and constraint-based adaptation.
  • Figure 3: Qualitative results on adaptation from cities in SYNTHIA fall to cities in SYNTHIA winter.
  • Figure 4: Qualitative results on adaptation from cities in Cityscapes to cities in BDDS.