
OSSA: Unsupervised One-Shot Style Adaptation

Robin Gerster, Holger Caesar, Matthias Rapp, Alexander Wolpert, Michael Teutsch

TL;DR

This paper introduces One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that uses a single, unlabeled target image to approximate the target domain style.

Abstract

Despite their success in various vision tasks, deep neural network architectures often underperform in out-of-distribution scenarios because of the style difference between the training and target domains. To address this limitation, we introduce One-Shot Style Adaptation (OSSA), a novel unsupervised domain adaptation method for object detection that uses a single, unlabeled target image to approximate the target domain style. Specifically, OSSA generates diverse target styles by perturbing the style statistics derived from a single target image. It then applies these styles to a labeled source dataset at the feature level using Adaptive Instance Normalization (AdaIN). Extensive experiments show that OSSA establishes a new state of the art among one-shot domain adaptation methods by a significant margin. In some cases, it even outperforms strong baselines that use thousands of unlabeled target images. By applying OSSA in various scenarios, including weather, simulated-to-real (sim2real), and visual-to-thermal adaptation, our study explores the overarching significance of the style gap in these contexts. OSSA's simplicity and efficiency allow easy integration into existing frameworks, making it a potentially viable solution for practical applications with limited data availability. Code is available at https://github.com/RobinGerster7/OSSA
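
The core operation named in the abstract can be illustrated with a short example. In AdaIN, the channel-wise mean and standard deviation of a feature map act as its "style". Source features are normalized per channel and then rescaled and shifted with the target statistics.

This is a minimal NumPy sketch under stated assumptions, not the OSSA implementation:
- Feature maps are taken to have shape (C, H, W).
- A small epsilon is used for numerical stability.
- The helper names `channel_style` and `adain` are hypothetical.
- Random arrays stand in for real backbone features.

```python
import numpy as np


def channel_style(feat: np.ndarray, eps: float = 1e-5) -> tuple[np.ndarray, np.ndarray]:
    """Return per-channel mean and standard deviation of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)                    # shape (C, 1, 1)
    sigma = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + eps)   # shape (C, 1, 1)
    return mu, sigma


def adain(source: np.ndarray, target_mu: np.ndarray, target_sigma: np.ndarray,
          eps: float = 1e-5) -> np.ndarray:
    """Adaptive Instance Normalization: normalize each source channel to zero mean
    and unit std, then apply the target channel statistics."""
    src_mu, src_sigma = channel_style(source, eps)
    normalized = (source - src_mu) / src_sigma
    return target_sigma * normalized + target_mu


# Usage: transfer the style of one (hypothetical) target feature map onto a source one.
rng = np.random.default_rng(0)
source_feat = rng.normal(loc=0.0, scale=1.0, size=(4, 8, 8))
target_feat = rng.normal(loc=2.0, scale=0.5, size=(4, 8, 8))

t_mu, t_sigma = channel_style(target_feat)   # computed once from the single target image
stylized = adain(source_feat, t_mu, t_sigma)
```

After the transfer, each channel of `stylized` carries the target mean exactly and the target standard deviation up to the epsilon term. The target statistics depend only on the target image, so they can be computed once and reused for every source batch.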

Paper Structure

This paper contains 15 sections, 4 equations, 5 figures, and 3 tables.

Figures (5)

  • Figure 1: A qualitative comparison of the baseline Faster R-CNN, our proposed OSSA, and the ground truth. We observe that in various scenarios, including weather (top row), sim2real (middle row), and visual-optical to thermal infrared adaptation, OSSA leads to a substantial increase in accurate detections.
  • Figure 2: The histogram displays the mean activation of each channel at the first layer of ResNet50 for two datasets, Cityscapes and Foggy Cityscapes. The per-channel means differ clearly between the datasets, indicating a style gap.
  • Figure 3: High-level overview of the OSSA (One-Shot Style Adaptation) pipeline. When OSSA is active (purple paths), the target image style is extracted from the first two layers of the ResNet50 backbone network, then perturbed with multiplicative Gaussian noise $\sim \mathcal{N}(1, 0.75)$. The novel style is then integrated at the feature map level of the source images using AdaIN for training. When OSSA is inactive (orange paths), a standard pipeline is followed; black paths indicate components that are always active. The target style statistics $\mu$ and $\sigma$ only need to be computed once.
  • Figure 4: Analysis of OSSA's performance across different settings: the influence of channel style prototypes, applying the method at different network layers, varying noise intensity levels, and augmentation probabilities.
  • Figure 5: Boxplot of the impact of using multiple target images to approximate target channel style statistics. No significant increase in mean average precision is observed when utilizing more than one target image.
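
The captions of Figures 3 and 4 together outline the training-time augmentation:
- Target statistics are computed once.
- They are perturbed with multiplicative Gaussian noise ~N(1, 0.75).
- They are applied to source features via AdaIN with some augmentation probability.

The following NumPy sketch shows this for a single feature map; in the paper it happens at the first two ResNet50 layers. Several details are assumptions not stated in the captions:
- The value 0.75 is treated as the noise's standard deviation.
- Noise is drawn independently per channel for μ and σ.
- The probability `p=0.5` is a placeholder.
- Clamping σ to stay positive is a safeguard added here.
- All helper names are hypothetical.

```python
import numpy as np

EPS = 1e-5


def channel_style(feat: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std of a (C, H, W) feature map."""
    mu = feat.mean(axis=(1, 2), keepdims=True)
    sigma = np.sqrt(feat.var(axis=(1, 2), keepdims=True) + EPS)
    return mu, sigma


def perturb_style(mu: np.ndarray, sigma: np.ndarray, rng: np.random.Generator,
                  noise_std: float = 0.75) -> tuple[np.ndarray, np.ndarray]:
    """Scale each channel's statistics by independent noise drawn from N(1, noise_std)."""
    mu_new = mu * rng.normal(1.0, noise_std, size=mu.shape)
    sigma_new = sigma * rng.normal(1.0, noise_std, size=sigma.shape)
    # Safeguard assumed in this sketch: keep the standard deviation positive.
    sigma_new = np.maximum(sigma_new, EPS)
    return mu_new, sigma_new


def ossa_style_step(source_feat: np.ndarray, target_mu: np.ndarray,
                    target_sigma: np.ndarray, rng: np.random.Generator,
                    p: float = 0.5) -> np.ndarray:
    """With probability p, re-style source features with a perturbed target style
    via AdaIN; otherwise return them unchanged (the standard pipeline)."""
    if rng.random() >= p:
        return source_feat
    mu, sigma = perturb_style(target_mu, target_sigma, rng)
    src_mu, src_sigma = channel_style(source_feat)
    return sigma * (source_feat - src_mu) / src_sigma + mu


rng = np.random.default_rng(42)
# Target statistics are computed once from a single (hypothetical) target feature map.
target_feat = rng.normal(3.0, 0.5, size=(4, 8, 8))
t_mu, t_sigma = channel_style(target_feat)

# Each source feature map receives a freshly perturbed target style, or none at all.
batch = [rng.normal(0.0, 1.0, size=(4, 8, 8)) for _ in range(3)]
augmented = [ossa_style_step(f, t_mu, t_sigma, rng) for f in batch]
```

Drawing new noise for every source sample yields a distribution of plausible target styles around the single observed one. This is the diversity the captions attribute to the perturbation step.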