
Domain Generalization with MixStyle

Kaiyang Zhou, Yongxin Yang, Yu Qiao, Tao Xiang

TL;DR

MixStyle tackles domain generalization by perturbing the style component of features rather than generating images. It randomly mixes per-instance feature statistics across domains in early CNN layers, producing diverse, pseudo-novel styles during training. The method is simple to implement as a plug-in module. It shows strong improvements on classification (PACS), cross-dataset person re-ID, and reinforcement learning tasks such as CoinRun, often matching or outperforming state-of-the-art DG methods with lower overhead than image-generation approaches. The work demonstrates the value of feature-level augmentation that targets style statistics for robust generalization across domain shifts.
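"Mixing per-instance feature statistics" can be read as interpolating channel-wise means and standard deviations between an instance x and a partner instance x̃ drawn from the same mini-batch. The equations below are a sketch under stated assumptions, because this summary does not spell out the formulas. The assumptions are: statistics are taken over the spatial dimensions H × W of one instance's C-channel feature map, ε is a small stability constant, and the mixing weight λ follows Beta(α, α). The last point is our reading of the hyper-parameter α evaluated in Figure 5.

```latex
\mu(x)_{c} = \frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W} x_{c,h,w}, \qquad
\sigma(x)_{c} = \sqrt{\frac{1}{HW}\sum_{h=1}^{H}\sum_{w=1}^{W}\bigl(x_{c,h,w}-\mu(x)_{c}\bigr)^{2}+\epsilon}

\gamma_{\mathrm{mix}} = \lambda\,\sigma(x) + (1-\lambda)\,\sigma(\tilde{x}), \qquad
\beta_{\mathrm{mix}} = \lambda\,\mu(x) + (1-\lambda)\,\mu(\tilde{x}), \qquad
\lambda \sim \mathrm{Beta}(\alpha,\alpha)

\mathrm{MixStyle}(x) = \gamma_{\mathrm{mix}} \odot \frac{x-\mu(x)}{\sigma(x)} + \beta_{\mathrm{mix}}
```

Setting λ = 1 returns x unchanged. For any λ, the normalized content (x − μ(x))/σ(x) is preserved, so only the style statistics are perturbed.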

Abstract

Though convolutional neural networks (CNNs) have demonstrated remarkable ability in learning discriminative features, they often generalize poorly to unseen domains. Domain generalization aims to address this problem by learning, from a set of source domains, a model that generalizes to any unseen domain. In this paper, a novel approach is proposed based on probabilistically mixing instance-level feature statistics of training samples across source domains. Our method, termed MixStyle, is motivated by the observation that visual domain is closely related to image style (e.g., photo vs. sketch images). Such style information is captured by the bottom layers of a CNN, which is where our proposed style mixing takes place. Mixing the styles of training instances implicitly synthesizes novel domains, which increases the domain diversity of the source domains and hence the generalizability of the trained model. MixStyle fits naturally into mini-batch training and is extremely easy to implement. The effectiveness of MixStyle is demonstrated on a wide range of tasks, including category classification, instance retrieval and reinforcement learning.
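The mini-batch form of style mixing can be sketched in NumPy. This is a minimal illustration, not the authors' code. It assumes:

- feature maps of shape (N, C, H, W);
- a reference batch given by a random permutation of the batch;
- a per-instance mixing weight λ ~ Beta(α, α);
- a probability p of applying the operation at all.

The function name `mixstyle` and the defaults alpha=0.1, p=0.5 and eps=1e-6 are hypothetical, illustrative choices.

```python
import numpy as np


def mixstyle(x, alpha=0.1, p=0.5, eps=1e-6, perm=None, rng=None, training=True):
    """Mix per-instance channel statistics of a batch of feature maps.

    x: array of shape (N, C, H, W).
    perm: optional partner indices (the "reference batch"); a random
          shuffle of the batch is used if None (an assumption of this sketch).
    alpha, p, eps: illustrative defaults, not taken from the summary.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Identity at test time, and with probability 1 - p during training.
    if not training or rng.random() > p:
        return x
    n = x.shape[0]
    # Per-instance, per-channel style statistics over the spatial dims.
    mu = x.mean(axis=(2, 3), keepdims=True)
    sig = np.sqrt(x.var(axis=(2, 3), keepdims=True) + eps)
    # Style-free (normalized) content of each instance.
    x_normed = (x - mu) / sig
    # One mixing weight per instance, lambda ~ Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=(n, 1, 1, 1))
    # Partner statistics come from a permutation of the same batch.
    perm = rng.permutation(n) if perm is None else np.asarray(perm)
    mu_mix = lam * mu + (1.0 - lam) * mu[perm]
    sig_mix = lam * sig + (1.0 - lam) * sig[perm]
    # Re-style the normalized content with the mixed statistics.
    return x_normed * sig_mix + mu_mix
```

Following the abstract, such an operation would sit after the bottom layers only and be disabled at inference. An autograd implementation must additionally decide whether gradients flow through μ and σ. NumPy sidesteps that question.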

Paper Structure

This paper contains 33 sections, 6 equations, 5 figures, 9 tables, 1 algorithm.

Figures (5)

  • Figure 1: 2-D t-SNE [tsne] visualization of the style statistics (concatenation of mean and standard deviation) computed from the first residual block's feature maps of a ResNet-18 [he2016deep] trained on four distinct domains [li2017deeper]. The different domains form well-separated clusters.
  • Figure 2: A graphical illustration of how a reference batch is generated. Domain label is denoted by color.
  • Figure 3: (a) CoinRun benchmark. (b) Test performance in unseen environments. (c) Difference between training and test performance.
  • Figure 4: 2-D visualization of flattened feature maps (top) and the corresponding style statistics (bottom). res1-4 denote the four residual blocks in order in a ResNet architecture. We observe that res1 to res3 contain domain-related information while res4 encodes label-related information.
  • Figure 5: Evaluation of the hyper-parameter $\alpha$ on (a) PACS, (b) person re-ID datasets and (c) CoinRun. In (b), M and D denote Market1501 and Duke, respectively.
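Figure 2 describes building a reference batch with the help of domain labels. Below is a minimal sketch of one way to do this. It assumes each mini-batch holds equally many samples from exactly two source domains; that assumption is ours and is not stated in this summary. The sketch shuffles each domain group and assigns the positions of one group a shuffled set of indices from the other group, so every instance's partner comes from a different domain. The function name `cross_domain_perm` is hypothetical.

```python
import numpy as np


def cross_domain_perm(domains, rng=None):
    """Return partner indices so that each instance is paired with another domain.

    domains: 1-D sequence of domain labels for the mini-batch.
    Assumes (hypothetically) exactly two domains with equal counts per batch.
    """
    rng = np.random.default_rng() if rng is None else rng
    domains = np.asarray(domains)
    values = np.unique(domains)
    if len(values) != 2:
        raise ValueError("this sketch assumes exactly two domains per batch")
    idx_a = np.flatnonzero(domains == values[0])
    idx_b = np.flatnonzero(domains == values[1])
    if len(idx_a) != len(idx_b):
        raise ValueError("this sketch assumes equally sized domain groups")
    perm = np.empty(len(domains), dtype=np.int64)
    # Domain-A positions get a shuffled set of domain-B indices, and vice versa,
    # so perm is a bijection and every partner has the other domain label.
    perm[idx_a] = rng.permutation(idx_b)
    perm[idx_b] = rng.permutation(idx_a)
    return perm
```

The returned array indexes the batch (for example, `features[perm]` or per-instance statistics `mu[perm]`) to form the reference batch. When domain labels are unavailable, a plain random shuffle such as `rng.permutation(n)` is the natural fallback.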