A Novel Cross-Perturbation for Single Domain Generalization

Dongjia Zhao, Lei Qi, Xiao Shi, Yinghuan Shi, Xin Geng

TL;DR

The paper tackles single-source domain generalization by introducing CPerb, a cross-perturbation framework that combines horizontal perturbations (image-level and feature-level) with vertical multi-route perturbations, together with MixPatch for patch-level feature perturbations. The method expands data diversity and enforces multi-view consistency to learn domain-invariant representations. Extensive experiments across CIFAR-10/100-C, PACS, and large-scale datasets demonstrate SOTA or competitive performance gains, with robust ablations supporting the contribution of each component. The approach offers a practical augmentation strategy for improving generalization to unseen domains and is compatible with ViT architectures, broadening its applicability.

Abstract

Single domain generalization aims to enhance the ability of the model to generalize to unknown domains when trained on a single source domain. However, the limited diversity in the training data hampers the learning of domain-invariant features, resulting in compromised generalization performance. To address this, data perturbation (augmentation) has emerged as a crucial method to increase data diversity. Nevertheless, existing perturbation methods often focus on either image-level or feature-level perturbations independently, neglecting their synergistic effects. To overcome these limitations, we propose CPerb, a simple yet effective cross-perturbation method. Specifically, CPerb utilizes both horizontal and vertical operations. Horizontally, it applies image-level and feature-level perturbations to enhance the diversity of the training data, mitigating the issue of limited diversity in single-source domains. Vertically, it introduces multi-route perturbation to learn domain-invariant features from different perspectives of samples with the same semantic category, thereby enhancing the generalization capability of the model. Additionally, we propose MixPatch, a novel feature-level perturbation method that exploits local image style information to further diversify the training data. Extensive experiments on various benchmark datasets validate the effectiveness of our method.
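The four-route structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `image_perturb`, `feature_perturb`, `encoder`, and `classifier` are hypothetical toy stand-ins (in particular, `feature_perturb` is a simple statistics jitter, not MixPatch), and the consistency term is an assumed mean-squared deviation from the average prediction, since this section does not state the actual loss. The route names "O", "I", "F", and "IF" follow the labels used in the Figure 1 caption.

```python
import numpy as np

rng = np.random.default_rng(0)

def image_perturb(x):
    # Hypothetical image-level perturbation: per-image contrast/brightness jitter.
    scale = rng.uniform(0.8, 1.2, size=(x.shape[0], 1, 1, 1))
    shift = rng.uniform(-0.1, 0.1, size=(x.shape[0], 1, 1, 1))
    return x * scale + shift

def feature_perturb(f, eps=1e-6):
    # Hypothetical feature-level perturbation: jitter channel-wise mean and std.
    # This is a placeholder, NOT the paper's MixPatch.
    mu = f.mean(axis=(2, 3), keepdims=True)
    sigma = f.std(axis=(2, 3), keepdims=True) + eps
    new_mu = mu * rng.uniform(0.9, 1.1, size=mu.shape)
    new_sigma = sigma * rng.uniform(0.9, 1.1, size=sigma.shape)
    return (f - mu) / sigma * new_sigma + new_mu

def encoder(x, w):
    # Toy shared-weight feature extractor: 1x1 convolution followed by ReLU.
    return np.maximum(np.einsum("nchw,kc->nkhw", x, w), 0.0)

def classifier(f, v):
    # Global average pooling followed by a linear layer.
    return f.mean(axis=(2, 3)) @ v

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(p, y):
    return -np.log(p[np.arange(len(y)), y] + 1e-12).mean()

def cperb_losses(x, y, w, v):
    # Two image views (original and image-perturbed), each sent through the
    # shared encoder with and without feature perturbation: four predictions.
    probs = {}
    for plain, perturbed, view in (("O", "F", x), ("I", "IF", image_perturb(x))):
        f = encoder(view, w)
        probs[plain] = softmax(classifier(f, v))
        probs[perturbed] = softmax(classifier(feature_perturb(f), v))
    # Classification loss averaged over all four routes.
    cls = np.mean([cross_entropy(p, y) for p in probs.values()])
    # Assumed consistency term: squared deviation from the mean prediction.
    mean_p = np.mean(list(probs.values()), axis=0)
    cons = np.mean([np.mean((p - mean_p) ** 2) for p in probs.values()])
    return probs, cls, cons

# Toy usage with random data: 8 images, 3 channels, 5 classes.
x = rng.normal(size=(8, 3, 16, 16))
y = rng.integers(0, 5, size=8)
w = rng.normal(size=(16, 3)) * 0.1
v = rng.normal(size=(16, 5)) * 0.1
probs, cls_loss, cons_loss = cperb_losses(x, y, w, v)
```

In a real training loop the two loss terms would be combined with some weighting and backpropagated through a deep network; the weighting and the exact consistency objective are left open here because the section does not specify them.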

Paper Structure

This paper contains 33 sections, 10 equations, 11 figures, 12 tables, 1 algorithm.

Figures (11)

  • Figure 1: Statistical discrepancies among different perturbation methods, measured on 128 images from the "art_painting" domain of PACS. "O" and "I" refer to the original image and its image-level perturbation, respectively, each passed through the first convolutional layer of an ImageNet-pre-trained ResNet18 to obtain output feature maps. "F" and "IF" refer to the same two inputs with a feature-level perturbation applied to the feature maps extracted by that convolutional layer. The mean and variance (i.e., the statistics) of the features are computed for each original image (O), image-level perturbation (I), feature-level perturbation (F), and image-feature dual-level perturbation (IF), giving 128 means and 128 variances per channel. Panel (a) shows the mean (plotted lines) and variance (shaded areas) of the 128 means; panel (b) shows the same for the 128 variances.
  • Figure 2: Overview of our CPerb framework. The framework involves two paths for the same image: one with image-level augmentation and the other without augmentation, resulting in two images with distinct perspectives ($\text{v}^O$ and $\text{v}^I$). These images are fed into two shared-weight networks, one with feature perturbation and the other without, leading to four predictions used for classification and consistency learning.
  • Figure 3: Illustration of the MixPatch framework. Initially, the feature maps are partitioned into patches along the channel dimension. Subsequently, uncertainty estimation is conducted at the patch level within each channel to uncover potential style variations. Finally, style transfer is employed on the rescaled patches within each channel, leading to the generation of the fused feature maps.
  • Figure 4: Statistical discrepancies between Original and MixPatch, measured on the "art_painting" domain of PACS. "O" denotes the output features obtained by passing the original image through the first convolutional layer of ResNet18. "F" denotes the output features after applying the feature-level perturbation to those extracted features. The mean and variance (i.e., the statistics) are computed for the features of the original image (O) and of the feature perturbation (F), so for each channel "O" ("F") yields 128 (256) means and variances. Panel (a) shows the mean (plotted lines) and variance (shaded areas) of the 128 (256) means; panel (b) shows the same for the 128 (256) variances. The count of 256 arises because each channel of the feature maps is split into two patches.
  • Figure 5: (a) Experimental results of CPerb and SOTA methods in the "CIFAR-10 → CIFAR-10-C" and "CIFAR-100 → CIFAR-100-C" tasks.
  • ...and 6 more figures
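
The two-stage statistics described in the Figure 1 and Figure 4 captions (per-image channel statistics first, then their mean and variance across images) can be sketched as below. This is an illustrative sketch only: the random array stands in for real first-layer ResNet18 feature maps, its spatial size is arbitrary, and the function names `channel_statistics` and `summarize` are hypothetical.

```python
import numpy as np

def channel_statistics(features):
    """Per-image, per-channel mean and variance.

    features: array of shape (N, C, H, W).
    Returns two arrays of shape (N, C).
    """
    means = features.mean(axis=(2, 3))
    variances = features.var(axis=(2, 3))
    return means, variances

def summarize(per_image_stats):
    """Mean (plotted line) and variance (shaded area) across images, per channel."""
    return per_image_stats.mean(axis=0), per_image_stats.var(axis=0)

# Hypothetical stand-in for the feature maps of 128 images with 64 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(128, 64, 8, 8))

means, variances = channel_statistics(feats)   # 128 means and 128 variances per channel
line_a, band_a = summarize(means)              # panel (a): statistics of the means
line_b, band_b = summarize(variances)          # panel (b): statistics of the variances
```

Running the same two functions on perturbed feature maps and overlaying the resulting lines and bands would give a comparison of the kind the captions describe.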