SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning

Wenqian Li, Pengfei Fang, Hui Xue

TL;DR

This work tackles single-source Cross-Domain Few-Shot Learning (CD-FSL) by addressing gradient instability in style-based domain adaptation methods. It introduces SVasP, a framework that combines crop-localized style gradients with global style perturbations under a self-versatility paradigm, and couples this with a Discrepancy & Consistency Optimization (DCO) objective to maximize seen-unseen visual discrepancy while preserving semantic consistency. Key contributions include the Style-Gradient Generation module, SV Gradient Ensemble, Adversarial Style Perturbation, and the DCO loss, which together flatten the loss landscape and improve transferability across eight target datasets. Empirical results on ResNet-10 and ViT-small backbones show SVasP achieving new state-of-the-art performance in both $5$-way $1$-shot and $5$-way $5$-shot settings, validating its effectiveness in stabilizing gradients and generalizing to unseen domains. The approach offers practical impact for robust cross-domain recognition in data-scarce scenarios and provides a blueprint for leveraging localized gradient information in adversarial style perturbations.

Abstract

Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from seen source domains to unseen target domains, which is crucial for evaluating the generalization and robustness of models. Recent studies use visual styles to bridge the gap between domains. However, these style-based CD-FSL methods suffer from gradient instability and from convergence to poor local optima. This paper addresses both issues with a novel crop-global style perturbation method, Self-Versatility Adversarial Style Perturbation (SVasP), which jointly improves gradient stability and escapes poor sharp minima. Specifically, SVasP simulates more diverse potential target-domain adversarial styles by diversifying input patterns and aggregating localized crop style gradients, which serve as stabilizers for the global style perturbation within a single image, a concept we refer to as self-versatility. A novel objective function is then proposed to maximize visual discrepancy while maintaining semantic consistency among global, crop, and adversarial features. With the stabilized global style perturbation applied during training, the model reaches a flatter minimum in the loss landscape, which boosts its transferability to the target domains. Extensive experiments on multiple benchmark datasets demonstrate that our method significantly outperforms existing state-of-the-art methods. Our code is available at https://github.com/liwenqianSEU/SVasP.
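The crop-global perturbation described in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: it assumes that "style" means the per-channel mean and standard deviation of a feature map (as in AdaIN), that crop gradients are blended into the global gradient by a simple convex weight, and that the perturbation is a signed gradient step. The function names (`channel_stats`, `ensemble_gradients`, `perturb_style`) and the parameters `weight` and `step` are hypothetical, and the gradients are passed in as plain lists rather than computed by backpropagation.

```python
import math


def channel_stats(channel, eps=1e-5):
    """Return (mean, std) of one feature channel given as a flat list of floats."""
    mean = sum(channel) / len(channel)
    var = sum((v - mean) ** 2 for v in channel) / len(channel)
    return mean, math.sqrt(var + eps)


def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def ensemble_gradients(global_grad, crop_grads, weight=0.5):
    """Blend the global style gradient with the mean of the crop style gradients.

    The convex blend controlled by `weight` is an illustrative assumption.
    """
    n_channels = len(global_grad)
    mean_crop = [
        sum(g[c] for g in crop_grads) / len(crop_grads) for c in range(n_channels)
    ]
    return [(1 - weight) * g + weight * m for g, m in zip(global_grad, mean_crop)]


def perturb_style(features, grad_mu, grad_sigma, step=0.1):
    """AdaIN-style restyling: normalize each channel, then re-apply perturbed stats.

    `features` is a list of channels; `grad_mu` and `grad_sigma` hold one
    (already ensembled) gradient value per channel. The signed step is assumed.
    """
    restyled = []
    for channel, g_mu, g_sigma in zip(features, grad_mu, grad_sigma):
        mu, sigma = channel_stats(channel)
        mu_adv = mu + step * sign(g_mu)
        # Keep the perturbed standard deviation strictly positive.
        sigma_adv = max(sigma + step * sign(g_sigma), 1e-5)
        restyled.append([sigma_adv * (v - mu) / sigma + mu_adv for v in channel])
    return restyled
```

In a real training loop the gradients would be taken with respect to the style statistics of the global image and of each crop. Here they are inputs, so the sketch isolates only the ensemble and restyling steps.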

Paper Structure

This paper contains 40 sections, 38 equations, 11 figures, 8 tables, 1 algorithm.

Figures (11)

  • Figure 1: SVasP stabilizes the gradients and escapes from poor sharp minima. (a) shows the gradient cosine similarity between epochs as an indicator of gradient stability: the larger the cosine similarity, the more stable the gradient update direction. (b) shows that the proposed approach lets the model converge to a flat minimum and remain robust to domain shifts.
  • Figure 2: Overview of the proposed SVasP method. "RB" is an abbreviation for ResNet Block. The benign image is randomly cropped to generate several crop images. Then four main modules are applied: a) generate the gradients of both crop and global styles (illustrated with $B_1$); b) integrate the localized crop style gradients into the global style gradients; c) perform adversarial style perturbation based on the AdaIN method; d) DCO: maximize domain visual discrepancy and global-crop consistency.
  • Figure 3: Performance for different numbers of crops $k$.
  • Figure 4: Performance for (a) different $\xi$, $\lambda$ and (b) whether the same $\kappa_1$, $\kappa_2$ are used. The average accuracy (%) is reported.
  • Figure 5: Performance for different scale parameters $s$.
  • ...and 6 more figures
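
The stability measure named in the Figure 1(a) caption, the cosine similarity between gradients of successive epochs, can be sketched as follows. This is an illustrative sketch under assumptions rather than the authors' evaluation code: each epoch's gradient is assumed to be available as a flattened list of floats, and the helper names `cosine_similarity` and `epoch_to_epoch_similarity` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equally sized gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero gradient has no direction; report 0 by convention.
        return 0.0
    return dot / (norm_u * norm_v)


def epoch_to_epoch_similarity(grads_per_epoch):
    """Similarity between each pair of consecutive epoch gradients."""
    return [
        cosine_similarity(prev, curr)
        for prev, curr in zip(grads_per_epoch, grads_per_epoch[1:])
    ]
```

Values close to 1 mean consecutive updates point in nearly the same direction, which is the sense in which the caption calls the gradient stable. Values near 0 or below indicate that the update direction changes sharply between epochs.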