SVasP: Self-Versatility Adversarial Style Perturbation for Cross-Domain Few-Shot Learning
Wenqian Li, Pengfei Fang, Hui Xue
TL;DR
This work tackles single-source Cross-Domain Few-Shot Learning (CD-FSL) by addressing gradient instability in style-based CD-FSL methods. It introduces SVasP, a framework that combines crop-localized style gradients with global style perturbations under a self-versatility paradigm, and couples this with a Discrepancy & Consistency Optimization (DCO) objective to maximize seen-unseen visual discrepancy while preserving semantic consistency. Key contributions include the Style-Gradient Generation module, SV Gradient Ensemble, Adversarial Style Perturbation, and the DCO loss, which together flatten the loss landscape and improve transferability across eight target datasets. Empirical results on ResNet-10 and ViT-small backbones show SVasP achieving new state-of-the-art performance in both $5$-way $1$-shot and $5$-way $5$-shot settings, supporting its effectiveness in stabilizing gradients and generalizing to unseen domains. The approach is of practical use for robust cross-domain recognition in data-scarce scenarios and provides a blueprint for leveraging localized gradient information in adversarial style perturbations.
Abstract
Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from seen source domains to unseen target domains, which is crucial for evaluating the generalization and robustness of models. Recent studies use visual styles to bridge the gap between domains. However, these style-based CD-FSL methods suffer from gradient instability and tend to converge to poor local optima. This paper addresses these issues and proposes a novel crop-global style perturbation method, called Self-Versatility Adversarial Style Perturbation (SVasP), which jointly improves gradient stability and escapes poor sharp minima. Specifically, SVasP simulates more diverse adversarial styles of potential target domains by diversifying input patterns and aggregating localized crop style gradients, which serve as stabilizers for the global style perturbation within one image, a concept we refer to as self-versatility. A novel objective function is then proposed to maximize visual discrepancy while maintaining semantic consistency among global, crop, and adversarial features. With the stabilized global style perturbation applied during training, the model reaches a flatter minimum in the loss landscape, which boosts its transferability to the target domains. Extensive experiments on multiple benchmark datasets demonstrate that our method significantly outperforms existing state-of-the-art methods. Our code is available at https://github.com/liwenqianSEU/SVasP.
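To make the crop-global idea concrete, the following is a minimal NumPy sketch of one possible reading of the pipeline described above. It is not the released implementation. It assumes that "style" means channel-wise feature mean and standard deviation, that crop gradients are combined with the global gradient by a weighted average, and that the perturbation is a signed (FGSM-style) ascent step followed by AdaIN-style re-normalization. The function names, the `crop_weight` and `step` parameters, and the random placeholder gradients are all hypothetical; in practice the gradients would come from backpropagating the classification loss.

```python
import numpy as np


def style_stats(feat, eps=1e-6):
    """Channel-wise mean and std of a (C, H, W) feature map (assumed style definition)."""
    mu = feat.mean(axis=(1, 2))
    sigma = np.sqrt(feat.var(axis=(1, 2)) + eps)
    return mu, sigma


def ensemble_gradients(global_grad, crop_grads, crop_weight=0.5):
    """Blend the global style gradient with the mean of the crop style gradients.

    The weighted average is an assumption; the abstract only states that
    crop gradients are aggregated to stabilize the global perturbation.
    """
    crop_mean = np.mean(np.stack(crop_grads), axis=0)
    return (1.0 - crop_weight) * global_grad + crop_weight * crop_mean


def perturb_style(stat, grad, step=0.1):
    """Signed ascent step on a style statistic (FGSM-style, assumed)."""
    return stat + step * np.sign(grad)


def restyle(feat, mu_adv, sigma_adv, eps=1e-6):
    """Replace the style of `feat` with the perturbed statistics (AdaIN-style)."""
    mu, sigma = style_stats(feat, eps)
    normalized = (feat - mu[:, None, None]) / sigma[:, None, None]
    return normalized * sigma_adv[:, None, None] + mu_adv[:, None, None]


# Toy usage with placeholder gradients (hypothetical values, not real backprop output).
rng = np.random.default_rng(0)
feat = rng.normal(size=(4, 8, 8))            # one image's feature map, C=4
mu, sigma = style_stats(feat)

grad_mu = ensemble_gradients(rng.normal(size=4), [rng.normal(size=4) for _ in range(3)])
grad_sigma = ensemble_gradients(rng.normal(size=4), [rng.normal(size=4) for _ in range(3)])

adv_feat = restyle(feat, perturb_style(mu, grad_mu), perturb_style(sigma, grad_sigma))
```

Averaging over several crops is meant to illustrate the stabilizing effect: a single noisy global gradient can flip sign on some channels, whereas the blended gradient is less sensitive to any one view of the image.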
