DPStyler: Dynamic PromptStyler for Source-Free Domain Generalization
Yunlong Tang, Yuxuan Wan, Lei Qi, Xin Geng
TL;DR
DPStyler tackles Source-Free Domain Generalization (SFDG) by operating in the joint space of a large vision-language model (e.g., CLIP) and introducing two key components: a Style Generation Module that refreshes all style prompts at every epoch (via Random or StyleMix) and a Style Removal Module (Style-SE Net) that suppresses style information in the encoder outputs using a domain-uncertainty loss. To reduce the sensitivity to input text prompts that this style generation introduces, it employs a Model Ensemble across multiple initial templates during training and inference. The method optimizes a joint objective $L_{total}=L_U+L_C$, where $L_U$ is the domain-uncertainty loss and $L_C$ is an ArcFace-based classification loss, while keeping the CLIP encoders frozen. It reports state-of-the-art results on PACS, VLCS, OfficeHome, and DomainNet with reduced training resources compared to PromptStyler. The results further support the benefit of style refresh and show that removing style information improves domain-invariant features, yielding robust performance under both stylized and non-stylized shifts. Overall, the approach offers a practical, one-stage solution for SFDG that combines prompt-driven style augmentation with explicit style removal to improve generalization in real-world settings.
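To make the joint objective concrete, the following is a minimal sketch of an ArcFace-style classification term $L_C$ and its sum with $L_U$. It is not the authors' implementation: the function names, the `scale` and `margin` values, and the simplified handling of the angular margin are assumptions for illustration, and the domain-uncertainty loss $L_U$ is treated as an already computed number because its form is not specified here.

```python
import math


def arcface_logits(feature, class_weights, target, scale=5.0, margin=0.5):
    """Scaled cosine logits with an additive angular margin on the target class.

    `scale` and `margin` are illustrative values, not taken from the paper.
    Simplification: no special handling when theta + margin exceeds pi.
    """
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    f = normalize(feature)
    logits = []
    for k, w in enumerate(class_weights):
        cos_theta = sum(a * b for a, b in zip(f, normalize(w)))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # guard acos against rounding
        if k == target:
            # ArcFace: replace cos(theta) with cos(theta + m) for the true class.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)
    return logits


def cross_entropy(logits, target):
    """Numerically stable softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def total_loss(domain_uncertainty_loss, classification_loss):
    """L_total = L_U + L_C, as stated in the text."""
    return domain_uncertainty_loss + classification_loss


if __name__ == "__main__":
    # Hypothetical 2-D feature and two class prototypes.
    logits = arcface_logits([0.9, 0.1], [[1.0, 0.0], [0.0, 1.0]], target=0)
    l_c = cross_entropy(logits, target=0)
    print(total_loss(domain_uncertainty_loss=0.1, classification_loss=l_c))
```

The margin makes the target-class logit smaller than its plain cosine value, so the classifier has to separate classes by a larger angle to reach the same loss.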
Abstract
Source-Free Domain Generalization (SFDG) aims to develop a model that works on unseen target domains without relying on any source domain. Research in SFDG primarily builds upon the existing knowledge of large-scale vision-language models and uses the pre-trained model's joint vision-language space to simulate style transfer across domains, thus eliminating the dependency on source domain images. However, two questions merit further work: how to efficiently simulate rich and diverse styles using text prompts, and how to extract domain-invariant information useful for classification from encoder output features that contain both semantic and style information. In this paper, we introduce Dynamic PromptStyler (DPStyler), comprising Style Generation and Style Removal modules to address these issues. The Style Generation module refreshes all styles at every training epoch, while the Style Removal module eliminates variations in the encoder's output features caused by input styles. Moreover, since the Style Generation module, which generates style word vectors using random sampling or style mixing, makes the model sensitive to input text prompts, we introduce a model ensemble method to mitigate this sensitivity. Extensive experiments demonstrate that our framework outperforms state-of-the-art methods on benchmark datasets.
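The per-epoch style refresh and the ensemble described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: random sampling is assumed to draw each style word vector from a zero-mean Gaussian with a hypothetical `std`, style mixing is assumed to be a convex combination of two previous style vectors, and the ensemble is assumed to average class scores across models trained from different templates. All function names are hypothetical.

```python
import random


def random_style(dim, rng, std=0.02):
    """Sample a fresh style word vector (assumed Gaussian; std is illustrative)."""
    return [rng.gauss(0.0, std) for _ in range(dim)]


def mix_styles(style_a, style_b, lam):
    """Assumed mixing rule: convex combination lam * a + (1 - lam) * b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(style_a, style_b)]


def refresh_styles(prev_styles, dim, rng, strategy="random"):
    """Replace every style vector; intended to be called once per epoch."""
    n = len(prev_styles)
    if strategy == "random" or n < 2:
        return [random_style(dim, rng) for _ in range(n)]
    if strategy == "mix":
        refreshed = []
        for _ in range(n):
            a, b = rng.sample(prev_styles, 2)  # two distinct previous styles
            refreshed.append(mix_styles(a, b, rng.random()))
        return refreshed
    raise ValueError(f"unknown strategy: {strategy}")


def ensemble_average(per_model_scores):
    """Average class scores over models (one model per initial template)."""
    n_models = len(per_model_scores)
    return [sum(col) / n_models for col in zip(*per_model_scores)]


if __name__ == "__main__":
    rng = random.Random(0)
    styles = [random_style(8, rng) for _ in range(4)]
    for epoch in range(3):
        # All styles are regenerated before each epoch's training pass.
        styles = refresh_styles(styles, 8, rng, strategy="mix" if epoch % 2 else "random")
    print(ensemble_average([[0.7, 0.3], [0.5, 0.5]]))
```

Because every style vector is replaced at each epoch, the classifier sees a different set of simulated domains throughout training instead of a fixed style bank learned once up front.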
