BatStyler: Advancing Multi-category Style Generation for Source-free Domain Generalization

Xiusheng Xu, Lei Qi, Jingyang Zhou, Xin Geng

TL;DR

Source-Free Domain Generalization (SFDG) requires generalization to unseen domains without source images. BatStyler introduces two modules—Coarse Semantic Generation (CSG) and Uniform Style Generation (USG)—within a CLIP-based framework to enlarge style diversity in multi-category tasks while respecting semantic structure. CSG reduces the effective semantic constraint by extracting $C$ coarse-grained semantics per cluster, and USG provides $K$ uniformly distributed style templates initialized via neural collapse, enabling parallel training with a fixed classifier. Experiments show BatStyler matches or exceeds state-of-the-art on multi-category benchmarks and remains competitive on less-category datasets, with improved efficiency and data synthesis diversity. The approach hinges on the CLIP joint space, suggesting future work on robustness to vision-language misalignment.

Abstract

Source-Free Domain Generalization (SFDG) aims to develop a model that performs well on unseen domains without relying on any source domains. However, progress remains constrained by the unavailability of training data. Research on SFDG focuses on knowledge transfer from multi-modal models and on style synthesis in the joint space of multiple modalities, thus eliminating the dependency on source domain images. However, existing methods mainly target the multi-domain, less-category configuration, and their performance on the multi-domain, multi-category configuration is relatively poor. In addition, the efficiency of style synthesis deteriorates in multi-category scenarios. How to efficiently synthesize sufficiently diverse data and apply it to the multi-category configuration is a direction of greater practical value. In this paper, we propose BatStyler, a method that improves the capability of style synthesis in multi-category scenarios. BatStyler consists of two modules: Coarse Semantic Generation and Uniform Style Generation. The Coarse Semantic Generation module extracts coarse-grained semantics to prevent the compression of the space available for style diversity learning in the multi-category configuration, while the Uniform Style Generation module provides a template of styles that are uniformly distributed in space and enables parallel training. Extensive experiments demonstrate that our method achieves comparable performance on less-category datasets while surpassing state-of-the-art methods on multi-category datasets.
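
To make "uniformly distributed in space" concrete, the sketch below builds a simplex equiangular tight frame (ETF), the geometry usually associated with neural collapse. It is an illustration under stated assumptions, not the authors' implementation: the helper names `simplex_etf` and `cosine` are hypothetical, the construction assumes the feature dimension is at least the number of styles, and it uses the first coordinates as the orthonormal basis instead of a random rotation.

```python
import math


def simplex_etf(num_styles, dim):
    """Return num_styles unit vectors in R^dim that form a simplex ETF.

    Hypothetical helper for illustration; requires dim >= num_styles >= 2.
    """
    if num_styles < 2 or dim < num_styles:
        raise ValueError("need dim >= num_styles >= 2")
    scale = math.sqrt(num_styles / (num_styles - 1))
    vectors = []
    for i in range(num_styles):
        # Column i of sqrt(K/(K-1)) * (I_K - (1/K) * 1 1^T), zero-padded to dim.
        head = [scale * ((1.0 if i == j else 0.0) - 1.0 / num_styles)
                for j in range(num_styles)]
        vectors.append(head + [0.0] * (dim - num_styles))
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Every pair of distinct vectors returned by `simplex_etf(K, dim)` has cosine similarity $-1/(K-1)$, which is the smallest value that $K$ unit vectors can share pairwise. Such a fixed, maximally separated set is one plausible reading of the style template described above.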

Paper Structure (17 sections, 7 equations, 8 figures, 7 tables)

Figures (8)

  • Figure 1: Average cosine similarity ($\downarrow$) of synthetic styles. Comparison of PromptStyler and BatStyler on three models: ResNet-50, ViT-B/16 and ViT-L/14. We randomly sample 5, 200, 400, 600, 800 and 1000 category names from ImageNet-S. The number of style words is 80, and text features are obtained by passing a prompt (e.g., "a S style of a") through the text encoder.
  • Figure 2: Training time ($\downarrow$) of the first training stage. The experimental configuration is identical to that of Fig. 1.
  • Figure 3: Overview of BatStyler. A Coarse Semantic Generation module extracts coarse-grained semantics of the downstream categories. A classifier initialized by neural collapse is used to perform parallel training and generate more uniformly distributed styles, which yields better style diversity and higher training efficiency. The extracted Coarse-grained Semantic Set (CSS) is used to ensure semantic consistency.
  • Figure 4: Overview of the Coarse Semantic Generation (CSG) module. The categories of the task are taken as input, and coarse-grained semantics are extracted from them. KMeans++ [ArthurV07] is applied to the text features of all categories to cluster them (Left); an LLM is then instructed to perform semantic extraction on the texts within each cluster, extracting the common coarse-grained category to which each cluster belongs (Right).
  • Figure 5: t-SNE [van2008visualizing] visualization. Style-content features produced by PromptStyler (Left) and BatStyler (Right) for four randomly selected categories from ImageNet-S. Different colors represent different categories.
  • ...and 3 more figures
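
The diversity metric named in the Figure 1 caption can be sketched as follows. This is a minimal illustration that assumes the average is taken over all unordered pairs of distinct style features; the caption does not state the exact averaging convention, and the function name `average_cosine_similarity` is hypothetical.

```python
import math
from itertools import combinations


def average_cosine_similarity(features):
    """Mean cosine similarity over all unordered pairs of feature vectors.

    Assumption: self-similarity (each vector with itself) is excluded.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = list(combinations(features, 2))
    if not pairs:
        raise ValueError("need at least two feature vectors")
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)
```

Lower values mean the features point in more different directions: two orthogonal vectors give 0, while two parallel vectors give 1. This is why the caption marks the metric with $\downarrow$.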