DomainVerse: A Benchmark Towards Real-World Distribution Shifts For Tuning-Free Adaptive Domain Generalization
Feng Hou, Jin Yuan, Ying Yang, Yang Liu, Yang Zhang, Cheng Zhong, Zhongchao Shi, Jianping Fan, Yong Rui, Zhiqiang He
TL;DR
This work tackles real-world distribution shifts by introducing DomainVerse, a large-scale synthetic benchmark generated in a Unity-based environment, with hierarchical, decomposable domain shifts spanning 390 fine-grained domain combinations and 18 coarse domains. It reframes domain generalization for vision-language models as Adaptive Domain Generalization (ADG) and proposes two tuning-free methods, Domain CLIP and Domain++ CLIP, that inject domain priors into prompts and so avoid costly fine-tuning. Across tuning-free evaluation, test-time adaptation, traditional DG benchmarks, and synthetic-to-real transfer to DWild, the proposed methods consistently improve over zero-shot CLIP and post-processing baselines, with results comparable to the state of the art on DomainVerse and competitive gains on PACS and Office-Home. The results indicate that domain-aware prompts and LLM-generated domain descriptors are a practical way to bridge real-world distribution gaps in zero-shot and test-time settings, with DomainVerse serving as an evaluation platform for ADG research.
Abstract
Traditional cross-domain tasks, including domain adaptation and domain generalization, rely heavily on training a model on source-domain data. With recent advances in vision-language models (VLMs), which can be viewed as natural source models, the cross-domain task shifts to directly adapting the pre-trained source model to arbitrary target domains using prior domain knowledge; we name this task Adaptive Domain Generalization (ADG). However, current cross-domain datasets have many limitations, such as unrealistic domains, unclear domain definitions, and no support for fine-grained domain decomposition, which motivates us to establish a novel dataset, DomainVerse, for ADG. Benefiting from a newly introduced hierarchical definition of domain shifts, DomainVerse consists of about 0.5 million images from 390 fine-grained realistic domains. Building on DomainVerse and VLMs, we propose two methods, Domain CLIP and Domain++ CLIP, for tuning-free adaptive domain generalization. Extensive experiments demonstrate the significance of the dataset and the effectiveness of the proposed methods.
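To make the idea of decomposable domains and domain-aware prompts concrete, the following is a minimal sketch, not the authors' implementation. The factor names (`weather`, `lighting`), the prompt template, and the helper functions are all hypothetical and chosen only for illustration; the abstract does not specify how Domain CLIP or Domain++ CLIP actually build their prompts.

```python
from itertools import product


def enumerate_domains(factors):
    """Expand {factor: [values]} into every fine-grained combination.

    Assumption: a fine-grained domain is one value per factor. The real
    DomainVerse hierarchy may be defined differently.
    """
    names = list(factors)
    value_lists = [factors[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


def describe_domain(domain):
    """Turn one domain dict into a short textual prior (hypothetical wording)."""
    return ", ".join(f"{value} {name}" for name, value in domain.items())


def build_prompts(class_names, domain):
    """Build one domain-aware text prompt per class (hypothetical template)."""
    description = describe_domain(domain)
    return [
        f"a photo of a {cls} in a scene with {description}"
        for cls in class_names
    ]


# Illustrative factors only; these are not the benchmark's actual factors.
example_factors = {
    "weather": ["sunny", "foggy"],
    "lighting": ["day", "night"],
}
example_domains = enumerate_domains(example_factors)
example_prompts = build_prompts(["car", "bicycle"], example_domains[0])
```

In a tuning-free setting, prompts like these would be passed to a VLM text encoder and compared against image embeddings, so the domain prior changes the classifier without updating any weights.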
