CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li, Yongguang Li, Yali Fu, Jiahong Liu, Yixin Liu, Menglin Yang, Irwin King
TL;DR
This survey maps the landscape of CLIP-powered domain generalization (DG) and domain adaptation (DA), emphasizing CLIP's zero-shot capability to operate on unseen domains without extensive retraining. It organizes methods along two core axes, prompt optimization and CLIP-based backbones, and further subdivides them by source information availability (source-available, SA, vs. source-free, SF) and by source-target relationships (closed-set, partial-set, open-set, and open-partial-set: CS/PS/OS/OPS). Key contributions include a taxonomy of SA and SF methods across single-source and multi-source settings, detailed coverage of open-set and few-shot variants, and a synthesis of benchmarks, metrics, and practical challenges. The survey identifies gaps, such as open-set multi-source generalization and source-free partial-set adaptation, and proposes future directions to improve the interpretability, robustness, efficiency, and ethical deployment of CLIP-powered DG/DA techniques. Overall, it serves as a foundational resource for researchers and practitioners leveraging vision-language models for resilient cross-domain performance in real-world applications.
Abstract
As machine learning evolves, domain generalization (DG) and domain adaptation (DA) have become crucial for enhancing model robustness across diverse environments. Contrastive Language-Image Pretraining (CLIP) plays a significant role in these tasks, offering powerful zero-shot capabilities that allow models to perform effectively in unseen domains. However, no comprehensive survey currently exists that systematically explores the applications of CLIP in DG and DA, a gap this review aims to fill. In DG, we categorize methods into those that optimize prompt learning for task alignment and those that leverage CLIP as a backbone for effective feature extraction, both of which enhance model adaptability. In DA, we examine both source-available methods, which utilize labeled source data, and source-free approaches, which rely primarily on target domain data, emphasizing knowledge transfer mechanisms and strategies for improved performance across diverse contexts. We address key challenges, including overfitting, domain diversity, and computational efficiency, alongside future research opportunities to advance robustness and efficiency in practical applications. By synthesizing existing literature and pinpointing critical gaps, this survey provides insights for researchers and practitioners and proposes directions for effectively leveraging CLIP in domain generalization and adaptation. Ultimately, this work aims to foster innovation and collaboration in the quest for more resilient machine learning models that perform reliably across diverse real-world scenarios. An up-to-date list of the papers is maintained at: https://github.com/jindongli-Ai/Survey_on_CLIP-Powered_Domain_Generalization_and_Adaptation.
