Soft Prompt Generation for Domain Generalization

Shuanghao Bai, Yuedi Zhang, Wanqi Zhou, Zhirong Luan, Badong Chen

TL;DR

This paper tackles domain generalization (DG) for vision-language models by replacing fixed learned prompts with prompts produced by a generative model. It introduces Soft Prompt Generation (SPG), a two-stage training framework that first learns a soft prompt label for each source domain and then trains a conditional GAN to produce instance-specific, domain-aware prompts for unseen domains. The approach reports state-of-the-art results on five DG benchmarks across three DG tasks, outperforming traditional DG methods and CLIP-based prompt-tuning baselines, and is supported by ablations and visualizations. The work highlights the potential of generated prompts to improve cross-domain transfer and encourages further integration of generative models into prompt learning for VLMs.

Abstract

Large pre-trained vision-language models (VLMs) have shown impressive zero-shot ability on downstream tasks with manually designed prompts. To further adapt VLMs to downstream tasks, soft prompts have been proposed to replace manually designed prompts; these are fine-tuned on domain-specific data. Prior prompt learning methods primarily learn a fixed prompt or a residual prompt from training samples. However, the learned prompts lack diversity and ignore information about unseen domains. In this paper, we reframe the prompt learning framework from a generative perspective and propose a simple yet efficient method for the Domain Generalization (DG) task, namely Soft Prompt Generation (SPG). Specifically, SPG consists of a two-stage training phase and an inference phase. During the training phase, we introduce a soft prompt label for each domain, aiming to incorporate domain knowledge into the generative model. During the inference phase, the generator of the generative model is employed to obtain instance-specific soft prompts for the unseen target domain. Extensive experiments on five domain generalization benchmarks across three DG tasks demonstrate that SPG achieves state-of-the-art performance. The code is available at https://github.com/renytek13/Soft-Prompt-Generation-with-CGAN.
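The inference phase described above can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: the one-layer generator, the mean-pooling `toy_text_encoder`, and every function name and dimension here are hypothetical stand-ins chosen only to show the data flow, in which an image feature plus noise goes into a generator, the generator emits context vectors (the soft prompt), and the prompt is combined with each class embedding before scoring against the image.

```python
import math
import random


def linear(weights, vec):
    """Multiply a (rows x cols) weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def make_generator(feat_dim, noise_dim, n_ctx, ctx_dim, rng):
    """Hypothetical one-layer generator: [image feature ; noise] -> n_ctx vectors."""
    in_dim = feat_dim + noise_dim
    weights = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)]
               for _ in range(n_ctx * ctx_dim)]

    def generate(image_feature, noise):
        flat = [math.tanh(x) for x in linear(weights, image_feature + noise)]
        # Reshape the flat output into n_ctx context vectors of size ctx_dim.
        return [flat[i * ctx_dim:(i + 1) * ctx_dim] for i in range(n_ctx)]

    return generate


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)


def toy_text_encoder(tokens):
    """Stand-in for a frozen text encoder (assumption): mean-pool token vectors."""
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def classify(image_feature, class_embeddings, generate, rng, noise_dim):
    """Generate an instance-specific prompt, then score each class against the image."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    prompt = generate(image_feature, noise)
    scores = {}
    for name, class_vec in class_embeddings.items():
        text_feature = toy_text_encoder(prompt + [class_vec])
        scores[name] = cosine(image_feature, text_feature)
    return prompt, scores


rng = random.Random(0)
generate = make_generator(feat_dim=4, noise_dim=2, n_ctx=3, ctx_dim=4, rng=rng)
classes = {"dog": [1.0, 0.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0, 0.0]}
prompt, scores = classify([0.5, 0.1, -0.2, 0.3], classes, generate, rng, noise_dim=2)
predicted = max(scores, key=scores.get)
```

Because the prompt is a function of the image feature, two different images receive two different prompts, which is the "instance-specific" property the abstract refers to. In a real system the generator would be the trained CGAN generator and the encoders would be the frozen VLM encoders.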

Paper Structure (40 sections, 4 equations, 4 figures, 17 tables, 2 algorithms)

Figures (4)

  • Figure 1: The difference between previous work and our work. We reframe the prompt learning framework from a generative perspective. We exclusively rely on a generative model to directly produce soft prompts, ensuring their diversity. Essentially, we transfer the prompt's adaptability to the generative model by incorporating domain knowledge.
  • Figure 2: The design of the second stage of the training phase. A conditional generative adversarial network (CGAN) is the backbone of the generative model. The generator is guided by images to produce prompts. Meanwhile, the discriminator evaluates the authenticity of the prompt labels and the generated prompts, conditioned on the image data.
  • Figure 3: The t-SNE visualization of the prompt embeddings for CoCoOp, DPL, and our SPG method. Multi-source domain generalization models trained on three tasks of the PACS dataset are used to obtain the prompt embeddings. Different colors denote different classes. All the domains, including the target domain, are highly clustered in SPG.
  • Figure 4: The image-prompt-image retrieval experiment is designed to demonstrate the correlation between the prompts and the images. We present the top-2 results of image retrieval conducted using CoOp, CoCoOp, DPL, and our SPG method. Images encased in red rectangles indicate instances where the query image and the retrieval image belong to the same domain.
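The adversarial setup described in the Figure 2 caption can be written in the standard conditional-GAN form. The notation below is assumed for illustration and is not taken from the paper's own equations: x is an image embedding used as the condition, v_d is the stage-one prompt label of the source domain d that x belongs to, z is a noise vector drawn from p_z, G is the generator, and D is the discriminator.

```latex
\min_{G}\max_{D}\;
\mathbb{E}_{(x,\,v_d)}\big[\log D(v_d \mid x)\big]
+ \mathbb{E}_{x,\;z \sim p_z}\big[\log\big(1 - D(G(z \mid x) \mid x)\big)\big]
```

Under this reading, D learns to distinguish prompt labels from generated prompts given the image, while G learns to produce prompts that D accepts. This is one way the domain knowledge carried by the prompt labels could be transferred into the generator.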