Dual-Path Stable Soft Prompt Generation for Domain Generalization

Yuedi Zhang, Shuanghao Bai, Wanqi Zhou, Zhirong Luan, Badong Chen

TL;DR

This work introduces negative learning into the prompt generation process and proposes Dual-Path Stable Soft Prompt Generation (DPSPG), a transformer-based framework designed to improve both the stability and generalization of prompts.

Abstract

Domain generalization (DG) aims to learn a model using data from one or multiple related but distinct source domains that can generalize well to unseen out-of-distribution target domains. Inspired by the success of large pre-trained vision-language models (VLMs), prompt tuning has emerged as an effective generalization strategy. However, it often struggles to capture domain-specific features due to its reliance on manually designed or fixed prompt inputs. Recently, some prompt generation methods have addressed this limitation by dynamically generating instance-specific and domain-specific prompts for each input, enriching domain information and demonstrating potential for enhanced generalization. Through further investigation, we identify a notable issue in existing prompt generation methods: the same input often yields significantly different and suboptimal prompts across different random seeds, a phenomenon we term Prompt Variability. To address this, we introduce negative learning into the prompt generation process and propose Dual-Path Stable Soft Prompt Generation (DPSPG), a transformer-based framework designed to improve both the stability and generalization of prompts. Specifically, DPSPG incorporates a complementary prompt generator to produce negative prompts, thereby reducing the risk of introducing misleading information. Both theoretical and empirical analyses demonstrate that negative learning leads to more robust and effective prompts by increasing the effective margin and reducing the upper bound of the gradient norm. Extensive experiments on five DG benchmark datasets show that DPSPG consistently outperforms state-of-the-art methods while maintaining prompt stability.
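
One way to make Prompt Variability concrete is to generate a prompt for the same input under several random seeds and measure how far apart the results lie. The sketch below is illustrative only: `toy_generator` is a hypothetical stand-in for a trained prompt generator, and mean pairwise Euclidean distance is an assumed metric, not one taken from the paper.

```python
import math
import random
from itertools import combinations


def mean_pairwise_distance(prompts):
    """Average Euclidean distance between every pair of prompt vectors."""
    pairs = list(combinations(prompts, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def toy_generator(seed, noise, dim=8):
    """Hypothetical generator: a fixed target prompt plus seed-dependent noise."""
    rng = random.Random(seed)
    return [1.0 + rng.gauss(0.0, noise) for _ in range(dim)]


# Same input, five seeds: one generator is noisy across seeds, the other is not.
seeds = range(5)
unstable = [toy_generator(s, noise=0.5) for s in seeds]
stable = [toy_generator(s, noise=0.05) for s in seeds]

variability_unstable = mean_pairwise_distance(unstable)
variability_stable = mean_pairwise_distance(stable)
print(f"unstable: {variability_unstable:.3f}, stable: {variability_stable:.3f}")
```

A lower value means the prompts produced under different seeds cluster more tightly, which is the sense of stability the abstract refers to.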

Paper Structure

This paper contains 30 sections, 18 equations, 6 figures, 6 tables, 2 algorithms.

Figures (6)

  • Figure 1: Comparison of the inference stage between our proposed DPSPG and SPG [bai2024soft]. The dual-path strategy in DPSPG enhances the robustness and stability of prompt generation while maintaining domain-specific semantic coherence.
  • Figure 2: Comparison of prompt generation quality between our proposed DPSPG and existing methods, DPL [zhang2023domain] and SPG [bai2024soft]. Colored clusters represent the distribution of prompts generated for the photo domain in the PACS test set under different random seeds, while the pentagram denotes the optimal prompt. DPSPG generates prompts that are more consistently clustered around the optimal point, indicating higher generation quality and stronger domain focus.
  • Figure 3: The training process of DPSPG consists of two stages. In the first stage, positive and negative domain prompt labels are learned. In the second stage, positive and negative prompts for images are generated using separate transformer-based prompt generators and are aligned with the corresponding positive and negative prompt labels.
  • Figure 4: Two examples during inference. Compared with SPG, DPSPG enhances its predictive capabilities by incorporating negative learning.
  • Figure 5: Standard deviation of leave-one-domain-out accuracies across five datasets for various CLIP-based prompt learning methods using (a) ResNet-50 and (b) ViT-B/16 backbones. DPSPG consistently exhibits the lowest standard deviation across domains, with the narrowest interquartile ranges and the shortest whiskers, indicating the greater generalization stability and robustness of its dual-path prompt generation strategy.
  • ...and 1 more figures
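
The dual-path inference described in the captions for Figures 1 and 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the source does not say how the positive and negative paths are fused, so the rule used here (positive cosine similarity minus negative cosine similarity, followed by a softmax) is assumed, and the function names, the `temperature` parameter, and the feature vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dual_path_predict(image_feat, pos_text_feats, neg_text_feats, temperature=0.01):
    """Assumed fusion rule: reward agreement with the positive prompt of each
    class and penalize agreement with that class's negative prompt."""
    scores = [
        (cosine(image_feat, pos) - cosine(image_feat, neg)) / temperature
        for pos, neg in zip(pos_text_feats, neg_text_feats)
    ]
    return softmax(scores)


# Toy features for two classes (made up for illustration).
image_feat = [1.0, 0.2]
pos_text_feats = [[0.9, 0.3], [0.8, 0.5]]  # "is class k" text features
neg_text_feats = [[0.1, 1.0], [0.9, 0.2]]  # "is not class k" text features

probs = dual_path_predict(image_feat, pos_text_feats, neg_text_feats)
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```

In this toy case both positive prompts are close to the image, but the negative prompt for the second class also matches it strongly, so the fused score favors the first class. That is the intuition behind using a complementary path to suppress misleading matches.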