
Enhancing Domain Adaptation through Prompt Gradient Alignment

Hoang Phan, Lam Tran, Quyen Tran, Trung Le

TL;DR

This work reframes unsupervised domain adaptation as a multi-objective optimization over domain-specific losses in a vision-language prompting setup. It introduces Prompt Gradient Alignment (PGA) and its multi-source variant (MPGA), which align per-objective gradients in the prompt space while penalizing gradient norms to improve generalization. The method leverages CLIP-based prompts to tune only a small set of parameters, achieving strong, parameter-efficient adaptation and outperforming existing prompt-based and non-prompt methods on standard benchmarks. A theoretical generalization bound is provided to motivate the gradient-alignment and norm-penalization components, and extensive experiments demonstrate robust performance across single- and multi-source UDA scenarios with reduced compute compared to competing approaches.
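The following is a minimal sketch of the multi-objective framing in Python with NumPy. It treats the prompt as a small parameter vector `p` and gives each domain its own loss, so each domain is a separate objective. The quadratic losses, the matrices `A_src`/`A_tgt`, the centers `c_src`/`c_tgt`, and the helper names are hypothetical stand-ins, not CLIP prompts or the authors' code. The sketch computes the per-objective gradients and their cosine similarity, which is one simple measure of how well the two gradients agree.

```python
import numpy as np

# Hypothetical toy setup: the "prompt" is a 3-dim vector p, and each domain
# loss is a quadratic bowl 0.5 * ||A_d (p - c_d)||^2 with its own minimizer c_d.
A_src, c_src = np.diag([1.0, 2.0, 0.5]), np.array([1.0, 0.0, -1.0])
A_tgt, c_tgt = np.diag([2.0, 1.0, 1.0]), np.array([0.5, 1.0, -0.5])

def loss(p, A, c):
    r = A @ (p - c)
    return 0.5 * float(r @ r)

def grad(p, A, c):
    # d/dp 0.5 * ||A (p - c)||^2 = A^T A (p - c)
    return A.T @ A @ (p - c)

def cosine(u, v, eps=1e-12):
    # Cosine similarity between two gradient vectors (eps guards against 0/0).
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + eps))

p = np.zeros(3)
g_src = grad(p, A_src, c_src)
g_tgt = grad(p, A_tgt, c_tgt)
print("per-domain losses:", loss(p, A_src, c_src), loss(p, A_tgt, c_tgt))
print("gradient cosine similarity:", cosine(g_src, g_tgt))
```

A positive cosine means that a step lowering one domain loss also tends to lower the other. A negative value signals that the objectives conflict at that point.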

Abstract

Prior Unsupervised Domain Adaptation (UDA) methods often aim to train a domain-invariant feature extractor, which may hinder the model from learning sufficiently discriminative features. To tackle this, a line of work based on prompt learning leverages large-scale pre-trained vision-language models to learn both domain-invariant and domain-specific features through a set of domain-agnostic and domain-specific learnable prompts. These studies typically enforce invariance constraints in the representation, output, or prompt space to learn such prompts. In contrast, we cast UDA as a multi-objective optimization problem in which each objective is represented by a domain loss. Under this new framework, we propose to align per-objective gradients to foster consensus between them. Additionally, to prevent potential overfitting when fine-tuning this deep learning architecture, we penalize the norm of these gradients. To achieve these goals, we devise a practical gradient update procedure that works under both single-source and multi-source UDA. Empirically, our method consistently outperforms other vision-language model adaptation methods. The implementation is available at https://github.com/VietHoang1512/PGA.
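The abstract does not spell out the update rule. The sketch below therefore illustrates the two ingredients under stated assumptions and is not the PGA procedure itself. It assumes a hypothetical regularized objective J(p) = L_S + L_T - lam_align * <g_S, g_T> + lam_norm * (||g_S|| + ||g_T||). Rewarding the inner product encourages aligned gradients, and the norm term penalizes large gradients. Differentiating J requires Hessian-vector products. Here they are approximated by finite differences of gradients, a common first-order trick that is swapped in as an assumption. The quadratic losses and the weights `lam_align` and `lam_norm` are hypothetical.

```python
import numpy as np

# Hypothetical quadratic domain losses L_d(p) = 0.5 * ||A_d (p - c_d)||^2,
# standing in for the source/target prompt losses.
A_src, c_src = np.diag([1.0, 2.0, 0.5]), np.array([1.0, 0.0, -1.0])
A_tgt, c_tgt = np.diag([2.0, 1.0, 1.0]), np.array([0.5, 1.0, -0.5])

def grad(p, A, c):
    return A.T @ A @ (p - c)

def hvp_fd(p, v, A, c, r=1e-3):
    # Finite-difference Hessian-vector product: H v ~ (grad(p + r v) - grad(p)) / r.
    # Avoids explicit second derivatives; exact up to rounding for quadratics.
    return (grad(p + r * v, A, c) - grad(p, A, c)) / r

def regularized_grad(p, lam_align=0.1, lam_norm=0.1, eps=1e-12):
    """Gradient of the hypothetical objective
    J(p) = L_S + L_T - lam_align * <g_S, g_T> + lam_norm * (||g_S|| + ||g_T||)."""
    g_s, g_t = grad(p, A_src, c_src), grad(p, A_tgt, c_tgt)
    # d<g_S, g_T>/dp = H_S g_T + H_T g_S (Hessians are symmetric)
    align = hvp_fd(p, g_t, A_src, c_src) + hvp_fd(p, g_s, A_tgt, c_tgt)
    # d||g_d||/dp = H_d g_d / ||g_d||
    norm = (hvp_fd(p, g_s / (np.linalg.norm(g_s) + eps), A_src, c_src)
            + hvp_fd(p, g_t / (np.linalg.norm(g_t) + eps), A_tgt, c_tgt))
    return g_s + g_t - lam_align * align + lam_norm * norm

p = np.zeros(3)
for step in range(200):
    p = p - 0.05 * regularized_grad(p)  # plain gradient descent on J
print("final prompt vector:", p)
```

For quadratic losses, the finite difference is exact up to rounding. For real prompt losses, it is only an approximation, and the step `r` trades truncation error against floating-point noise.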

Paper Structure

This paper contains 32 sections, 3 theorems, 29 equations, 6 figures, 11 tables, and 1 algorithm.

Key Result

Theorem 4.1

Under the assumption of $R$-subgaussianity, the generalization error can be upper-bounded by an expression in which $\mathcal{T}$ is the total number of training iterations, $\tilde{\eta}_t$ is the learning rate at iteration $t$ scaled by a scalar, $\boldsymbol{g}^{src}_t = \nabla_{\boldsymbol{P}}\mathcal{L}_{S}(\boldsymbol{P}_{t-1})$, $\boldsymbol{g}^{tgt}_t = \nabla_{\boldsymbol{P}}\mathcal{L}_{T}(\boldsymbol{P}_{t-1})$, …
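The symbols in Theorem 4.1 follow an indexing convention worth making explicit: at iteration t, both g^src_t and g^tgt_t are evaluated at the previous iterate P_{t-1}. The sketch below records these per-iteration gradients and their inner product during training. It is written in Python with NumPy, uses hypothetical quadratic losses as stand-ins for L_S and L_T, and runs plain gradient descent with a constant step. It does not compute the bound, and it does not model the scalar used to form the scaled learning rate.

```python
import numpy as np

# Hypothetical quadratic stand-ins for L_S and L_T over a toy "prompt" P.
A_S, c_S = np.diag([1.0, 2.0, 0.5]), np.array([1.0, 0.0, -1.0])
A_T, c_T = np.diag([2.0, 1.0, 1.0]), np.array([0.5, 1.0, -0.5])

def grad(P, A, c):
    return A.T @ A @ (P - c)

def run(num_iters=50, lr=0.05):
    """Gradient descent on L_S + L_T, recording for t = 1..num_iters
    g_src_t = grad L_S(P_{t-1}) and g_tgt_t = grad L_T(P_{t-1}),
    i.e. both gradients are taken at the iterate *before* update t."""
    P = np.zeros(3)  # P_0
    history = []
    for t in range(1, num_iters + 1):
        g_src = grad(P, A_S, c_S)
        g_tgt = grad(P, A_T, c_T)
        history.append({"t": t,
                        "lr": lr,  # raw step size; the theorem's scaling scalar is not modeled
                        "g_src": g_src, "g_tgt": g_tgt,
                        "inner": float(g_src @ g_tgt)})
        P = P - lr * (g_src + g_tgt)  # produces P_t
    return P, history

P_final, hist = run()
print([round(h["inner"], 3) for h in hist[:5]])
```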

Figures (6)

  • Figure 1: Baseline performance on Office-Home
  • Figure 2: Performance of ERM and PGA on the in-domain data (validation set) and out-of-distribution data (test set). Average results and shaded standard errors are obtained from $10$ random seeds.
  • Figure 3: Evolution of the gradient similarity during training.
  • Figure 4: ZDT-1 task-specific gradient directions at different iterations. Red curve represents the Pareto front while the blue and green arrows indicate the updating directions for minimizing $f_1$ and $f_2$, respectively.
  • Figure 5: Computational complexity: accuracy curve (left), number of trainable parameters (middle), and GPU memory (right).
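Figure 4 uses ZDT-1, a standard bi-objective benchmark. Under its usual definition for x in [0, 1]^n, f1 = x1, g = 1 + 9/(n-1) * (x2 + ... + xn), and f2 = g * (1 - sqrt(f1/g)). The Pareto front is reached at x2 = ... = xn = 0, where f2 = 1 - sqrt(f1). The sketch below computes the two task gradients analytically. It is a minimal Python/NumPy illustration, not the paper's code, and the evaluation point `x` is arbitrary. Since grad f1 = e1 and the first component of grad f2 is -0.5 * sqrt(g/x1) < 0, the two gradients always have a negative inner product. In other words, the objectives conflict.

```python
import numpy as np

def zdt1(x):
    """ZDT-1 objectives for x in [0, 1]^n (n >= 2); requires x[0] > 0."""
    n = x.size
    f1 = x[0]
    g = 1.0 + 9.0 / (n - 1) * np.sum(x[1:])
    f2 = g * (1.0 - np.sqrt(f1 / g))
    return f1, f2

def zdt1_grads(x):
    """Analytic gradients of f1 and f2 with respect to x."""
    n = x.size
    g = 1.0 + 9.0 / (n - 1) * np.sum(x[1:])
    grad_f1 = np.zeros(n)
    grad_f1[0] = 1.0
    grad_f2 = np.empty(n)
    # Rewrite f2 = g - sqrt(x1 * g) and differentiate term by term.
    grad_f2[0] = -0.5 * np.sqrt(g / x[0])
    grad_f2[1:] = (1.0 - 0.5 * np.sqrt(x[0] / g)) * 9.0 / (n - 1)
    return grad_f1, grad_f2

x = np.array([0.25, 0.5, 0.5])
d1, d2 = zdt1_grads(x)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print("cosine between task gradients:", cos)  # negative: the objectives conflict
```

The updating directions drawn in Figure 4 are the negatives of these gradients. Negating both vectors leaves their cosine unchanged.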

Theorems & Definitions (9)

  • Theorem 4.1
  • Definition A.2
  • Definition A.3
  • Definition A.4
  • Theorem A.5
  • Remark A.6
  • Remark A.7
  • Proof
  • Lemma A.8