
Any-Shift Prompting for Generalization over Distributions

Zehao Xiao, Jiayi Shen, Mohammad Mahdi Derakhshani, Shengcai Liao, Cees G. M. Snoek

TL;DR

The paper tackles prompt-learning generalization under distribution shifts in image-language models such as CLIP. It introduces any-shift prompting, a hierarchical probabilistic framework that links training and test distributions through training and test prompts and a transformer-based inference network. A pseudo-shift training mechanism trains this network to generate test-specific prompts in a single forward pass, without test-time fine-tuning. The approach uses variational inference, optimizing an evidence lower bound (ELBO), to encourage informative test prompts, and it makes predictions by sampling prompts from the learned distributions. Empirical results across 23 datasets show robust generalization under covariate, label, conditional, concept, and joint shifts, matching or outperforming existing prompt-learning baselines while avoiding test-time optimization.
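The TL;DR notes that predictions are made by sampling prompts from learned distributions. A minimal sketch of that idea follows, assuming a diagonal Gaussian over a prompt vector, a reparameterized sample (mean plus standard deviation times standard normal noise), and a CLIP-style logit scale of 100. The `toy_text_feature` function is a hypothetical stand-in for CLIP's frozen text encoder, and every name here is illustrative rather than the authors' code.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_prompt(mu, log_var, rng):
    """Reparameterized draw: prompt = mu + sigma * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def toy_text_feature(class_emb, prompt):
    """HYPOTHETICAL stand-in for CLIP's frozen text encoder: shift the class embedding by the prompt."""
    return [c + p for c, p in zip(class_emb, prompt)]


def predict(image_feat, class_embs, mu, log_var, num_samples=20, logit_scale=100.0, seed=0):
    """Average class probabilities over prompts sampled from N(mu, diag(exp(log_var)))."""
    rng = random.Random(seed)
    avg = [0.0] * len(class_embs)
    for _ in range(num_samples):
        prompt = sample_prompt(mu, log_var, rng)
        logits = [logit_scale * cosine(image_feat, toy_text_feature(c, prompt)) for c in class_embs]
        avg = [a + p / num_samples for a, p in zip(avg, softmax(logits))]
    return avg


if __name__ == "__main__":
    image = [1.0, 0.0, 0.0]
    classes = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(predict(image, classes, mu=[0.0] * 3, log_var=[-4.0] * 3))
```

Averaging probabilities over several sampled prompts, rather than using a single point estimate, lets the prompt's uncertainty shape the prediction; a larger `num_samples` lowers the Monte Carlo variance at proportionally higher cost.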

Abstract

Image-language models with prompt learning have shown remarkable advances in numerous downstream vision tasks. Nevertheless, conventional prompt learning methods overfit their training distribution and lose the ability to generalize to test distributions. To improve generalization across various distribution shifts, we propose any-shift prompting: a general probabilistic inference framework that considers the relationship between training and test distributions during prompt learning. We explicitly connect training and test distributions in the latent space by constructing training and test prompts in a hierarchical architecture. Within this framework, the test prompt exploits the distribution relationships to guide the generalization of the CLIP image-language model from training to any test distribution. To effectively encode the distributions and their relationships, we further introduce a transformer inference network with a pseudo-shift training mechanism. The network generates the tailored test prompt from both training and test information in a single feedforward pass, avoiding extra training costs at test time. Extensive experiments on twenty-three datasets demonstrate the effectiveness of any-shift prompting for generalization over various distribution shifts.
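The abstract states that the inference network produces the test prompt from training and test information in a single feedforward pass. The sketch below illustrates only that data flow, under loudly stated assumptions: one scaled dot-product attention step in which the training prompt is the query and the token sequence holds the training prompt, test image features, and class-name features, followed by two linear heads that output a mean and a log-variance for the test prompt. `ToyTestPromptNet` and its random, untrained weights are hypothetical; per Figure 3, the actual network is a trained transformer with MLPs.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def attend(query, tokens):
    """Scaled dot-product attention of a single query over a list of token vectors."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, tok)) / math.sqrt(d) for tok in tokens]
    weights = softmax(scores)
    return [sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(d)]


class ToyTestPromptNet:
    """HYPOTHETICAL amortized inference network: one attention step plus two linear heads."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        # Untrained placeholder weights; a real network would learn these.
        self.w_mu = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]
        self.w_log_var = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]

    def __call__(self, train_prompt, image_feats, class_feats):
        # Token sequence: training prompt, test image feature(s), test class-name features.
        tokens = [train_prompt] + list(image_feats) + list(class_feats)
        # Using the training prompt as the query mixes training and test information.
        h = attend(train_prompt, tokens)
        # Linear heads give the mean and log-variance of the test-prompt distribution.
        return matvec(self.w_mu, h), matvec(self.w_log_var, h)


if __name__ == "__main__":
    net = ToyTestPromptNet(dim=4)
    train_prompt = [0.1, 0.2, 0.0, -0.1]
    classes = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
    # Prior-style input: a single test image.
    prior_mu, prior_log_var = net(train_prompt, [[0.5, 0.5, 0.0, 0.0]], classes)
    # Posterior-style input: a batch of test images.
    post_mu, post_log_var = net(train_prompt, [[0.5, 0.5, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0]], classes)
    print(prior_mu, post_mu)
```

Either input style yields the test-prompt distribution from one forward call, with no gradient steps at test time, which is the property the abstract emphasizes.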

Paper Structure (19 sections, 19 equations, 6 figures, 16 tables)

Figures (6)

  • Figure 1: Any-shift prompting. (a) Various distribution shifts in real-world applications. (b) We propose any-shift prompting that aggregates training and test information for jointly handling individual distribution shifts and their combinations.
  • Figure 2: Graphical model for any-shift prompting. We introduce probabilistic training and test prompts in a hierarchical inference framework to explore distribution relationships.
  • Figure 3: Transformer inference network of the pseudo-test prompt. The prior (a) of the pseudo-test prompt is inferred by aggregating the pseudo-training prompt, a single image, and all class names of the pseudo-test distribution. The posterior (b) is inferred from the shared pseudo-training prompt, a batch of pseudo-test images, and the corresponding class names. The posterior therefore incorporates more pseudo-test information and distribution relationships, and it guides the prior to learn the same knowledge through a KL-divergence term. The image and text encoders of CLIP are frozen; only the shared transformer, the pseudo-training prompt distribution, and the MLP networks are trainable, which reduces training costs.
  • Figure 4: Effectiveness of training and test prompts. The test prompt in the proposed any-shift prompting achieves good generalization on both seen and unseen classes, indicating its ability to handle different shifts jointly.
  • Figure 5: Visualization of image and text features before and after generalization. Different colors denote different classes. Image and text features of the same category move closer together after generalization by our method, leading to more accurate predictions.
  • ...and 1 more figure
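Figure 3 describes a posterior, inferred from a batch of pseudo-test images, that guides the prior through KL divergence. Assuming both are diagonal Gaussians parameterized by a mean and a log-variance (an assumption; the captions do not name the distribution family), the KL term has the closed form KL(q || p) = 0.5 * sum[log sigma_p^2 - log sigma_q^2 + (sigma_q^2 + (mu_q - mu_p)^2) / sigma_p^2 - 1]. The sketch adds it to a cross-entropy term to form a generic negative-ELBO loss; `negative_elbo` and the weight `beta` are hypothetical names, not the paper's exact objective.

```python
import math


def gaussian_kl(mu_q, log_var_q, mu_p, log_var_p):
    """KL(q || p) between diagonal Gaussians given means and log-variances."""
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, log_var_q, mu_p, log_var_p):
        # Per-dimension closed form: log(var_p / var_q) + (var_q + mean gap^2) / var_p - 1
        total += lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
    return 0.5 * total


def negative_elbo(probs, label, mu_q, log_var_q, mu_p, log_var_p, beta=1.0):
    """HYPOTHETICAL loss: cross-entropy on predictions from a posterior-sampled prompt, plus beta * KL(q || p)."""
    nll = -math.log(probs[label])
    return nll + beta * gaussian_kl(mu_q, log_var_q, mu_p, log_var_p)


if __name__ == "__main__":
    print(gaussian_kl([1.0], [0.0], [0.0], [0.0]))  # 0.5
    print(negative_elbo([0.5, 0.5], 0, [0.0], [0.0], [0.0], [0.0]))  # log(2), about 0.693
```

In this reading, minimizing the KL term pulls the single-image prior toward the batch-informed posterior, consistent with the caption's description of the posterior guiding the prior to learn the same knowledge.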