
Zero-Shot Continuous Prompt Transfer: Generalizing Task Semantics Across Language Models

Zijun Wu, Yongkang Wu, Lili Mou

TL;DR

This work addresses the problem that continuous prompts tuned on one language model often fail to transfer to others. It proposes an encode-then-search framework. Source prompts are first mapped into a shared relative space built on a common anchor vocabulary, and the target embedding space is then searched for prompts that align with that structure, which enables zero-shot transfer. A multi-source extension further improves generalization by aggregating guidance from several source embeddings. On the LAMA-TREx factual probing suite, the proposed method outperforms discretization and neural-projector baselines and can surpass manual prompts. Ablations on normalization and anchor density clarify when the approach is most effective. Overall, prompts tuned on small source models can be used to produce effective soft prompts for larger and more diverse language models, and the approach may apply beyond factual probing to broader NLP tasks.
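The encode-then-search idea can be illustrated with a minimal NumPy sketch. It makes several assumptions that are not taken from the paper's code. First, it assumes that a prompt's relative representation is the cosine similarity of each prompt vector to each anchor-token embedding in the same model. Second, it assumes the search is plain gradient descent on a squared-error matching loss. Third, it assumes the multi-source variant simply averages that loss over source models. The function names `relative_encode` and `search_target_prompt`, the learning rate, and all dimensions are hypothetical, and random matrices stand in for tuned prompts and real embedding tables.

```python
import numpy as np

def relative_encode(prompt, anchors):
    """Cosine similarity of each prompt vector (L x d) to each anchor (k x d) -> (L x k).

    The result no longer depends on the model's hidden size d, so encodings
    from different models can be compared anchor by anchor.
    """
    p = prompt / np.linalg.norm(prompt, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    return p @ a.T

def search_target_prompt(source_rels, tgt_anchors, prompt_len, steps=200, lr=0.05, seed=0):
    """Gradient-descent search for target prompt vectors whose relative
    encoding matches the source encoding(s).

    source_rels: list of (L x k) relative encodings, one per source model.
    Matching loss (assumed): squared error to each source, averaged over sources.
    """
    rng = np.random.default_rng(seed)
    a_hat = tgt_anchors / np.linalg.norm(tgt_anchors, axis=1, keepdims=True)
    # Start from random unit-norm vectors in the target embedding space.
    p = rng.normal(size=(prompt_len, tgt_anchors.shape[1]))
    p /= np.linalg.norm(p, axis=1, keepdims=True)
    history = []
    for _ in range(steps):
        norms = np.linalg.norm(p, axis=1, keepdims=True)
        p_hat = p / norms
        s = p_hat @ a_hat.T                    # target relative encoding (L x k)
        diffs = [s - r for r in source_rels]
        history.append(float(np.mean([np.sum(d ** 2) for d in diffs])))
        g = 2.0 * np.mean(diffs, axis=0)       # dLoss/dS, averaged over sources
        # Chain rule through cosine similarity: dS_ij/dp_i = (a_j - S_ij * p_hat_i) / ||p_i||
        grad = (g @ a_hat - np.sum(g * s, axis=1, keepdims=True) * p_hat) / norms
        p -= lr * grad
    return p, history

rng = np.random.default_rng(1)
k, L = 1000, 5
# Two hypothetical source models with different hidden sizes that share the same k anchor tokens.
sources = [(rng.normal(size=(L, d)), rng.normal(size=(k, d))) for d in (768, 1024)]
source_rels = [relative_encode(prompt, anchors) for prompt, anchors in sources]
tgt_anchors = rng.normal(size=(k, 512))  # hypothetical target model with hidden size 512
tgt_prompt, history = search_target_prompt(source_rels, tgt_anchors, prompt_len=L)
print(tgt_prompt.shape, history[0] > history[-1])
```

The anchors are tokens that exist in every model's vocabulary, so the same anchor indices can be looked up in each embedding matrix even when the hidden sizes differ. In this sketch the relative space is the only thing the models share. Averaging the loss over several sources pushes the target prompt toward a consensus relative structure, which loosely mirrors the multi-source extension described above.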

Abstract

Prompt tuning in natural language processing (NLP) has become an increasingly popular method for adapting large language models to specific tasks. However, transferring these prompts between different models, especially continuous prompts, remains a challenge. In this work, we propose a zero-shot continuous prompt transfer method in which source prompts are encoded into a relative space and the corresponding target prompts are then searched for in the target model's embedding space. Experimental results confirm the effectiveness of our method, showing that 'task semantics' in continuous prompts can be generalized across various language models. Moreover, we find that combining 'task semantics' from multiple source models can further enhance the generalizability of transfer.

Paper Structure (19 sections, 9 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: (a) The goal of transferring the induced continuous prompts on a source model to a target model. (b) Our proposed method for this transfer in a zero-shot manner, where the target prompts should be aligned with the induced source prompts in the relative space.
  • Figure 2: Validation accuracy vs. matching loss, with the curves showing the performance of various target models.
  • Figure 3: The effect of normalization.
  • Figure 4: The effect of the anchor number and prompt length. Each value (dot) was computed by averaging the accuracy over all source–target combinations.