
Effective Structured Prompting by Meta-Learning and Representative Verbalizer

Weisen Jiang, Yu Zhang, James T. Kwok

TL;DR

A prompt pool is used to extract more task knowledge and to construct instance-dependent prompts via attention. A novel soft verbalizer (RepVerb) constructs label embeddings directly from feature embeddings. The combined method, MetaPrompter, is parameter-efficient because only the prompt pool needs to be tuned.

Abstract

Prompt tuning for pre-trained masked language models (MLMs) has shown promising performance in natural language processing tasks with few labeled examples. It tunes a prompt for the downstream task, and a verbalizer is used to bridge the predicted token and the label prediction. Because training data are limited, prompt initialization is crucial for prompt tuning. Recently, MetaPrompting (Hou et al., 2022) used meta-learning to learn a shared initialization for all task-specific prompts. However, a single initialization is insufficient to obtain good prompts for all tasks and samples when the tasks are complex. Moreover, MetaPrompting requires tuning the whole MLM, which places a heavy burden on computation and memory because the MLM is usually large. To address these issues, we use a prompt pool to extract more task knowledge and construct instance-dependent prompts via attention. We further propose a novel soft verbalizer (RepVerb), which constructs label embeddings directly from feature embeddings. Combining the meta-learned prompt pool with RepVerb, we propose MetaPrompter for effective structured prompting. MetaPrompter is parameter-efficient, as only the pool needs to be tuned. Experimental results demonstrate that MetaPrompter performs better than recent state-of-the-art methods and that RepVerb outperforms existing soft verbalizers.
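The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so several details here are assumptions made purely for illustration: the pool is modelled as hypothetical key/value pairs, attention is scaled dot-product with a softmax, label embeddings are per-class means of feature embeddings, and prediction uses cosine similarity. All function names and the toy data are invented for this example and are not the authors' code.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def softmax(scores):
    # Subtract the max for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def instance_prompt(query, keys, values):
    """Attention-weighted mix of pool prompts for one instance.

    query:  instance feature vector of length d
    keys:   K key vectors of length d (assumed pool keys)
    values: K prompt vectors of length p (assumed pool prompts)
    """
    d = len(query)
    weights = softmax([dot(query, k) / math.sqrt(d) for k in keys])
    p = len(values[0])
    prompt = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(p)]
    return prompt, weights

def label_embeddings(features, labels):
    """Per-class mean of feature embeddings (assumed reading of 'directly')."""
    sums, counts = {}, {}
    for f, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(f))
        for j, x in enumerate(f):
            acc[j] += x
        counts[y] = counts.get(y, 0) + 1
    return {y: [x / counts[y] for x in acc] for y, acc in sums.items()}

def cosine(u, v):
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def predict(feature, label_emb):
    # Pick the class whose label embedding is most similar to the feature.
    return max(label_emb, key=lambda y: cosine(feature, label_emb[y]))

# Toy pool with K = 2 prompts; the query aligns with the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 1.0, 1.0], [-1.0, -1.0, -1.0]]
prompt, weights = instance_prompt([4.0, 0.0], keys, values)

# Toy support set: two classes, two feature embeddings each.
support = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.8]]
support_labels = ["sports", "sports", "finance", "finance"]
emb = label_embeddings(support, support_labels)
predicted = predict([0.8, 0.1], emb)
```

In this sketch the label embeddings are computed from features rather than learned, so the verbalizer adds no trainable parameters; the only quantities that would be tuned are the pool's keys and values, which matches the parameter-efficiency claim in spirit.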

Paper Structure (23 sections, 13 equations, 7 figures, 7 tables)

Figures (7)

  • Figure 1: 5-way 5-shot classification meta-testing accuracy of MetaPrompting with or without MLM tuning on six data sets.
  • Figure 2: t-SNE visualization of [MASK]'s embeddings (crosses) and label embeddings (circles) for a 5-way 5-shot task randomly sampled from Reuters.
  • Figure 3: Distribution of attention weights on 5-way 5-shot classification of Reuters ($15$ topics).
  • Figure 4: Cosine similarities between learned prompt tokens and topic embeddings on 5-way 5-shot classification of Reuters. On the x-axis, $(i,j)$ stands for the $j$th row of ${\boldsymbol \theta}_i$ (i.e., ${\boldsymbol \theta}_i^{(j)}$).
  • Figure 5: Effect of $K$ (in log-scale) on 5-way 5-shot classification ($L_p=8$).
  • ...and 2 more figures