Meta-Task Prompting Elicits Embeddings from Large Language Models

Yibin Lei, Di Wu, Tianyi Zhou, Tao Shen, Yu Cao, Chongyang Tao, Andrew Yates

TL;DR

This work presents MetaEOL, an unsupervised approach to generate high-quality sentence embeddings from large language models without fine-tuning by using meta-task prompting to elicit multiple representations. By constructing task-specific templates for four meta-tasks and averaging the resulting last-token embeddings, MetaEOL achieves competitive STS performance and strong transfer-task results, often outperforming non-training baselines and approaching or surpassing some training-based methods. The findings suggest a scaling behavior where larger models and careful layer selection further enhance embeddings, and they highlight the value of diverse representational perspectives in obtaining robust, general-purpose sentence embeddings. Overall, MetaEOL offers a resource-efficient, versatile embedding strategy that leverages prompt design and model scale to generalize across tasks without explicit training.

Abstract

We introduce a new unsupervised text embedding method, Meta-Task Prompting with Explicit One-Word Limitation (MetaEOL), for generating high-quality sentence embeddings from Large Language Models (LLMs) without the need for model fine-tuning. Leveraging meta-task prompting, MetaEOL guides LLMs to produce embeddings through a series of carefully designed prompts that address multiple representational aspects. Our comprehensive experiments demonstrate that embeddings averaged from various meta-tasks are versatile embeddings that yield competitive performance on Semantic Textual Similarity (STS) benchmarks and excel in downstream tasks, surpassing contrastive-trained models. Our findings suggest a new scaling law, offering a versatile and resource-efficient approach for embedding generation across diverse scenarios.

Paper Structure (36 sections, 4 figures, 8 tables)


Figures (4)

  • Figure 1: Under meaning-compression prompting, the highest decoding probabilities are largely allocated to stop words that carry little useful information, even with the "in one word" constraint of jiang2023scaling. Although the general topic of the sentence (movie) is captured, other aspects, such as sentiment, are missing.
  • Figure 2: The workflow of our method (MetaEOL). We use the prompt in Appendix \ref{sec:template_generate} to prompt ChatGPT-4 to generate templates. Each input sentence will be decorated with multiple task-specific templates, indicating its various intended usage scenarios. The resulting multiple prompts will be fed to LLMs. Then, multiple task-specific embeddings will be extracted. The final sentence embedding is obtained by averaging the task-specific embeddings.
  • Figure 3: Influence of number of prompts on LLAMA2-7B. STS Avg. refers to the average score of the seven STS tasks.
  • Figure 4: Influence of output layer index. STS Avg. refers to the average score of the seven STS tasks.
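
The workflow in the Figure 2 caption (decorate the sentence with several task-specific templates, extract one embedding per prompt, then average) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the four templates are hypothetical placeholders rather than the paper's generated templates, and `toy_last_token_state` is a made-up stand-in that returns a deterministic pseudo-random vector where a real setup would read the LLM's last-token hidden state at a chosen layer.

```python
import hashlib
import numpy as np

# Hypothetical placeholder templates, one per meta-task.
# These are NOT the templates used in the paper.
TEMPLATES = [
    'Meta-task 1. Sentence: "{sentence}". Describe it in one word:',
    'Meta-task 2. Sentence: "{sentence}". Describe it in one word:',
    'Meta-task 3. Sentence: "{sentence}". Describe it in one word:',
    'Meta-task 4. Sentence: "{sentence}". Describe it in one word:',
]


def toy_last_token_state(prompt: str, dim: int = 16) -> np.ndarray:
    """Stand-in for an LLM forward pass (assumption, for illustration only).

    Returns a deterministic pseudo-random vector seeded by the prompt text.
    A real pipeline would return the hidden state of the final prompt token.
    """
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:8], "big")
    return np.random.default_rng(seed).standard_normal(dim)


def meta_task_embedding(sentence, templates=TEMPLATES, encode=toy_last_token_state):
    """Average the per-template embeddings into one sentence embedding."""
    per_task = [encode(t.format(sentence=sentence)) for t in templates]
    return np.mean(np.stack(per_task), axis=0)


def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity, the usual score for comparing sentence embeddings."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    a = meta_task_embedding("The movie was surprisingly good.")
    b = meta_task_embedding("I did not enjoy the film.")
    print(a.shape, cosine(a, b))
```

Because the encoder is passed in as a function, swapping the toy stand-in for a real model (and a specific output layer, the variable studied in Figure 4) would not change the averaging step.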