Scaling Sentence Embeddings with Large Language Models

Ting Jiang, Shaohan Huang, Zhongzhi Luan, Deqing Wang, Fuzhen Zhuang

TL;DR

This work investigates deriving high-quality sentence embeddings from autoregressive large language models by using a prompt-based representation (PromptEOL) and leveraging in-context learning with demonstration sets. It demonstrates that in-context learning can match or approach contrastive-learning methods without gradient updates, while model scaling improves transfer-task performance but may hurt some STS tasks. By combining prompting with memory-efficient fine-tuning (QLoRA) on SNLI/MNLI, the approach achieves state-of-the-art STS results with relatively small fine-tuned models and scales well for transfer tasks. The study also provides practical guidance on demonstration design and notes that the largest gains come from larger models, with the code released for reproduction.

Abstract

Large language models (LLMs) have recently garnered significant interest. With in-context learning, LLMs achieve impressive results in various natural language tasks. However, the application of LLMs to sentence embeddings remains an area of ongoing research. In this work, we propose an in-context learning-based method aimed at improving sentence embedding performance. Our approach involves adapting the previous prompt-based representation method for autoregressive models, constructing a demonstration set that enables LLMs to perform in-context learning, and scaling up the LLMs to different model sizes. Extensive experiments show that in-context learning enables LLMs to generate high-quality sentence embeddings without any fine-tuning and helps them achieve performance comparable to current contrastive learning methods. When scaling model size, we find that scaling to more than tens of billions of parameters harms performance on semantic textual similarity (STS) tasks. However, the largest model outperforms its counterparts and achieves a new state-of-the-art result on transfer tasks. We also fine-tune LLMs with the current contrastive learning approach, and the 2.7B OPT model, incorporating our prompt-based method, surpasses the performance of the 4.8B ST5, achieving new state-of-the-art results on STS tasks. Our code is available at https://github.com/kongds/scaling_sentemb.
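
To make the prompt-based representation with an in-context demonstration more concrete, here is a minimal Python sketch of how such a prompt might be assembled. The template wording, the function name build_prompt, and the demonstration pair are assumptions for illustration only; the exact template and demonstration set are defined in the paper and its repository. The sketch only builds the string. In the method described above, the sentence embedding would be read from the model's hidden state at the final token of this prompt.

```python
from typing import Optional, Tuple

# Assumed template wording (illustrative); consult the paper or the
# released code for the exact string used by the authors.
TEMPLATE = 'This sentence : "{sentence}" means in one word:"'


def build_prompt(sentence: str, demo: Optional[Tuple[str, str]] = None) -> str:
    """Build a prompt whose last token position would supply the embedding.

    `demo` is a hypothetical (demonstration sentence, demonstration word)
    pair that is prepended as a single in-context example.
    """
    prefix = ""
    if demo is not None:
        demo_sentence, demo_word = demo
        # The demonstration fills the same template and closes the open
        # quote with its one-word answer.
        prefix = TEMPLATE.format(sentence=demo_sentence) + demo_word + '". '
    return prefix + TEMPLATE.format(sentence=sentence)


if __name__ == "__main__":
    print(build_prompt("A man is playing a guitar."))
    print(build_prompt("A man is playing a guitar.",
                       demo=("The weather is lovely today.", "Sunny")))
```

Because the demonstration only changes the input string, no gradient update is involved, which is consistent with the abstract's claim that embeddings improve without any fine-tuning.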

Paper Structure

This paper contains 20 sections, 2 equations, 4 figures, and 8 tables.

Figures (4)

  • Figure 1: Performance of OPT on the STS-B development set with three representation methods. Dashed lines represent the results of BERT.
  • Figure 2: An illustration of in-context learning-based sentence embeddings. The green sentences denote the demonstration sentences, and the blue words denote the demonstration words. The corresponding color blocks refer to their slots in the template.
  • Figure 3: Distribution of Spearman correlations on the STS-B development set with different in-context learning demonstrations. The red dashed line represents the Spearman correlation of the corresponding model without any demonstration. The blue area represents demonstrations that negatively impact performance, and the percentage is the proportion of these demonstrations among all demonstrations.
  • Figure 4: Influence of different sentence representation methods in three settings. "avg." refers to averaging the output tokens to obtain sentence embeddings. "prompt" refers to extracting sentence embeddings using the template from PromptBERT (Jiang et al., 2022). Dashed lines represent the results of the base-size BERT.
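
The figures above report Spearman correlations on the STS-B development set. As a rough illustration of that kind of evaluation, the following Python sketch scores sentence pairs by cosine similarity of their embeddings and then computes the Spearman rank correlation against gold similarity scores. The function names and the toy vectors are hypothetical, and this is a plain standard-library sketch under those assumptions, not the evaluation code used by the authors.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (std_x * std_y)


if __name__ == "__main__":
    # Toy embedding pairs and gold scores, invented for illustration.
    pairs = [([1.0, 0.0], [1.0, 0.1]),
             ([1.0, 0.0], [0.0, 1.0]),
             ([1.0, 1.0], [1.0, 0.0])]
    gold = [5.0, 0.5, 3.0]
    predicted = [cosine(u, v) for u, v in pairs]
    print(spearman(predicted, gold))
```

Spearman correlation depends only on the ordering of the predicted similarities, so it rewards embeddings that rank sentence pairs in the same order as the human judgments regardless of the absolute cosine values.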