Simple Techniques for Enhancing Sentence Embeddings in Generative Language Models

Bowen Zhang, Kehua Chang, Chunping Li

TL;DR

This work investigates how to obtain high-quality sentence embeddings directly from large pre-trained language models (PLMs). It scrutinizes the commonly used Explicit One-word Limitation (EOL) and finds that, although EOL helps generative models under direct inference, it is not necessary for discriminative models or for fine-tuning generative PLMs. It then introduces two prompt-engineering strategies, Pretended Chain of Thought (CoT) and Knowledge Enhancement, as simple, plug-and-play templates that improve the raw embeddings produced by generative PLMs without gradient updates. In experiments across several model families (e.g., OPT, LLaMA, LLaMA2, and Mistral) and scales, including 7B-scale models, the methods consistently improve semantic similarity performance on seven STS benchmarks and use less memory than full or partial fine-tuning. The analysis attributes the gains to better alignment and to attention that focuses on core semantic content, which makes the techniques practical for retrieval, clustering, and downstream inference tasks. The authors also release their code for reproducibility.

Abstract

Sentence embedding is a fundamental task in Natural Language Processing, with extensive application in search engines, expert systems, and question-and-answer platforms. With the continuous evolution of large language models such as LLaMA and Mistral, research on sentence embedding has recently achieved notable breakthroughs. However, these advancements mainly pertain to fine-tuning scenarios, leaving the exploration of computationally efficient direct inference methods for sentence representation at a nascent stage. This paper endeavors to bridge this research gap. Through comprehensive experimentation, we challenge the widely held belief in the necessity of an Explicit One-word Limitation for deriving sentence embeddings from Pre-trained Language Models (PLMs). We demonstrate that this approach, while beneficial for generative models in direct inference scenarios, is not imperative for discriminative models or for the fine-tuning of generative PLMs. This discovery sheds new light on the design of manual templates in future studies. Building upon this insight, we propose two innovative prompt engineering techniques capable of further enhancing the expressive power of PLMs' raw embeddings: Pretended Chain of Thought and Knowledge Enhancement. We confirm their effectiveness across various PLM types and provide a detailed exploration of the underlying factors contributing to their success.
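
To make the idea of a manual template concrete, the following is a minimal Python sketch of how the three prompt styles named in the abstract might be assembled around an input sentence. The template wordings, the `TEMPLATES` dictionary, and the helper names `build_prompt` and `cosine_similarity` are illustrative assumptions, not text quoted from the paper. In direct inference, the sentence embedding would typically be read from a generative PLM's hidden state at the final prompt token; that step depends on the model library and is omitted here.

```python
import math

# Illustrative template wordings (assumptions for this sketch, not quoted
# from the paper). Each one ends with an explicit one-word limitation.
TEMPLATES = {
    # Plain Explicit One-word Limitation (EOL) prompt.
    "eol": 'This sentence : "{sentence}" means in one word:"',
    # Pretended Chain of Thought: a prefix claiming that reasoning occurred.
    "pretended_cot": (
        "After thinking step by step , "
        'this sentence : "{sentence}" means in one word:"'
    ),
    # Knowledge Enhancement: a prefix stating what matters in a sentence.
    "knowledge": (
        "The essence of a sentence is often defined by its main subjects "
        "and actions. With this in mind , "
        'this sentence : "{sentence}" means in one word:"'
    ),
}


def build_prompt(sentence, style="eol"):
    """Wrap a sentence in one of the illustrative templates."""
    return TEMPLATES[style].format(sentence=sentence)


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    for style in TEMPLATES:
        print(build_prompt("A man is playing a guitar.", style))
```

Because only the prompt text changes between styles, a comparison of this kind needs no gradient updates: the same frozen model embeds each prompted sentence, and sentence pairs are scored with cosine similarity.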

Paper Structure (14 sections, 2 equations, 1 figure, 7 tables)