GenEOL: Harnessing the Generative Power of LLMs for Training-Free Sentence Embeddings

Raghuveer Thirukovalluru, Bhuwan Dhingra

TL;DR

GenEOL produces training-free sentence embeddings by using the generative power of LLMs: it creates $m$ meaning-preserving transformations of each sentence and averages their embeddings with that of the original. It pairs an instruction-tuned generator with a pretrained embedder that uses a fixed EOL prompt. The resulting representations surpass prior training-free methods on the STS and MTEB benchmarks, and their quality is more stable across LLM layers. Key insights include the importance of diverse transformations, the benefit of compositional summaries, and the resilience of GenEOL to prompt perturbations, with notable gains even at small $m$. The approach trades additional inference-time compute for embedding quality and can operate with black-box LLMs. Its limitations are the higher computational cost and the possibility of hallucinated content in the generated variants.

Abstract

Training-free embedding methods directly leverage pretrained large language models (LLMs) to embed text, bypassing the costly and complex procedure of contrastive learning. Previous training-free embedding methods have mainly focused on optimizing embedding prompts and have overlooked the benefits of utilizing the generative abilities of LLMs. We propose a novel method, GenEOL, which uses LLMs to generate diverse transformations of a sentence that preserve its meaning, and aggregates the resulting embeddings of these transformations to enhance the overall sentence embedding. GenEOL significantly outperforms the existing training-free embedding methods by an average of 2.85 points across several LLMs on the sentence semantic text similarity (STS) benchmark. GenEOL also achieves notable gains in clustering, reranking, and pair-classification tasks from the MTEB benchmark. Additionally, GenEOL stabilizes representation quality across LLM layers and remains robust to perturbations of embedding prompts.
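The aggregation step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_generate` and `toy_embed` are hypothetical stand-ins for the instruction-tuned generator and the pretrained embedder (a real embedder would wrap each sentence in the EOL prompt and read a hidden state from the LLM), and `geneol_embed` is a name introduced here for illustration only.

```python
import hashlib
from typing import Callable, List

import numpy as np


def toy_generate(sentence: str, m: int) -> List[str]:
    """Hypothetical stand-in for the generator: returns m 'transformations'.

    A real generator would prompt an instruction-tuned LLM for
    meaning-preserving rewrites; here we only prepend fixed phrases.
    """
    templates = [
        "In other words, {s}",
        "Put simply, {s}",
        "To summarize: {s}",
        "Restated: {s}",
    ]
    return [templates[i % len(templates)].format(s=sentence) for i in range(m)]


def toy_embed(text: str, dim: int = 16) -> np.ndarray:
    """Hypothetical stand-in for the embedder: a deterministic vector per text.

    The vector is pseudo-random and seeded by a hash of the text, so it
    carries no semantics; it only makes the sketch runnable.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    seed = int.from_bytes(digest[:4], "big")
    return np.random.default_rng(seed).standard_normal(dim)


def geneol_embed(
    sentence: str,
    m: int,
    generate: Callable[[str, int], List[str]],
    embed: Callable[[str], np.ndarray],
) -> np.ndarray:
    """Average the embedding of the original sentence with those of m variants."""
    variants = [sentence] + generate(sentence, m)
    vectors = np.stack([embed(v) for v in variants])  # shape: (m + 1, dim)
    return vectors.mean(axis=0)


vec = geneol_embed("A cat sits on the mat.", 4, toy_generate, toy_embed)
print(vec.shape)
```

With `m = 0` the function reduces to embedding the original sentence alone, which makes the extra generator and embedder calls for larger $m$ easy to see as the inference-time cost mentioned above.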

Paper Structure

This paper contains 30 sections, 4 figures, 14 tables.

Figures (4)

  • Figure 1: Outline of the GenEOL methodology. Step 1: The Generator ($\mathbf{\mathcal{L}_{IT}}$) creates a set of transformed sentences, each conveying the same core meaning as the original sentence. Step 2: The original sentence and the transformed sentences are embedded using the Embedder ($\mathbf{\mathcal{L}_{PT}}$) and averaged to produce the final embedding. (.) is the count of each element.
  • Figure 2: (a) Comparison of pretrained vs. instruction-tuned models on the STSB validation set. Pretrained models are always better at sentence embeddings. (b) Scatter plot of score ranks on the STSB validation set using an EOL-based prompt (ordinal ranks are used for better visualization). (c) Scatter plot of score ranks with GenEOL. The points are more concentrated along the diagonal compared to KEEOL (Note the blue dots away from the diagonal in (c)).
  • Figure 3: Average STS performance as a function of $m$. GenEOL beats MetaEOL with only 2 transformations ($m=2$) for both embedders. Performance starts to stagnate around 16 generations. GenEOL results are averaged over 3 seeds.
  • Figure 4: Avg. STS performance when embeddings are extracted from different layers. GenEOL can stabilize the representational quality across layers. The Max-Min value across layers is lower for GenEOL than for KEEOL.