Let LLMs Speak Embedding Languages: Generative Text Embeddings via Iterative Contrastive Refinement
Yu-Che Tsai, Kuan-Yu Chen, Yuan-Chi Li, Yuan-Hao Chen, Ching-Yu Tsai, Shou-De Lin
TL;DR
The paper addresses a limitation of existing LLM-based embedding models, which use the LLM as an encoder only and ignore its generative ability. It introduces GIRCSE, a generative embedding framework that autoregressively generates soft tokens to progressively refine a semantic representation. An Iterative Contrastive Refinement (ICR) objective supervises each generation step with a stepwise contrastive loss plus a refinement regularization term, and training is end-to-end differentiable. Empirically, GIRCSE achieves strong performance on MTEB and instruction-following benchmarks with only 0.2M training examples, and it exhibits test-time scaling: running more refinement steps at inference improves embedding quality. The approach offers a paradigm in which generation drives representation learning, balancing generic and instruction-following tasks while staying efficient through differentiable soft-token generation and caching. In practice, this points toward scalable embeddings that capture richer, instruction-aware semantics.
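To make the shape of the ICR objective more concrete, here is a minimal NumPy sketch of a stepwise contrastive loss with a refinement regularizer. It is an illustration under stated assumptions, not the authors' implementation: the in-batch InfoNCE form, the temperature `tau`, the weight `lam`, and the hinge-style regularizer that penalizes a step whose loss is worse than the previous step's are all hypothetical choices, as are the function names `info_nce` and `icr_loss`.

```python
import numpy as np

def info_nce(q, d, tau=0.05):
    """In-batch InfoNCE: row i of q should match row i of d (assumed loss form)."""
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    d = d / np.linalg.norm(d, axis=1, keepdims=True)
    logits = q @ d.T / tau                               # (B, B) scaled cosine similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))

def icr_loss(q_steps, d_steps, tau=0.05, lam=1.0):
    """q_steps, d_steps: shape (K, B, D), one embedding per refinement step.

    Returns (total_loss, per_step_losses).
    """
    step_losses = [info_nce(q, d, tau) for q, d in zip(q_steps, d_steps)]
    contrastive = float(np.mean(step_losses))
    # Hypothetical regularizer: penalize any step that is worse than the one before it.
    reg = sum(max(0.0, step_losses[k] - step_losses[k - 1])
              for k in range(1, len(step_losses)))
    return contrastive + lam * reg, step_losses
```

Supervising every step, rather than only the last one, is what would let the embedding taken after any number of refinement steps remain usable; the regularizer in this sketch simply discourages later steps from undoing earlier progress.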
Abstract
Existing large language model (LLM)-based embeddings typically adopt an encoder-only paradigm, treating LLMs as static feature extractors and overlooking their core generative strengths. We introduce GIRCSE (Generative Iterative Refinement for Contrastive Sentence Embeddings), a novel framework that leverages autoregressive generation to iteratively refine semantic representations. By producing sequences of soft tokens optimized under a contrastive objective, GIRCSE captures latent concepts and implicit semantics that encoder-only methods often miss. To guide this process, we propose an Iterative Contrastive Refinement (ICR) objective that encourages each refinement step to yield better representations. Extensive experiments show that GIRCSE outperforms strong LLM-based embedding baselines on the MTEB benchmark and instruction-following tasks. Moreover, GIRCSE exhibits an emergent test-time scaling property: generating more tokens at inference steadily improves embedding quality. Our results establish generative iterative refinement as a new paradigm for representation learning.
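The idea of generating soft tokens can be sketched with a toy example. The code below is a simplified illustration, not the GIRCSE architecture: `toy_lm_hidden` is a hypothetical stand-in for a causal LLM, the output head is assumed to share the embedding matrix `E`, and mean-pooling the per-step hidden states into the final embedding is an assumption. The point it shows is that each "token" is a probability-weighted mixture of vocabulary embeddings rather than a sampled id, so every operation stays differentiable.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def toy_lm_hidden(inputs, W):
    """Hypothetical stand-in for a causal LM: last hidden state from the prefix mean."""
    return np.tanh(inputs.mean(axis=0) @ W)

def generate_soft_embedding(token_ids, E, W, num_steps=3):
    """Autoregressively append soft tokens, then pool the step hidden states."""
    seq = E[token_ids]                      # (T, D) input token embeddings
    hiddens = []
    for _ in range(num_steps):
        h = toy_lm_hidden(seq, W)           # (D,) hidden state for the next position
        probs = softmax(h @ E.T)            # (V,) next-token distribution (tied head, assumed)
        soft_token = probs @ E              # (D,) expected embedding, no sampling or argmax
        seq = np.vstack([seq, soft_token])  # feed the soft token back as input
        hiddens.append(h)
    emb = np.mean(hiddens, axis=0)          # assumed pooling over refinement steps
    return emb / np.linalg.norm(emb)

rng = np.random.default_rng(0)
E = rng.normal(size=(10, 4))                # toy vocabulary of 10 tokens, dimension 4
W = rng.normal(size=(4, 4))
embedding = generate_soft_embedding(np.array([1, 2, 3]), E, W, num_steps=3)
```

In this sketch, `num_steps` is the knob that corresponds to generating more tokens at inference; whether more steps actually help depends on training with a stepwise objective, which the toy model does not include.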
