Making Text Embedders Few-Shot Learners
Chaofan Li, MingHao Qin, Shitao Xiao, Jianlyu Chen, Kun Luo, Yingxia Shao, Defu Lian, Zheng Liu
TL;DR
The paper addresses the challenge of producing adaptable text embeddings with large language models by exploiting in-context learning (ICL) without changing the model architecture. It introduces bge-en-icl, which injects few-shot task examples into the query side to guide embedding generation. Training with the InfoNCE objective, using the similarity $s(q,p)=\frac{1}{\tau}\cos(h_q,h_p)$ with temperature $\tau=0.02$, the authors show that this simple ICL-based prompting yields state-of-the-art results on MTEB and AIR-Bench. Through extensive experiments, they analyze attention schemas, pooling methods, and passage prompts, and find that preserving the original unidirectional (causal) architecture often yields the best embeddings in ICL settings. They also release multilingual variants and a lightweight reranker, underscoring the practical impact for retrieval systems and for zero- and few-shot generalization across diverse domains. Overall, the work highlights the efficacy of prompt-based ICL for text embeddings and argues that simplicity, meaning ICL within the standard embedding framework, can surpass more complex architectural changes.
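To make the training objective concrete, here is a minimal pure-Python sketch of the scoring function $s(q,p)=\frac{1}{\tau}\cos(h_q,h_p)$ with $\tau=0.02$, plugged into a standard InfoNCE loss. It assumes the textbook InfoNCE form, in which one positive passage competes against a list of negative passages in the softmax denominator; the helper names (`cosine`, `score`, `info_nce_loss`) and the choice of negatives are illustrative, not taken from the paper's code.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def score(h_q: Sequence[float], h_p: Sequence[float], tau: float = 0.02) -> float:
    """s(q, p) = cos(h_q, h_p) / tau."""
    return cosine(h_q, h_p) / tau


def info_nce_loss(
    h_q: Sequence[float],
    h_pos: Sequence[float],
    h_negs: List[Sequence[float]],
    tau: float = 0.02,
) -> float:
    """-log( exp(s(q,p+)) / sum_{p in {p+} + negatives} exp(s(q,p)) ).

    Assumption: negatives are supplied explicitly (e.g. hard or in-batch
    negatives); the exact negative-sampling scheme is not specified here.
    """
    scores = [score(h_q, h_pos, tau)] + [score(h_q, h_n, tau) for h_n in h_negs]
    # Log-sum-exp with max subtraction, since 1/tau = 50 makes raw exps large.
    m = max(scores)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_denominator - scores[0]


if __name__ == "__main__":
    query = [1.0, 0.0]
    print(info_nce_loss(query, [1.0, 0.0], [[0.0, 1.0], [0.0, -1.0]]))  # near 0
    print(info_nce_loss(query, [0.0, 1.0], [[1.0, 0.0], [0.0, 1.0]]))   # near 50
```

The small temperature sharpens the softmax: a cosine gap of 1.0 between the positive and a negative becomes a logit gap of 50, so the loss heavily penalizes any negative that scores close to the positive.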
Abstract
Large language models (LLMs) with decoder-only architectures demonstrate remarkable in-context learning (ICL) capabilities. This feature enables them to handle both familiar and novel tasks effectively by using examples provided within their input context. Recognizing the potential of this capability, we propose leveraging the ICL feature of LLMs to enhance text embedding generation. To this end, we introduce a novel model, bge-en-icl, which employs few-shot examples to produce high-quality text embeddings. Our approach integrates task-related examples directly into the query side, resulting in significant improvements across various tasks. Additionally, we investigate how to use LLMs effectively as embedding models, examining design choices such as attention mechanisms and pooling methods. Our findings suggest that retaining the original framework often yields the best results, underscoring that simplicity is best. Experimental results on the MTEB and AIR-Bench benchmarks demonstrate that our approach sets new state-of-the-art (SOTA) performance. Our model, code, and dataset are freely available at https://github.com/FlagOpen/FlagEmbedding.
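The query-side integration of task examples can be pictured as plain string construction before tokenization. The sketch below is hypothetical: the tag strings `<instruct>`, `<query>`, `<response>`, the blank-line separator, and the helper names `Example`, `build_icl_query`, and `build_passage` are illustrative assumptions, not the exact template shipped with bge-en-icl. What it does reflect from the text is the asymmetry: demonstrations are prepended to the query, while passages are embedded without them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    """One few-shot demonstration: a query and its relevant response."""
    query: str
    response: str


def format_example(task: str, ex: Example) -> str:
    # Hypothetical tag layout; the released model's template may differ.
    return f"<instruct>{task}\n<query>{ex.query}\n<response>{ex.response}"


def build_icl_query(task: str, query: str, examples: List[Example]) -> str:
    """Prepend task-related demonstrations to the query (query side only)."""
    demos = [format_example(task, ex) for ex in examples]
    # The target query ends with an empty response slot; the embedding would
    # be read from the model's final hidden state at this position.
    target = f"<instruct>{task}\n<query>{query}\n<response>"
    return "\n\n".join(demos + [target])


def build_passage(passage: str) -> str:
    """Passage side: no task examples are injected."""
    return passage


if __name__ == "__main__":
    task = "Given a web search query, retrieve relevant passages."
    shots = [Example("what is a llama", "The llama is a domesticated camelid.")]
    print(build_icl_query(task, "how tall are giraffes", shots))
```

With an empty `examples` list the function degrades to a zero-shot, instruction-only query, which makes it easy to compare the two settings with the same encoder.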
