Answer is All You Need: Instruction-following Text Embedding via Answering the Question
Letian Peng, Yuwei Zhang, Zilong Wang, Jayanth Srinivasa, Gaowen Liu, Zihan Wang, Jingbo Shang
TL;DR
This work addresses the need for instruction-aware text embeddings by reframing the instruction as a question about the input and deriving embeddings from the model-generated answers. The authors introduce InBedder, a framework that fine-tunes language models on a suite of abstractive QA datasets, using the paragraph as input, the instruction as the question, and the short answer as the embedding signal. Through instruction-awareness and robustness tests, InBedder demonstrates strong instruction-following behavior across encoder and decoder models and enables interpretable clustering via cluster explanations. While competitive on generic tasks, the method excels in instruction-driven scenarios and offers practical, open-source resources for researchers and practitioners seeking user-oriented retrieval and analysis. The work highlights the potential of answer-based embeddings to capture instruction-specific semantics with efficiency advantages in certain settings.
Abstract
This work aims to build a text embedder that can capture characteristics of texts specified by user instructions. Despite the tremendous potential of such user-oriented embeddings, no previous approach provides a concrete solution. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation accordingly. Intuitively, texts with the same (implicit) semantics would share similar answers following the instruction, thus leading to more similar embeddings. Specifically, we propose InBedder, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, obtained by applying different instructions to the same corpus, demonstrates a high degree of interpretability.
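The embed-via-answering idea can be illustrated with a toy sketch. This is not the InBedder implementation: `toy_answer` is a hypothetical keyword rule standing in for a fine-tuned question answering model, and the bag-of-words vector over the answer is an assumed stand-in for whatever representation the model would produce. The sketch only shows why the same texts can end up close or far apart depending on the instruction.

```python
import math
from collections import Counter

# Hypothetical instructions, phrased as questions about the input text.
SENTIMENT = "What is the sentiment of this review?"
PRODUCT = "What product is this review about?"


def toy_answer(instruction: str, text: str) -> str:
    """Hypothetical stand-in for a fine-tuned QA model: returns a short answer."""
    words = set(text.lower().replace(".", "").split())
    question = instruction.lower()
    if "sentiment" in question:
        return "positive" if words & {"love", "great"} else "negative"
    if "product" in question:
        return "phone" if "phone" in words else "laptop"
    return "unknown"


def embed(instruction: str, text: str) -> Counter:
    """Embed a text as a bag of words over its answer (an assumed toy encoding)."""
    return Counter(toy_answer(instruction, text).split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


review_a = "I love this phone."
review_b = "This laptop is great."
review_c = "This phone is terrible."

# Under the sentiment instruction, a and b share an answer; a and c do not.
sim_sentiment_ab = cosine(embed(SENTIMENT, review_a), embed(SENTIMENT, review_b))
sim_sentiment_ac = cosine(embed(SENTIMENT, review_a), embed(SENTIMENT, review_c))

# Under the product instruction, the grouping flips.
sim_product_ab = cosine(embed(PRODUCT, review_a), embed(PRODUCT, review_b))
sim_product_ac = cosine(embed(PRODUCT, review_a), embed(PRODUCT, review_c))
```

In this toy setting, the pair of reviews that is most similar changes with the instruction, which is the behavior the instruction awareness tests are meant to probe. A real system would replace both the keyword rule and the count vector with a learned model.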
