Answer is All You Need: Instruction-following Text Embedding via Answering the Question

Letian Peng; Yuwei Zhang; Zilong Wang; Jayanth Srinivasa; Gaowen Liu; Zihan Wang; Jingbo Shang

TL;DR

This work addresses the need for instruction-aware text embeddings by reframing the instruction as a question about the input and deriving embeddings from the model-generated answers. The authors introduce InBedder, a framework that fine-tunes language models on a suite of abstractive QA datasets, using the paragraph as input, the instruction as the question, and the short answer as the embedding signal. Through instruction-awareness and robustness tests, InBedder demonstrates strong instruction-following behavior across encoder and decoder models and enables interpretable clustering via cluster explanations. While competitive on generic tasks, the method excels in instruction-driven scenarios and offers practical, open-source resources for researchers and practitioners seeking user-oriented retrieval and analysis. The work highlights the potential of answer-based embeddings to capture instruction-specific semantics with efficiency advantages in certain settings.

Abstract

This work aims to build a text embedder that can capture the characteristics of texts specified by user instructions. Despite the tremendous potential of deploying user-oriented embeddings, no previous approach provides a concrete solution. This paper offers a new viewpoint, which treats the instruction as a question about the input text and encodes the expected answers to obtain the representation. Intuitively, texts with the same (implicit) semantics would share similar answers under the same instruction, leading to more similar embeddings. Specifically, we propose InBedder, which instantiates this embed-via-answering idea by fine-tuning language models only on abstractive question answering tasks. InBedder demonstrates significantly improved instruction-following capabilities according to our proposed instruction awareness tests and instruction robustness tests, when applied to both large language models (LLMs) (e.g., llama-2-7b) and smaller encoder-based LMs (e.g., roberta-large). Additionally, our qualitative analysis of clustering outcomes, obtained by applying different instructions to the same corpus, demonstrates a high degree of interpretability.
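The embed-via-answering idea in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `toy_answer` is a hypothetical keyword lookup standing in for a fine-tuned QA model, and `embed_answer` is a bag-of-words vector standing in for hidden-state encodings. It is not the InBedder implementation; it only shows how the same texts can end up with different similarities under different instructions.

```python
import math
from collections import Counter


def toy_answer(text: str, instruction: str) -> str:
    """Hypothetical stand-in for a QA model: answer `instruction` about `text`."""
    text_l = text.lower()
    instruction_l = instruction.lower()
    if "sentiment" in instruction_l:
        positive_words = ("love", "great", "excellent")
        return "positive" if any(w in text_l for w in positive_words) else "negative"
    if "topic" in instruction_l:
        food_words = ("pizza", "pasta")
        return "food" if any(w in text_l for w in food_words) else "film"
    return "unknown"


def embed_answer(answer: str) -> Counter:
    """Toy embedding: bag-of-words counts of the short answer."""
    return Counter(answer.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed(text: str, instruction: str) -> Counter:
    """Embed `text` by encoding the answer to `instruction`, not the text itself."""
    return embed_answer(toy_answer(text, instruction))


texts = ["I love this pizza", "I love this movie", "This pizza is awful"]
sentiment = [embed(t, "What is the sentiment of this text?") for t in texts]
topic = [embed(t, "What is the topic of this text?") for t in texts]

# Under the sentiment instruction, texts 0 and 1 share an answer.
sim_sentiment_01 = cosine(sentiment[0], sentiment[1])
sim_sentiment_02 = cosine(sentiment[0], sentiment[2])
# Under the topic instruction, texts 0 and 2 share an answer instead.
sim_topic_01 = cosine(topic[0], topic[1])
sim_topic_02 = cosine(topic[0], topic[2])
```

In this sketch the grouping of the three texts flips when the instruction changes, which is the behavior the instruction awareness tests are meant to probe.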

Paper Structure (27 sections, 10 equations, 9 figures, 5 tables)

Introduction
Related Works
Text Embedder
Instruction Tuning
Goal-Driven Clustering
Problem Formulation
Instruction-following Embedder
Instruction Awareness Tests
Instruction Robustness Tests
Methodology
Encoding Methods
Answer Speaks Louder
Answer Brevity Matters
Our InBedder
Experiments
...and 12 more sections

Figures (9)

Figure 1: An example workflow of InBedder. InBedder takes in both user-provided dataset and user-specified instructions to produce personalized clusterings from which the user can extract insights about the dataset.
Figure 2: Instruction awareness test performance (averaged over 3 datasets) for the different encoding methods introduced in Section \ref{sec:direct_vs_re}, taken from the last layer. We show two models: llama-2-7b-chat from Huggingface and llama-2-7b-InBedder, our model fine-tuned from llama-2-7b. $T$ is the decoding temperature and $\mathcal{S}_Y$ is the sample size. Observations: (1) the generation/answer side (i.e., the checkerboard pattern) is more informative than the prompt side (i.e., the dark blue with dotted pattern); and (2) in llama-2-7b-InBedder, 1st-gen seems to significantly outperform the others. See the analysis of model depth in Figure \ref{fig:model_depth}.
Figure 3: Filtered vs. not filtered (i.e., avg-gen on the last layer of each LLM). Observations: filtering hidden states associated with uninformative contents can marginally improve performance.
Figure 4: An example from our training data.
Figure 5: Instruction robustness test results. Three sets of instructions are tested: correct, implicit, and incorrect. $\Delta_{ci}$ denotes the separation between the means of correct and incorrect. $\Delta_{ii}$ denotes the separation between the means of implicit and incorrect. InBedder shows better robustness and performance overall. See more datasets in Figure \ref{fig:prompt_robustness_more}.
...and 4 more figures
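
The separations $\Delta_{ci}$ and $\Delta_{ii}$ described in the Figure 5 caption can be sketched as differences of means. This is a minimal illustration under two assumptions: that "separation" means a plain difference of mean scores, and that the sign convention is correct (or implicit) minus incorrect. The scores below are invented placeholders, not results from the paper, and `separation` is a hypothetical helper name.

```python
from statistics import mean


def separation(scores_a, scores_b):
    """Difference between the mean of `scores_a` and the mean of `scores_b`."""
    return mean(scores_a) - mean(scores_b)


# Placeholder scores invented purely for illustration (NOT results from the paper).
correct = [0.60, 0.64, 0.62]    # scores under correct instructions
implicit = [0.50, 0.54, 0.52]   # scores under implicit instructions
incorrect = [0.30, 0.34, 0.32]  # scores under incorrect instructions

delta_ci = separation(correct, incorrect)   # correct vs. incorrect
delta_ii = separation(implicit, incorrect)  # implicit vs. incorrect
```

Under this reading, a larger positive separation suggests that the embedder responds to the content of the instruction rather than ignoring it.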
