RARe: Retrieval Augmented Retrieval with In-Context Examples

Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi, Eunsol Choi

TL;DR

This work investigates in-context learning for encoder-only text retrievers and introduces RARe, which augments the target query with semantically similar in-context exemplars retrieved via BM25. RARe is trained with a standard contrastive loss, and its effectiveness is demonstrated on BeIR and the reasoning-oriented RAR-b benchmark, with notable improvements in $nDCG@10$ and stronger out-of-domain generalization. The authors analyze exemplar quality, quantity, format, and content, showing that semantically relevant in-context examples yield robust gains and offering guidance for future design choices. The approach is validated on both decoder-based and retriever-based backbones, and code and checkpoints are released to facilitate adoption and further study.
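
The summary only says "standard contrastive loss" and gives no formula. For orientation, one widely used form is the InfoNCE objective below. Every symbol is an assumption made for illustration rather than notation from the paper: $q$ is the embedding of the augmented query, $d^{+}$ a relevant document, $\mathcal{N}$ a set of negative documents, $s(\cdot,\cdot)$ a similarity such as cosine, and $\tau$ a temperature.

$$
\mathcal{L} = -\log \frac{\exp\left(s(q, d^{+})/\tau\right)}{\exp\left(s(q, d^{+})/\tau\right) + \sum_{d^{-} \in \mathcal{N}} \exp\left(s(q, d^{-})/\tau\right)}
$$

Under this form, the only change RARe would make relative to a query-only baseline is the input that produces $q$; the loss itself stays the same.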

Abstract

While in-context learning is well-studied with decoder-only language models (LLMs), its utility for encoder-only models remains underexplored. We study in-context learning for encoder-only models for text retrieval tasks. Can incorporating in-context examples (query-document pairs) to the target query enhance retriever performance? Our approach, RARe, finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query. This approach achieves performance gains of up to +2.72% nDCG across open-domain retrieval datasets (BeIR, RAR-b) compared to using the target query only as an input. In particular, we find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples, similar to what is seen for in-context learning in LLMs. We further provide analysis on the design choices of in-context example augmentation for retrievers and lay the foundation for future work.
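
The augmentation step can be sketched as follows: score a pool of exemplar queries against the target query with BM25 (the exemplar retriever named in the TL;DR), keep the top `k` query-document pairs, and concatenate them with the instruction and the target query. This is a minimal sketch, not the authors' implementation. The whitespace tokenizer, the `k1` and `b` defaults, the function names, and the `Instruct:` / `Query:` / `Document:` prompt layout are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercased whitespace tokenization stands in for a real tokenizer.
    return text.lower().split()


def bm25_scores(query, corpus, k1=1.5, b=0.75):
    """Return one Okapi BM25 score per text in `corpus` for `query`."""
    docs = [tokenize(text) for text in corpus]
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many texts each term appears.
    df = Counter(term for d in docs for term in set(d))
    query_terms = set(tokenize(query))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / max(avgdl, 1e-9))
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def build_augmented_query(instruction, target_query, exemplars, k=2):
    """Prepend the k exemplars whose queries best match `target_query`.

    `exemplars` is a list of (query, document) pairs. The output layout is a
    hypothetical prompt format, not one confirmed by the source.
    """
    scores = bm25_scores(target_query, [q for q, _ in exemplars])
    top = sorted(range(len(exemplars)), key=lambda i: scores[i], reverse=True)[:k]
    parts = [f"Instruct: {instruction}"]
    for i in top:
        q, d = exemplars[i]
        parts.append(f"Query: {q}\nDocument: {d}")
    parts.append(f"Query: {target_query}")
    return "\n".join(parts)


exemplars = [
    ("what is the capital of france", "Paris is the capital of France."),
    ("how do vaccines work", "Vaccines train the immune system."),
    ("capital city of germany", "Berlin is the capital of Germany."),
]
augmented = build_augmented_query(
    "Retrieve relevant passages", "what is the capital of italy", exemplars, k=1
)
```

The resulting string is what the retriever would embed in place of the bare query. Documents are embedded unchanged in this sketch.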

Paper Structure

This paper contains 48 sections, 6 equations, 6 figures, 19 tables.

Figures (6)

  • Figure 1: Overview. Prior work adds a task-specific instruction to a given query as input to the retriever. In RARe, we further leverage a set of in-context exemplars, each a pair of a query and a relevant document. These in-context examples are combined with the original query and the instruction to form the input to the retriever.
  • Figure 2: Inference-only modification does not work. We report performance before and after adding in-context examples to the query without updating model parameters. Embedding models are not able to leverage in-context examples out of the box, as opposed to decoder-only models.
  • Figure 3: Retrieved vs. Random In-context Examples. Change in performance ($\Delta$nDCG@10) on E5-Mistral-Instruct with RARe ($q^\text{RARe}$) from the baseline setting ($q^\text{inst}$ both during training and evaluation time). Using retrieved examples during training and inference enhances model performance on most benchmark datasets.
  • Figure 4: Change in performance ($\Delta$nDCG@10) from the base model (E5-Mistral-Instruct) for varying similarity between the closest in-context example query and target query (Score@Top-1). For RARe, we use retrieved in-context examples $q^{\text{RARe}}$ on the augmented in-context query format $q^{\text{inst+ic}}$.
  • Figure 5: Retrieved vs. Random In-context Examples. Change in performance ($\Delta$nDCG@10) on E5-Mistral-Instruct with RARe ($q^\text{inst+ic}$) from the baseline setting ($q$ both during training and evaluation time). Using retrieved examples during training and/or inference enhances model performance on 7/10 datasets.
  • ...and 1 more figure
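
Several captions above report $\Delta$nDCG@10, the difference in nDCG@10 between a RARe configuration and a baseline. The sketch below shows one way to compute that quantity for a single query. It assumes linear gain (some implementations use $2^{rel} - 1$ instead), and the function names and toy rankings are hypothetical, not taken from the paper's evaluation code.

```python
import math


def dcg_at_k(gains, k):
    # Discounted cumulative gain over the first k positions (rank 1 has discount log2(2) = 1).
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_doc_ids, relevance, k=10):
    """nDCG@k for one query.

    `ranked_doc_ids` is the system ranking; `relevance` maps doc id to a
    graded relevance label (unlisted documents count as 0).
    """
    gains = [relevance.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal = sorted(relevance.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Toy example: two relevant documents, ranked differently by two systems.
relevance = {"d1": 1, "d2": 1}
baseline_ranking = ["d3", "d1", "d2"]
rare_ranking = ["d1", "d2", "d3"]

delta_ndcg = ndcg_at_k(rare_ranking, relevance) - ndcg_at_k(baseline_ranking, relevance)
```

Benchmark-level numbers such as those in the figures would average the per-query nDCG@10 over all queries in a dataset before taking the difference.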