Learning to Retrieve Iteratively for In-Context Learning

Yunmo Chen; Tongfei Chen; Harsh Jhamtani; Patrick Xia; Richard Shin; Jason Eisner; Benjamin Van Durme

TL;DR

The paper addresses the selection of exemplar portfolios for in-context learning (ICL), observing that conventional retrievers score each exemplar independently and therefore ignore both interactions among exemplars and task-specific LLM behavior. It introduces IterR, a stateful iterative retriever trained with proximal policy optimization (PPO) that treats the LLM as the environment and learns to select a sequence of exemplars that maximizes the LM's likelihood of generating the correct output. The approach adds about 4M parameters to an existing dense retriever and improves performance on the semantic parsing benchmarks CalFlow, TreeDST, and MTOP, and the trained retriever generalizes to inference LLMs other than the one used during training. The work indicates that learned, iterative exemplar selection can improve ICL, with practical benefits for few-shot parsing such as potentially smaller exemplar counts and cross-model applicability.

Abstract

We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models (LLMs). We propose a training procedure based on reinforcement learning, incorporating feedback from LLMs. We instantiate an iterative retriever for composing in-context learning (ICL) exemplars and apply it to various semantic parsing tasks that demand synthesized programs as outputs. By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever, outperforming previous methods in selecting ICL exemplars on semantic parsing datasets such as CalFlow, TreeDST, and MTOP. Additionally, the trained iterative retriever generalizes across different inference LLMs beyond the one used during training.
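
The difference between a single retriever call and the iterative, stateful retrieval described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the paper learns its state encoder (the 4M added parameters) and trains the policy with reinforcement learning, whereas the `subtract_update` transition below is a hand-written, hypothetical stand-in, and the function names, vectors, and greedy argmax selection are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product used as the retrieval score."""
    return sum(x * y for x, y in zip(a, b))


def iterative_retrieve(
    query_vec: Vector,
    candidate_vecs: Sequence[Vector],
    update_state: Callable[[Vector, Vector], Vector],
    num_steps: int,
) -> List[int]:
    """Pick one exemplar per step; the state changes after every pick.

    `update_state` is a hypothetical stand-in for a learned state encoder.
    """
    state = list(query_vec)
    selected: List[int] = []
    for _ in range(num_steps):
        remaining = [i for i in range(len(candidate_vecs)) if i not in selected]
        if not remaining:
            break
        # Greedy choice under the current state (assumption: argmax selection).
        best = max(remaining, key=lambda i: dot(state, candidate_vecs[i]))
        selected.append(best)
        state = update_state(state, candidate_vecs[best])
    return selected


def subtract_update(state: Vector, chosen: Vector) -> Vector:
    """Toy transition: remove what the chosen exemplar already covers."""
    return [s - c for s, c in zip(state, chosen)]


def no_update(state: Vector, chosen: Vector) -> Vector:
    """Stateless baseline: equivalent to a single top-k retriever call."""
    return state


query = [1.0, 1.0]
candidates = [[1.0, 0.0], [0.9, 0.0], [0.0, 0.8]]
stateful = iterative_retrieve(query, candidates, subtract_update, 2)
stateless = iterative_retrieve(query, candidates, no_update, 2)
```

In this toy case the stateless baseline returns the two candidates that both cover the first dimension of the query, while the stateful variant picks one candidate per dimension, which is the kind of exemplar interaction a single top-k call cannot account for.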

Paper Structure (30 sections, 7 equations, 7 figures, 5 tables)

Introduction
Overview of an Iterative Retriever
Instantiating an Iterative Retriever
Training
Environment Simulator
Reward Design
Policy Optimization
Sampling & Collecting Experience
Experimental Setup
Datasets
Baselines
Generation with LLMs
Hyperparameters
Evaluation Metrics
Results & Analyses
...and 15 more sections

Figures (7)

Figure 1: Above: ICL under a single retriever call. Below: ICL under our proposed iterative retriever.
Figure 2: ICL prompt construction for an example in SMCalFlow. Above: ICL with BM25 as the retriever. Below: An instance of our iterative retriever. BM25 retrieves examples that overlap lexically with the query, whereas the trained iterative retriever is better at retrieving structurally similar exemplars since it is trained to maximize the probability of the LM generating the reference parse.
Figure 3: Samples of $(x, y)$ pairs for semantic parsing under different datasets used in this paper.
Figure 4: Stratified sampling employed in our approach. Our sampling method retains the top $k/N_s$ samples and splits the rest into $(N_s - 1)$ strata to perform stratified sampling. The resulting $k$ samples are renormalized to construct the action distribution.
Figure 5: Performance comparisons using various LLMs for inference (top row: SMCalFlow; mid: TreeDST; bottom: MTOP). The IterR used in these experiments is trained with Llama-2-7b but retrieves ICL exemplars for use with other LLMs.
...and 2 more figures
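
The stratified sampling summarized in the Figure 4 caption can be sketched roughly as follows. Only the outline comes from the caption: keep the top $k/N_s$ candidates, split the rest into $(N_s - 1)$ strata, sample from each stratum, and renormalize over the resulting $k$ samples. Everything else is an assumption made for illustration, including the function name `stratified_sample`, equal-size contiguous strata by rank, uniform sampling within a stratum, and softmax renormalization of the scores.

```python
import math
import random
from typing import List, Sequence, Tuple


def stratified_sample(
    scores: Sequence[float], k: int, num_strata: int, rng: random.Random
) -> Tuple[List[int], List[float]]:
    """Return k candidate indices and a renormalized distribution over them."""
    if num_strata < 1 or k % num_strata != 0:
        raise ValueError("k must be a positive multiple of num_strata")
    if len(scores) < k:
        raise ValueError("need at least k candidates")
    per_stratum = k // num_strata

    # Rank candidates by score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top, rest = ranked[:per_stratum], ranked[per_stratum:]

    chosen = list(top)  # the top k/N_s candidates are always retained
    rest_strata = num_strata - 1
    if rest_strata > 0:
        # Assumption: contiguous, roughly equal-size strata over the ranking.
        size = math.ceil(len(rest) / rest_strata)
        for s in range(rest_strata):
            stratum = rest[s * size:(s + 1) * size]
            # Assumption: uniform sampling without replacement per stratum.
            chosen.extend(rng.sample(stratum, min(per_stratum, len(stratum))))

    # Assumption: renormalize with a softmax over the chosen scores.
    top_score = max(scores[i] for i in chosen)
    weights = [math.exp(scores[i] - top_score) for i in chosen]
    total = sum(weights)
    return chosen, [w / total for w in weights]


example_scores = [float(12 - i) for i in range(12)]
indices, probs = stratified_sample(example_scores, 6, 3, random.Random(0))
```

With 12 candidates, $k = 6$, and $N_s = 3$, the two best candidates are always kept and two more are drawn from each of the two remaining strata, so lower-ranked candidates still receive probability mass during exploration.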
