RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning

Alexander Scarlatos, Andrew Lan

TL;DR

RetICL addresses the sensitivity of in-context learning to example selection by modeling the sequential retrieval of in-context examples as a Markov decision process and training a retriever with reinforcement learning. The approach uses an LSTM-based latent state and a bilinear policy over an S-BERT embedding space, guided by a novel reward that blends task performance with model confidence. Empirical results on TabMWP, GSM8K, and QASC show that RetICL consistently matches or surpasses heuristic and learnable baselines, and ablations confirm the value of the confidence reward, temporal conditioning, and exploration. Qualitative analyses indicate that RetICL learns representations of problem-solving strategies and can generalize to low-resource settings, suggesting broad applicability of sequential example retrieval to complex reasoning tasks.

Abstract

Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving the best performance with these models often relies on in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem-solving strategies.

Paper Structure (24 sections, 4 equations, 4 figures, 6 tables)

Figures (4)

  • Figure 1: RetICL overview for question answering. Each latent state is constructed from the problem and previously selected examples. The next example is selected using a bilinear transformation between the latent state and examples in the corpus. After all examples are selected, we query the LLM, obtain a reward, and update the policy.
  • Figure 2: Change in relative accuracy as the number of available example candidates increases.
  • Figure 3: Accuracy as the number of in-context examples in the prompt increases, evaluated on GSM8K using prompting via Random, kNN, and RetICL.
  • Figure 4: Example embedding visualizations. From left to right, top to bottom: GSM8K pre-trained embeddings, GSM8K RetICL embeddings, TabMWP pre-trained embeddings, and TabMWP RetICL embeddings. Points are colored by the number of steps in an example's solution, with red indicating the fewest steps, green the most, and yellow an intermediate number.
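
The selection loop summarized in the Figure 1 caption (a latent state built from the problem and the examples chosen so far, and a bilinear score between that state and every corpus example) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The class name `SequentialRetriever`, the tanh projection used to initialize the state, the weight shapes, and the random vectors standing in for S-BERT embeddings are all assumptions made for illustration. Training (querying the LLM, computing the reward, and the policy update) is omitted.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class SequentialRetriever:
    """Toy sequential retriever: LSTM latent state + bilinear scoring (hypothetical)."""

    def __init__(self, emb_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.hidden_dim = hidden_dim
        # Assumed design: project the problem embedding to the initial hidden state.
        self.W_init = rng.normal(0.0, 0.1, (hidden_dim, emb_dim))
        # LSTM weights for the stacked gates: input, forget, candidate, output.
        self.W_x = rng.normal(0.0, 0.1, (4 * hidden_dim, emb_dim))
        self.W_h = rng.normal(0.0, 0.1, (4 * hidden_dim, hidden_dim))
        self.b = np.zeros(4 * hidden_dim)
        # Bilinear form between the latent state and example embeddings.
        self.W_bil = rng.normal(0.0, 0.1, (hidden_dim, emb_dim))

    def lstm_step(self, x, h, c):
        """One standard LSTM cell update with input x and state (h, c)."""
        i, f, g, o = np.split(self.W_x @ x + self.W_h @ h + self.b, 4)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

    def policy(self, h, corpus, chosen):
        """Softmax over bilinear scores h^T W e, masking already chosen examples."""
        logits = corpus @ (self.W_bil.T @ h)
        logits[np.asarray(chosen, dtype=int)] = -np.inf
        logits = logits - logits.max()  # numerical stability
        probs = np.exp(logits)
        return probs / probs.sum()

    def select(self, problem_emb, corpus, k, rng=None):
        """Pick k distinct examples in order; greedy if rng is None, else sampled."""
        h = np.tanh(self.W_init @ problem_emb)
        c = np.zeros(self.hidden_dim)
        chosen = []
        for _ in range(k):
            probs = self.policy(h, corpus, chosen)
            if rng is None:
                idx = int(np.argmax(probs))
            else:
                idx = int(rng.choice(len(corpus), p=probs))
            chosen.append(idx)
            # Fold the chosen example into the latent state before the next pick.
            h, c = self.lstm_step(corpus[idx], h, c)
        return chosen


# Random vectors stand in for S-BERT embeddings of the corpus and the problem.
data_rng = np.random.default_rng(1)
corpus = data_rng.normal(size=(20, 8))
problem = data_rng.normal(size=8)

retriever = SequentialRetriever(emb_dim=8, hidden_dim=16)
picked = retriever.select(problem, corpus, k=4)
print(picked)
```

Because the state is updated after every pick, the score of a candidate depends on which examples were already selected and in what order, which is the dependency that independent per-example scoring ignores. Passing an `rng` to `select` samples from the policy instead of taking the argmax, one simple way to introduce exploration during training.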