RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning
Alexander Scarlatos, Andrew Lan
TL;DR
RetICL addresses the sensitivity of in-context learning to example selection by modeling sequential retrieval of in-context examples as a Markov decision process and training a retriever with reinforcement learning. The approach uses an LSTM-based latent state and a bilinear policy over an S-BERT embedding space, guided by a novel reward that blends task performance with model confidence. Empirical results across TabMWP, GSM8K, and QASC show RetICL consistently matches or surpasses heuristic and learnable baselines, with ablations confirming the value of the confidence reward, temporal conditioning, and exploration. Qualitative analyses demonstrate that RetICL learns representations of problem-solving strategies and can generalize to low-resource settings, suggesting broad applicability for sequential ICL in complex reasoning tasks.
Abstract
Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving the best performance with these models often relies on in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem-solving strategies.
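To make the sequential framing concrete, the following is a minimal sketch of how one example at a time might be selected when each pick depends on the picks before it. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins for S-BERT vectors, a plain tanh recurrence replaces the LSTM, the weights are untrained, and the function and parameter names (`select_examples`, `W_h`, `W_x`, `W_a`) are hypothetical. The reward and the reinforcement learning update are omitted.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax; entries equal to -inf get probability 0.
    z = np.exp(x - np.max(x))
    return z / z.sum()


def select_examples(query_emb, corpus_embs, params, k, rng=None):
    """Pick k distinct example indices one at a time.

    Assumption: a tanh recurrence stands in for the LSTM latent state.
    If rng is given, each pick is sampled from the policy (exploration);
    otherwise the highest-scoring available example is taken (greedy).
    """
    W_h, W_x, W_a = params["W_h"], params["W_x"], params["W_a"]
    # The initial state depends only on the problem to be solved.
    h = np.tanh(W_x @ query_emb)
    available = np.ones(len(corpus_embs), dtype=bool)
    chosen = []
    for _ in range(k):
        # Bilinear score h^T W_a e for every candidate embedding e.
        scores = corpus_embs @ (W_a.T @ h)
        # An example that was already picked cannot be picked again.
        scores = np.where(available, scores, -np.inf)
        probs = softmax(scores)
        if rng is None:
            idx = int(np.argmax(probs))
        else:
            idx = int(rng.choice(len(probs), p=probs))
        chosen.append(idx)
        available[idx] = False
        # Fold the pick into the state so later picks depend on earlier ones.
        h = np.tanh(W_h @ h + W_x @ corpus_embs[idx])
    return chosen


# Toy usage with random, untrained weights (all sizes are arbitrary).
rng = np.random.default_rng(0)
d, m, n = 4, 5, 6  # embedding size, state size, corpus size
params = {
    "W_h": rng.normal(size=(m, m)),
    "W_x": rng.normal(size=(m, d)),
    "W_a": rng.normal(size=(m, d)),
}
corpus = rng.normal(size=(n, d))
query = rng.normal(size=d)
picked = select_examples(query, corpus, params, k=3)
print(picked)
```

Because the state is updated after every pick, the score of a candidate changes with what has already been selected, which is the property that independent per-example scoring lacks. The order of `picked` is also the order in which the examples would be placed in the prompt.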
