"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"
Andrew Parry, Debasis Ganguly, Manish Chandra
TL;DR
This paper reframes in-context learning (ICL) as an applied information retrieval (IR) problem and proposes three IR-inspired directions for improving downstream ICL: (1) adaptively choosing the number of demonstrations per input, either with query performance prediction (QPP) or with a learned function $\kappa(x)$; (2) learning-to-rank models that order exemplars by their usefulness for the downstream task; and (3) diversification and faceted IR to keep prompts informative and non-redundant. It formalizes ICL as $P(y \mid x) = f(x, P_k(x); \phi_{\text{LLM}})$ and maps IR problems such as QPP, ranking, and diversity onto the components of this formulation. A preliminary evaluation with GPT-J-6B on standard text classification benchmarks shows that supervised adaptive ICL (SAICL) can outperform static ICL, while unsupervised QPP-based methods may underperform; data-driven context selection both shortens the context and improves accuracy. The paper sets out a concrete agenda for cross-disciplinary work, arguing that efficient, task-specific selection and combination of ICL exemplars can meaningfully improve real-world NLP tasks.
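To make the idea of a per-input number of demonstrations concrete, the following is a minimal sketch of an unsupervised stand-in for κ(x). It is not the paper's QPP method or SAICL: the function name `adaptive_k`, the `ratio` threshold, and the `max_k` cap are all hypothetical, and the rule (keep neighbours whose retrieval score stays close to the top score) is only one simple heuristic in the spirit of score-distribution-based QPP.

```python
from typing import List


def adaptive_k(scores: List[float], ratio: float = 0.8, max_k: int = 5) -> int:
    """Hypothetical stand-in for kappa(x): choose how many examples to use.

    `scores` are retrieval scores of candidate examples for one test input.
    An example is kept while its score is at least `ratio` times the top
    score, up to `max_k` examples. Both parameters are illustrative.
    """
    if not scores:
        return 0
    ranked = sorted(scores, reverse=True)
    top = ranked[0]
    if top <= 0:
        # No candidate has a positive score, so use no demonstrations.
        return 0
    k = 0
    for s in ranked[:max_k]:
        if s >= ratio * top:
            k += 1
        else:
            break  # scores are sorted, so later ones are lower still
    return k
```

With scores such as `[0.9, 0.85, 0.3, 0.2]` this keeps two examples, whereas a flat score list keeps as many as `max_k` allows. A supervised variant would replace the threshold rule with a model trained to predict the best k for each input.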
Abstract
With the increasing ability of large language models (LLMs), in-context learning (ICL) has emerged as a new paradigm for natural language processing (NLP): instead of fine-tuning the parameters of an LLM for a downstream task with labeled examples, a small number of such examples is appended to a prompt instruction to control the decoder's generation process. ICL is thus conceptually similar to a non-parametric approach such as $k$-NN, where the prediction for each instance depends on the local topology, i.e., on a localised set of similar instances and their labels (called few-shot examples). This suggests that a test instance in ICL is analogous to a query in IR, and that the similar examples retrieved from a training set in ICL correspond to the documents retrieved from a collection in IR. While standard unsupervised ranking models can be used to retrieve these few-shot examples from a training set, the effectiveness of the examples can potentially be improved by redefining relevance in terms of utility for the downstream task, i.e., considering an example relevant if including it in the prompt instruction leads to a correct prediction. With this task-specific notion of relevance, it is possible to train a supervised ranking model (e.g., a bi-encoder or cross-encoder) that learns to select the few-shot examples. We believe that recent advances in neural rankers can find a use case in this task of choosing examples for more effective downstream ICL predictions.
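The query/document analogy and the task-specific notion of relevance can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' implementation: the bag-of-words cosine similarity stands in for any unsupervised ranker, the prompt template and the names `retrieve`, `build_prompt`, and `utility_label` are hypothetical, and `predict` is a placeholder for a call to an LLM.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (text, label)


def bow(text: str) -> Counter:
    """Lowercased whitespace-token counts (a deliberately simple representation)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, train: List[Example], k: int) -> List[Example]:
    """Treat the test instance as a query and training examples as documents."""
    q = bow(query)
    ranked = sorted(train, key=lambda ex: cosine(q, bow(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, shots: List[Example]) -> str:
    """Hypothetical template: retrieved examples first, then the unlabeled query."""
    blocks = [f"Review: {text}\nSentiment: {label}" for text, label in shots]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)


def utility_label(example: Example, query: str, gold: str,
                  predict: Callable[[str], str]) -> int:
    """Task-specific relevance: 1 if using this example yields the correct label."""
    return int(predict(build_prompt(query, [example])) == gold)
```

Labels produced by `utility_label` over (query, example) pairs from the training set could then serve as supervision for a bi-encoder or cross-encoder ranker, replacing similarity-based relevance with downstream usefulness.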
