One size doesn't fit all: Predicting the Number of Examples for In-Context Learning

Manish Chandra, Debasis Ganguly, Iadh Ounis

TL;DR

This work tackles the limitation of fixed, one-size-fits-all context sizes in in-context learning by introducing Adaptive In-Context Learning (AICL), which predicts the number of demonstrations $k$ to use for each test instance. AICL constructs ground-truth targets by evaluating $k$-shot performance for each $k \in \{0,\ldots,M\}$ and encoding the outcomes as a Boolean vector $\mathcal{K}(\boldsymbol{x})$, whose $k$-th entry records whether the $k$-shot prediction is correct. A multi-label classifier is then trained to map instance features to this vector, and at inference the number of demonstrations is chosen as $\kappa(\boldsymbol{x}) = \arg\max \theta(\boldsymbol{x})$, the index with the highest classifier output. The method optionally augments the input features with the distribution of neighbor labels to improve the predictor. Experiments on SST2, TREC, CoLA, and RTE with Llama-2 and Phi-2 models show that AICL, especially with neighbor-label information (AICL(E+N)), outperforms fixed-$k$ baselines by up to 17% and generalizes across datasets and model families. The results suggest that tailoring the context size per instance can substantially improve LLM-based classification, reducing the need for expensive hyper-parameter tuning and making few-shot inference more robust. Overall, AICL provides a practical, data-driven framework for choosing the prompt context size across NLP tasks and model configurations.
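
The target construction and the inference rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `toy_k_shot_predict` is a hypothetical stand-in for an LLM call with $k$ retrieved demonstrations, the classifier scores are made-up numbers standing in for $\theta(\boldsymbol{x})$, and breaking ties in favour of the smallest $k$ is an assumption.

```python
from typing import Callable, List, Sequence


def build_targets(
    predict_k_shot: Callable[[str, int], str],
    text: str,
    gold_label: str,
    max_k: int,
) -> List[int]:
    """Boolean vector K(x): entry k is 1 if the k-shot prediction is correct."""
    return [int(predict_k_shot(text, k) == gold_label) for k in range(max_k + 1)]


def select_k(scores: Sequence[float]) -> int:
    """kappa(x): index of the highest score (ties go to the smallest k, an assumption)."""
    return max(range(len(scores)), key=lambda k: scores[k])


def toy_k_shot_predict(text: str, k: int) -> str:
    """Hypothetical stub for an LLM: only correct once it sees two or more demonstrations."""
    return "positive" if k >= 2 else "negative"


# Training-time target for one instance, with M = 4.
targets = build_targets(toy_k_shot_predict, "a great movie", "positive", max_k=4)

# Inference-time choice of k from illustrative classifier scores.
chosen_k = select_k([0.1, 0.2, 0.7, 0.7, 0.3])
```

Here `targets` marks every $k \ge 2$ as a context size that yields a correct prediction, and `chosen_k` picks the smallest of the top-scoring sizes.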

Abstract

In-context learning (ICL) refers to the process of adding a small number of localized examples from a training set of labelled data to an LLM's prompt with an objective to effectively control the generative process seeking to improve the downstream task performance. Existing ICL approaches use an identical number of examples (a pre-configured hyper-parameter) for each data instance. Our work alleviates the limitations of this 'one fits all' approach by dynamically predicting the number of examples for each data instance to be used in few-shot inference with LLMs. In particular, we employ a multi-label classifier, the parameters of which are fitted using a training set, where the label for each instance in this training set indicates if using a specific value of k (number of most similar examples from 0 up to a maximum value) leads to correct k-shot downstream predictions. Our experiments on a number of text classification benchmarks show that AICL substantially outperforms standard ICL by up to 17%.

Paper Structure (24 sections, 6 equations, 4 figures, 3 tables)

Figures (4)

  • Figure 1: a) Example workflow of ICL for sentiment classification: The example shows a test instance for which a single demonstration (retrieved from the training set) does not result in a correct prediction (the workflow marked by the red arrows). Increasing the number of demonstrations from one to two results in a correct prediction (green arrows). We propose a method to estimate the number of examples that is likely to yield a correct prediction. b) Motivation behind using a variable number of examples for ICL across the test instances: One test instance '?' is located within a homogeneous neighborhood of negative data points, indicating that an LLM may perform well with only a few nearest-neighbour demonstrations. The other test instance '?' is located within a heterogeneous neighborhood, so an LLM may require a higher number of such demonstrations to predict its class correctly.
  • Figure 2: Schematic diagram of Adaptive In-Context Learning (AICL) workflow.
  • Figure 3: FICL (macro-averaged) F-scores for different context sizes on the test splits of the different datasets (AICL results are included for comparison). These plots demonstrate that AICL can be applied to any dataset without tuning a hyper-parameter (e.g., $k$ in FICL).
  • Figure 4: Macro-averaged F1 scores with different proportions of the training set used to train $\kappa(\mathbf{x})$, the multi-label classifier of AICL (Equation \ref{eq:mlc-l}).
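
The homogeneous and heterogeneous neighborhoods of Figure 1(b) can be summarized by the distribution of labels among an instance's nearest neighbours, the signal that the TL;DR says AICL(E+N) adds to the input features. The sketch below is only an illustration under assumptions: the function name, the label names, and the relative-frequency encoding are hypothetical, and the summary above does not state how the paper encodes this distribution.

```python
from collections import Counter
from typing import Dict, Sequence


def neighbor_label_distribution(
    neighbor_labels: Sequence[str], classes: Sequence[str]
) -> Dict[str, float]:
    """Relative frequency of each class among the retrieved neighbours."""
    total = len(neighbor_labels)
    if total == 0:
        # No neighbours retrieved (e.g., k = 0): return an all-zero distribution.
        return {c: 0.0 for c in classes}
    counts = Counter(neighbor_labels)
    return {c: counts[c] / total for c in classes}


classes = ["negative", "positive"]

# A homogeneous neighbourhood: every neighbour carries the same label.
homogeneous = neighbor_label_distribution(["negative"] * 4, classes)

# A heterogeneous neighbourhood: the neighbours disagree.
heterogeneous = neighbor_label_distribution(
    ["negative", "positive", "positive", "negative"], classes
)
```

A peaked distribution such as `homogeneous` suggests that a few demonstrations may suffice, whereas a flat one such as `heterogeneous` suggests that more may be needed.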