One size doesn't fit all: Predicting the Number of Examples for In-Context Learning

Manish Chandra; Debasis Ganguly; Iadh Ounis

TL;DR

This work addresses the limitation of fixed, one-size-fits-all context sizes in in-context learning by introducing Adaptive In-Context Learning (AICL), which predicts the number of demonstrations $k$ to use for each test instance. AICL constructs ground-truth targets by evaluating $k$-shot performance for $k \in \{0,\ldots,M\}$, encoding the outcomes as a Boolean vector $\mathcal{K}(\boldsymbol{x})$ whose $k$-th entry indicates whether the $k$-shot prediction is correct, and training a multi-label classifier to map instance features to this vector; at inference, the number of examples is chosen as $\kappa(\boldsymbol{x}) = \arg\max \theta(\boldsymbol{x})$, the index of the highest-scoring entry of the classifier output $\theta(\boldsymbol{x})$. The method optionally augments the input features with the distribution of neighbor labels to further improve the predictor. Experiments on SST2, TREC, CoLA, and RTE with Llama-2 and Phi-2 models show that AICL, especially with neighbor-label information (AICL(E+N)), outperforms fixed-$k$ ICL by up to 17% and generalizes across datasets and model families. The results suggest that tailoring the context size to each instance can improve LLM-based classification, reducing the need for expensive hyper-parameter tuning and enabling more robust few-shot inference.

Abstract

In-context learning (ICL) adds a small number of localized examples from a labelled training set to an LLM's prompt in order to steer the generative process and improve downstream task performance. Existing ICL approaches use an identical number of examples (a pre-configured hyper-parameter) for each data instance. Our work alleviates the limitations of this 'one-size-fits-all' approach by dynamically predicting the number of examples to use for each data instance in few-shot inference with LLMs. In particular, we employ a multi-label classifier whose parameters are fitted on a training set, where the label for each training instance indicates whether using a specific value of k (the number of most similar examples, from 0 up to a maximum value) leads to a correct k-shot downstream prediction. Our experiments on a number of text classification benchmarks show that our approach, adaptive ICL (AICL), outperforms standard ICL by up to 17%.
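
The target construction and the inference-time choice of k described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `canned` predictions, and the example scores are all hypothetical, and the LLM call is replaced by a lookup table so that the example runs on its own. The multi-label classifier itself is omitted; any model that outputs one score per value of k could supply the `scores` argument.

```python
from collections import Counter
from typing import Callable, List, Sequence


def build_target(x: str, gold: str,
                 predict_k_shot: Callable[[str, int], str],
                 max_k: int) -> List[int]:
    """Boolean target vector: entry k is 1 if the k-shot prediction is correct."""
    return [int(predict_k_shot(x, k) == gold) for k in range(max_k + 1)]


def select_k(scores: Sequence[float]) -> int:
    """Index of the highest per-k score; ties resolve to the smallest k."""
    return max(range(len(scores)), key=lambda k: scores[k])


def neighbor_label_distribution(neighbor_labels: Sequence[str],
                                classes: Sequence[str]) -> List[float]:
    """Relative frequency of each class among the retrieved neighbors
    (an optional extra feature for the classifier)."""
    counts = Counter(neighbor_labels)
    n = len(neighbor_labels)
    return [counts[c] / n if n else 0.0 for c in classes]


# Hypothetical stand-in for an LLM: canned k-shot predictions for one instance.
# In this toy case, 0 and 1 demonstrations give the wrong label, 2 and 3 the right one.
canned = {0: "pos", 1: "pos", 2: "neg", 3: "neg"}


def toy_llm(x: str, k: int) -> str:
    return canned[k]


target = build_target("a dull, lifeless film", "neg", toy_llm, max_k=3)
chosen_k = select_k([0.1, 0.2, 0.9, 0.9])  # made-up classifier scores
features = neighbor_label_distribution(["pos", "neg", "neg", "neg"], ["neg", "pos"])
```

Breaking ties towards the smallest k is a design choice of this sketch (it favors shorter prompts); the source does not state how ties are handled.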

Paper Structure (24 sections, 6 equations, 4 figures, 3 tables)

Introduction
Related Work
Prompt tuning and searching.
In-context Learning (ICL).
Proposed Methodology
Standard In-Context Learning (ICL)
Adaptive ICL
Obtaining the ground-truth values of the number of ICL examples for each training set instance.
Training a multi-label classifier.
Distribution of the downstream-task class labels as additional features.
Evaluation
Research Questions and Datasets
Methods Investigated
Baselines.
Variants of AICL.
...and 9 more sections

Figures (4)

Figure 1: a) Example workflow of ICL for sentiment classification: The example shows a test instance for which a single demonstration (retrieved from the training set) does not result in a correct prediction (the workflow shown by the red arrows), whereas increasing the number of demonstrations from one to two results in a correct prediction (green arrows). We propose a method to estimate the number of examples that is likely to yield a correct prediction. b) Motivation behind using a variable number of examples for ICL across the test instances: One test sample '?' is located within a homogeneous neighborhood of negative data points, indicating that an LLM may perform well with only a few nearest-neighbour demonstrations. The other test instance '?' is located within a heterogeneous neighborhood, so an LLM may require a higher number of such demonstrations to correctly predict its class.
Figure 2: Schematic diagram of Adaptive In-Context Learning (AICL) workflow.
Figure 3: Macro-averaged F-scores of FICL for different context sizes on the test splits of the different datasets (AICL results are included for comparison). These plots demonstrate that AICL can be applied to any dataset without having to optimize a hyper-parameter (e.g., $k$ in FICL).
Figure 4: Macro-averaged F1 scores with different proportions of the training set used to train $\kappa(\mathbf{x})$, the multi-label classifier of AICL (Equation \ref{eq:mlc-l}).
