Which Examples to Annotate for In-Context Learning? Towards Effective and Efficient Selection

Costas Mavromatis; Balasubramaniam Srinivasan; Zhengyuan Shen; Jiani Zhang; Huzefa Rangwala; Christos Faloutsos; George Karypis

TL;DR

The paper tackles budgeted in-context learning (ICL) by proposing AdaICL, a model-adaptive, optimization-free method that combines uncertainty-based hard-example detection with semantic-diversity coverage through a MaxCover formulation. It introduces three variants for selecting informative demonstrations under a fixed annotation budget: AdaICL-base ($k$-means on hard examples), AdaICL (MaxCover over semantic regions), and AdaICL+ (dynamically re-weighted MaxCover). Across nine NLP tasks and seven LLMs, AdaICL achieves improvements of up to 7.5 percentage points over random baselines and up to 3× budget savings while needing fewer ICL demonstrations, and it also improves calibration. The approach is shown to be robust to different similarity encoders and LLMs, and the authors provide public code to support reproducibility and deployment in practical settings.
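As a rough illustration of the uncertainty-based hard-example detection mentioned above, the following Python sketch ranks unlabeled examples by a least-confidence score and keeps the most uncertain fraction. It is a minimal sketch under stated assumptions, not the authors' implementation: the scoring rule (1 minus the top class probability), the names `uncertainty`, `select_hard_examples`, and `hard_fraction`, and the toy probabilities are all hypothetical.

```python
from typing import Dict, List, Sequence


def uncertainty(probs: Sequence[float]) -> float:
    """Least-confidence score: 1 minus the highest predicted class probability.

    Assumption: this is one common uncertainty measure, chosen for illustration.
    """
    return 1.0 - max(probs)


def select_hard_examples(
    pred_probs: Dict[str, Sequence[float]], hard_fraction: float
) -> List[str]:
    """Return the ids of the most uncertain examples, most uncertain first."""
    n_hard = max(1, round(hard_fraction * len(pred_probs)))
    ranked = sorted(
        pred_probs, key=lambda ex: uncertainty(pred_probs[ex]), reverse=True
    )
    return ranked[:n_hard]


# Toy class probabilities, standing in for k-shot ICL predictions of a model.
pred_probs = {
    "u1": [0.95, 0.05],
    "u2": [0.55, 0.45],
    "u3": [0.60, 0.40],
    "u4": [0.90, 0.10],
}
hard = select_hard_examples(pred_probs, hard_fraction=0.5)
print(hard)
```

The hard set produced this way would then feed a diversity step, such as clustering or coverage-based selection.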

Abstract

Large Language Models (LLMs) can adapt to new tasks via in-context learning (ICL). ICL is efficient because it requires no parameter updates to the trained LLM, only a few annotated examples as input. In this work, we investigate an active learning approach for ICL in which the budget for annotating examples is limited. We propose a model-adaptive, optimization-free algorithm, termed AdaICL, which identifies examples that the model is uncertain about and performs semantic diversity-based example selection. Diversity-based sampling improves overall effectiveness, while uncertainty sampling improves budget efficiency and helps the LLM learn new information. Moreover, AdaICL poses its sampling strategy as a Maximum Coverage problem that dynamically adapts based on the model's feedback and can be approximately solved via greedy algorithms. Extensive experiments on nine datasets and seven LLMs show that AdaICL improves performance by 4.4 accuracy points over SOTA (a 7.7% relative improvement), is up to 3x more budget-efficient than annotating uniformly at random, and outperforms SOTA with 2x fewer ICL examples.
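The Maximum Coverage formulation mentioned in the abstract admits a standard greedy approximation, which achieves a (1 - 1/e) approximation guarantee for the unweighted problem. The Python sketch below shows that generic greedy procedure only; it is not the authors' code. The function name `greedy_max_cover`, the `cover_sets` mapping (each candidate example mapped to the hard examples it is assumed to cover), and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_max_cover(cover_sets: Dict[str, Set[str]], budget: int) -> List[str]:
    """Greedily pick up to `budget` candidates that cover the most new elements."""
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best: Optional[str] = None
        best_gain = 0
        # Iterate in sorted order so that ties are broken deterministically.
        for cand in sorted(cover_sets):
            if cand in selected:
                continue
            gain = len(cover_sets[cand] - covered)  # newly covered elements
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # nothing left adds coverage; stop early
            break
        selected.append(best)
        covered |= cover_sets[best]
    return selected


# Toy instance: candidates a-d, each covering some hard examples h1-h6.
cover_sets = {
    "a": {"h1", "h2", "h3"},
    "b": {"h3", "h4"},
    "c": {"h4", "h5", "h6"},
    "d": {"h1"},
}
selected = greedy_max_cover(cover_sets, budget=2)
print(selected)
```

With a budget of 2, the sketch picks `a` first and then `c`, since `c` adds three uncovered elements while `b` adds only one.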

Paper Structure (33 sections, 7 equations, 12 figures, 13 tables, 4 algorithms)

Introduction
Related Work
Problem Statement & Motivation
Adaptive Example Annotation for ICL
AdaICL-base: A $k$-means Approach
AdaICL: Selection by Maximum Coverage
AdaICL+: Dynamically Re-Weighted MaxCover
Experimental Setting
Results & Analysis
RQ1: AdaICL is Effective
RQ2 & RQ3: AdaICL is Efficient and Robust
RQ3: AdaICL improves Calibration
Conclusions
Reproducibility Statement
AdaICL Details
...and 18 more sections

Figures (12)

Figure 1: AdaICL effectively combines diversity and uncertainty sampling, outperforming other strategies in the low-resource scenario, averaged over seven datasets. Here, the budget is 20 annotations for retrieval-based 5-shot ICL.
Figure 2: Our studied problem setting. Given an unlabeled set ${\mathcal{U}}$ and a fixed budget $B$, the goal is to select the $B$ most informative examples for annotation (set ${\mathcal{L}}$), which are used to maximize ICL performance with an LLM $M$. During ICL inference, a $k$-NN retriever based on a similarity space ${\mathcal{S}}$ determines the $k$-shot demonstrations for each test instance.
Figure 3: AdaICL algorithm. AdaICL uses $k$-shot ICL to determine which examples the model $M$ is uncertain about (hard examples). Then, it performs diversity-based uncertainty sampling over ${\mathcal{S}}$ by optimizing the MaxCover problem in Equation \ref{eq:mcp-1} via Algorithm \ref{alg:greedy} to identify the examples that help the model learn new information. The process is repeated until the budget $B$ is exhausted, at which point it returns the annotated set ${\mathcal{L}}$.
Figure 4: Performance comparison across different tasks with GPT-J (6B) and GPT-Neo (1.3B). "Best Base." denotes the best baseline for the task. AdaICL performs the best, while for the classification tasks AdaICL-base is the second-best (full results in Appendix \ref{app:full_res}).
Figure 5: Multi-step results with GPT-Neo. Sweet point: the point at which we exceed the best performance achieved by random selection.
...and 7 more figures
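
The retrieval step described in the Figure 2 caption, where a $k$-NN retriever picks the $k$-shot demonstrations for each test instance, can be sketched as follows. This is a minimal Python sketch under stated assumptions rather than the paper's retriever: cosine similarity stands in for the similarity space, and the names `cosine`, `retrieve_demonstrations`, and the toy two-dimensional embeddings are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_demonstrations(
    query: Sequence[float], annotated: Dict[str, Sequence[float]], k: int
) -> List[str]:
    """Return the ids of the k annotated examples most similar to the query."""
    ranked = sorted(
        annotated, key=lambda ex: cosine(query, annotated[ex]), reverse=True
    )
    return ranked[:k]


# Toy embeddings for an annotated set; real embeddings would come from an encoder.
annotated = {
    "l1": [1.0, 0.0],
    "l2": [0.9, 0.1],
    "l3": [0.0, 1.0],
    "l4": [-1.0, 0.0],
}
demos = retrieve_demonstrations([1.0, 0.05], annotated, k=2)
print(demos)
```

Because demonstrations are retrieved per test instance, an annotated example only helps the test inputs that fall near it in the similarity space.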
