Meta-Sel: Efficient Demonstration Selection for In-Context Learning via Supervised Meta-Learning
Xubin Wang, Weijia Jia
TL;DR
The paper addresses the bottleneck of selecting demonstrations for in-context learning under a prompt budget. It introduces Meta-Sel, a lightweight supervised meta-learning framework that scores (candidate, query) pairs with two inexpensive meta-features and ranks demonstrations with a calibrated logistic regressor. The regressor is trained offline; online selection is a single scoring pass over the candidate pool and needs no LLM calls. Meta-Sel is evaluated on four intent datasets and five open-source LLMs, where it ranks at or near the top, with the largest gains for smaller models, and produces deterministic, auditable rankings. The work also benchmarks 12 baseline methods, clarifying where simple similarity signals suffice and where learned weighting helps. This offers practical guidance for efficient ICL deployment and points to future extensions with richer meta-features and generation tasks.
Abstract
Demonstration selection is a practical bottleneck in in-context learning (ICL): under a tight prompt budget, accuracy can change substantially depending on which few-shot examples are included, yet selection must remain cheap enough to run per query over large candidate pools. We propose Meta-Sel, a lightweight supervised meta-learning approach for intent classification that learns a fast, interpretable scoring function for (candidate, query) pairs from labeled training data. Meta-Sel constructs a meta-dataset by sampling pairs from the training split and using class agreement as supervision, then trains a calibrated logistic regressor on two inexpensive meta-features: TF-IDF cosine similarity and a length-compatibility ratio. At inference time, the selector performs a single vectorized scoring pass over the full candidate pool and returns the top-k demonstrations, requiring no model fine-tuning, no online exploration, and no additional LLM calls. This yields deterministic rankings and makes the selection mechanism straightforward to audit via interpretable feature weights. Beyond proposing Meta-Sel, we provide a broad empirical study of demonstration selection, benchmarking 12 methods (spanning prompt engineering baselines, heuristic selection, reinforcement learning, and influence-based approaches) across four intent datasets and five open-source LLMs. Across this benchmark, Meta-Sel consistently ranks among the top-performing methods, is particularly effective for smaller models where selection quality can partially compensate for limited model capacity, and maintains competitive selection-time overhead.
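The pipeline described in the abstract (sample pairs, label them by class agreement, fit a logistic scorer on two features, rank the pool) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. Several details are assumptions made only for illustration: the whitespace tokenizer, the smoothed IDF formula, the length-compatibility ratio defined as min/max of token counts, uniform random pair sampling, and plain gradient-descent logistic regression with the calibration step omitted. All function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter

def tokenize(text):
    return text.lower().split()  # assumption: simple whitespace tokenizer

def fit_idf(texts):
    # Smoothed inverse document frequency over the training texts.
    n = len(texts)
    df = Counter(tok for t in texts for tok in set(tokenize(t)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

def tfidf(text, idf):
    # Sparse TF-IDF vector; tokens unseen at fit time are ignored.
    tf = Counter(tokenize(text))
    return {tok: cnt * idf[tok] for tok, cnt in tf.items() if tok in idf}

def cosine(a, b):
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def features(cand, query, idf):
    # Two meta-features: TF-IDF cosine and a length ratio (assumed min/max).
    lc, lq = len(tokenize(cand)), len(tokenize(query))
    ratio = min(lc, lq) / max(lc, lq) if max(lc, lq) else 0.0
    return [cosine(tfidf(cand, idf), tfidf(query, idf)), ratio]

def build_meta_dataset(train, idf, n_pairs, seed=0):
    # Sample (candidate, query) pairs; supervision is class agreement.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pairs):
        (c_text, c_label), (q_text, q_label) = rng.sample(train, 2)
        X.append(features(c_text, q_text, idf))
        y.append(1.0 if c_label == q_label else 0.0)
    return X, y

def train_logreg(X, y, lr=0.5, epochs=300):
    # Batch gradient descent on the logistic loss (no calibration here).
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return w, b

def select_top_k(query, pool, idf, w, b, k):
    # One scoring pass over the pool; ties broken by index for determinism.
    # Ranking by the logit is equivalent to ranking by the probability.
    scores = [sum(wi * xi for wi, xi in zip(w, features(text, query, idf))) + b
              for text, _ in pool]
    order = sorted(range(len(pool)), key=lambda i: (-scores[i], i))
    return [pool[i] for i in order[:k]]

# Toy intent data (hypothetical), reused here as the candidate pool.
train = [
    ("check my balance", "balance"),
    ("show account balance", "balance"),
    ("what is balance", "balance"),
    ("transfer money now", "transfer"),
    ("send transfer today", "transfer"),
    ("make a transfer", "transfer"),
]
idf = fit_idf([text for text, _ in train])
X, y = build_meta_dataset(train, idf, n_pairs=200)
w, b = train_logreg(X, y)
picked = select_top_k("my balance please", train, idf, w, b, k=2)
```

Because the learned model has only two weights and a bias, inspecting `w` directly shows how much each feature contributes to the ranking, which is the kind of auditability the abstract refers to. In this toy data every text has three tokens, so the length ratio is constant and the ranking is driven by the similarity feature alone.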
