DemoShapley: Valuation of Demonstrations for In-Context Learning
Shan Xie, Man Luo, Chadly Daniel Stern, Mengnan Du, Lu Cheng
TL;DR
This work tackles the instability of in-context learning (ICL) caused by demonstration selection and order. It introduces DemoShapley and Beta-DemoShapley, two Shapley-value-based methods that quantify each demonstration's marginal contribution by averaging effects over multiple prompt permutations, with Beta weighting to emphasize small prompts. Across multiple LLMs and tasks, these methods improve predictive performance, enhance out-of-distribution generalization, detect mislabeled data, and reduce bias, with Beta-DemoShapley particularly benefiting low-shot settings. Importantly, the approach operates at inference time without gradient access or fine-tuning, providing a principled, fair framework for robust demonstration valuation in practical ICL deployment.
Abstract
Large language models (LLMs) using in-context learning (ICL) excel at many tasks without task-specific fine-tuning. However, demonstration selection and ordering greatly impact ICL effectiveness. To address this issue, we propose DemoShapley, a Shapley-value-based method that evaluates each demonstration's contribution by measuring its marginal effect across different prompt permutations. To further account for ICL's limited context windows and frequent low-shot settings, we introduce Beta-DemoShapley, a weighted extension that emphasizes the influence of smaller prompt sizes. Experiments on multiple benchmarks show that DemoShapley consistently outperforms existing influence-based selection strategies, while Beta-DemoShapley further improves performance in low-shot scenarios. Both methods also detect mislabeled data, enhance generalization to out-of-distribution tasks, and reduce demographic bias. Together, they provide a unified and robust framework for demonstration valuation in ICL.
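The idea of averaging a demonstration's marginal effect over prompt permutations can be illustrated with a generic Monte Carlo permutation estimator of Shapley values. The sketch below is not the authors' implementation: the function name `permutation_shapley`, the `utility` callback, and the toy additive scores are all hypothetical stand-ins. In practice `utility` would query an LLM with the given demonstrations as the prompt and return a validation metric. The Beta-weighted variant, which reweights contributions by prompt size, is not shown.

```python
import random
from typing import Callable, Dict, List, Sequence


def permutation_shapley(
    demos: Sequence[str],
    utility: Callable[[List[str]], float],
    num_permutations: int = 200,
    seed: int = 0,
) -> Dict[str, float]:
    """Estimate each demonstration's Shapley value by sampling permutations.

    `utility` is a hypothetical callback that scores an ordered prompt
    (a list of demonstrations); here it stands in for an LLM evaluation.
    """
    rng = random.Random(seed)
    totals = {d: 0.0 for d in demos}
    for _ in range(num_permutations):
        order = list(demos)
        rng.shuffle(order)  # one random prompt ordering
        prefix: List[str] = []
        prev_score = utility(prefix)  # score of the empty prompt
        for d in order:
            prefix.append(d)
            cur_score = utility(prefix)
            totals[d] += cur_score - prev_score  # marginal contribution of d
            prev_score = cur_score
    return {d: total / num_permutations for d, total in totals.items()}


# Toy, assumed setup: an additive utility in which every demonstration
# contributes a fixed amount, so the estimate can be checked by hand.
scores = {"good": 0.5, "neutral": 0.0, "mislabeled": -0.25}


def toy_utility(prompt: List[str]) -> float:
    return sum(scores[d] for d in prompt)


values = permutation_shapley(list(scores), toy_utility, num_permutations=50)
```

Because the toy utility is additive, every marginal contribution equals the demonstration's own score, so the estimate recovers `scores` regardless of how many permutations are sampled. With a real, order-sensitive LLM utility the estimate would vary with the sampled permutations, and a negative value would flag a demonstration that tends to hurt performance, such as a mislabeled example.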
