Prompt Valuation Based on Shapley Values
Hanxi Liu, Xiaokai Mao, Haocheng Xia, Jian Lou, Jinfei Liu, Kui Ren
TL;DR
The paper tackles fair, interaction-aware evaluation of prompts in multi-prompt learning by applying Shapley values to quantify each prompt's contribution to task performance. It introduces a two-stage, learning-based approach that estimates Shapley values from prompt embeddings, enabling real-time valuation, and proves a Lipschitz bound that links prompt similarity to Shapley-value similarity. The method is validated on SST2, AQuA, and Date with BERT and GPT-3.5-turbo, showing that a compact set of high-value prompts can achieve competitive performance and that Shapley-based ranking reliably identifies valuable prompts. The work has practical implications for prompt design and data marketplaces, offering a principled, scalable mechanism to price and select prompts for ensembles.
Abstract
Large language models (LLMs) excel on new tasks without additional training when given natural language prompts that demonstrate how the task should be performed. Prompt ensemble methods harness the knowledge of LLMs more comprehensively while mitigating individual biases and errors, further enhancing performance. However, more prompts do not necessarily lead to better results, and not all prompts are beneficial: a small number of high-quality prompts often outperforms many low-quality ones. Currently, no suitable method exists for evaluating the impact of individual prompts on the results. In this paper, we use the Shapley value to fairly quantify the contributions of prompts, helping to identify beneficial or detrimental prompts and potentially guiding prompt valuation in data markets. Through extensive experiments employing various ensemble methods and utility functions on diverse tasks, we validate the effectiveness of the Shapley value for prompt valuation, showing that it distinguishes and quantifies the contribution of each prompt.
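To make the idea of a prompt's Shapley value concrete, the following minimal sketch computes exact Shapley values for a three-prompt ensemble by enumerating every coalition. It is an illustration only, not the paper's two-stage learning-based estimator: the prompt names "A", "B", "C", the `TOY_UTILITY` table, and the helper `shapley_values` are all hypothetical, and the utility numbers are invented stand-ins for an ensemble accuracy.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values by enumerating all coalitions.

    Cost grows as 2^n, so this is only feasible for a handful of prompts.
    """
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Weight of a coalition of size k that excludes p: k!(n-k-1)!/n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p when it joins coalition s
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


# Hypothetical utility table: invented accuracy of each prompt subset.
# Prompt "C" is constructed to hurt every ensemble it joins.
TOY_UTILITY = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.70,
    frozenset({"B"}): 0.60,
    frozenset({"C"}): 0.40,
    frozenset({"A", "B"}): 0.80,
    frozenset({"A", "C"}): 0.65,
    frozenset({"B", "C"}): 0.55,
    frozenset({"A", "B", "C"}): 0.75,
}


def toy_utility(coalition):
    return TOY_UTILITY[frozenset(coalition)]


values = shapley_values(["A", "B", "C"], toy_utility)
for prompt, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"prompt {prompt}: {value:+.4f}")
```

In this toy setting the detrimental prompt "C" receives a negative value, and the three values sum to the utility of the full ensemble minus the utility of the empty one (the efficiency property of the Shapley value). With real prompts the utility would require running the ensemble on a validation set for every subset, which is why exact enumeration does not scale and an estimator is needed.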
