Prompt Valuation Based on Shapley Values

Hanxi Liu; Xiaokai Mao; Haocheng Xia; Jian Lou; Jinfei Liu; Kui Ren

TL;DR

The paper tackles fair, interaction-aware evaluation of prompts in multi-prompt learning by applying Shapley values to quantify each prompt's contribution to task performance. It introduces a two-stage, learning-based approach that estimates Shapley values from prompt embeddings, enabling real-time valuation, and proves a Lipschitz bound that links prompt similarity to Shapley-value similarity. The method is validated on SST2, AQuA, and Date with BERT and GPT-3.5-turbo, showing that a compact set of high-value prompts can achieve competitive performance and that Shapley-based ranking reliably identifies valuable prompts. The work has practical implications for prompt design and data marketplaces: it offers a principled, scalable mechanism for pricing prompts and selecting them for ensembles.
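To make the idea of a prompt's Shapley value concrete, the following is a minimal sketch of the exact (brute-force) computation for a tiny prompt set. It is not the paper's learning-based estimator. The prompt names, the labels, the per-example predictions, and the majority-vote accuracy utility are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, utility):
    """Exact Shapley values by enumerating every coalition (small n only)."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes p.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical labels and per-prompt predictions on four examples.
LABELS = [1, 0, 1, 1]
PREDICTIONS = {
    "p_good": [1, 0, 1, 1],
    "p_okay": [1, 0, 0, 1],
    "p_bad": [0, 1, 0, 1],
}


def majority_vote_accuracy(coalition):
    """Assumed utility: accuracy of a majority-vote ensemble (empty set scores 0)."""
    if not coalition:
        return 0.0
    correct = 0
    for i, label in enumerate(LABELS):
        votes = sum(PREDICTIONS[p][i] for p in coalition)
        pred = 1 if 2 * votes > len(coalition) else 0  # ties go to class 0
        correct += int(pred == label)
    return correct / len(LABELS)


values = shapley_values(list(PREDICTIONS), majority_vote_accuracy)
for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:+.4f}")
```

On this toy data the values sum to the utility of the full ensemble, and the misleading prompt `p_bad` receives a negative value, which is the sense in which a Shapley value can flag a detrimental prompt. The enumeration is exponential in the number of prompts, so it only serves as a reference for small sets.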

Abstract

Large language models (LLMs) excel on new tasks without additional training, simply by being given natural language prompts that demonstrate how the task should be performed. Prompt ensemble methods comprehensively harness the knowledge of LLMs while mitigating individual biases and errors, further enhancing performance. However, more prompts do not necessarily lead to better results, and not all prompts are beneficial. A small number of high-quality prompts often outperforms many low-quality prompts. No suitable method currently exists for evaluating the impact of individual prompts on the results. In this paper, we utilize the Shapley value to fairly quantify the contributions of prompts, helping to identify beneficial or detrimental prompts and potentially guiding prompt valuation in data markets. Through extensive experiments employing various ensemble methods and utility functions on diverse tasks, we validate the effectiveness of the Shapley value method for prompts: it distinguishes and quantifies the contribution of each prompt.

Paper Structure (32 sections, 2 theorems, 28 equations, 7 figures, 2 tables, 1 algorithm)

Introduction
Motivation
Contributions
Related works
Shapley values for prompts
Preliminaries
Multi-prompt learning
Shapley values in multi-prompt learning
The Shapley value
Learning Shapley values of prompts
Methodology
Lipschitz continuity
Similar prompts receive similar values
Lipschitz Continuity of the utility function
Experiments
...and 17 more sections

Key Result

Theorem 1

Let $\mathcal{U}$ be a utility function. If $\mathcal{U}$ is Lipschitz continuous with respect to some norm $\|\cdot\|$ on the input space with Lipschitz constant $L$, then for any two inputs $\bm{e}_i$ and $\bm{e}_j$ corresponding to similar prompts, the absolute difference in their Shapley values [...] where $S$ is a coalition of embeddings excluding $\bm{e}_i$ and $\bm{e}_j$.
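
As a hedged sketch of how a bound of this kind can be derived (not necessarily the paper's exact statement or proof), assume the standard Shapley value definition. The symbols $\phi_i$, $N$, and $n$ are notation introduced here for illustration: $\phi_i$ is the Shapley value of embedding $\bm{e}_i$, and $N$ is the set of all $n$ embeddings. Subtracting the two Shapley sums and grouping terms by coalitions $S$ that contain neither embedding gives

$$
\phi_i - \phi_j = \sum_{S \subseteq N \setminus \{\bm{e}_i, \bm{e}_j\}} \frac{|S|!\,(n-|S|-2)!}{(n-1)!}\,\bigl[\mathcal{U}(S \cup \{\bm{e}_i\}) - \mathcal{U}(S \cup \{\bm{e}_j\})\bigr].
$$

The weights are non-negative and sum to 1. If Lipschitz continuity is read as $|\mathcal{U}(S \cup \{\bm{e}_i\}) - \mathcal{U}(S \cup \{\bm{e}_j\})| \le L\,\|\bm{e}_i - \bm{e}_j\|$ for every such $S$ (an assumption about how the condition applies to coalitions), the triangle inequality yields

$$
|\phi_i - \phi_j| \le L\,\|\bm{e}_i - \bm{e}_j\|,
$$

so prompts with nearby embeddings would receive nearby values.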

Figures (7)

Figure 1: Examples of prompt ensembling and prompt augmentation.
Figure 2: Examples of coalitions.
Figure 3: Results of SST2 with BERT-base, as well as AQuA and Date with GPT-3.5-turbo and Manual-CoT. We add the currently most valuable prompt to the combination iteratively. For comparison, we also calculate the leave-one-out (LOO) value and combine prompts in the same manner.
Figure 4: Prompts are sorted by the Shapley values obtained from each of the three methods and added in that order; accuracy on SST2 is computed separately for each method.
Figure 5: Results for AQuA using Manual prompt (Manual-CoT but without rationale) and Auto-CoT.
...and 2 more figures
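
The procedure described in the Figure 3 caption (rank prompts by a value, add them one at a time, and track utility, with leave-one-out as a comparison) can be sketched as follows. This is an illustrative sketch only: the three prompts `a`, `b`, `c` and the utility table are hypothetical, and the helper names `loo_values` and `utility_curve` are not from the paper.

```python
# Hypothetical utilities (e.g. ensemble accuracy) for every coalition of prompts a, b, c.
UTILITY = {
    frozenset(): 0.0,
    frozenset("a"): 1.0,
    frozenset("b"): 0.75,
    frozenset("c"): 0.25,
    frozenset("ab"): 0.75,
    frozenset("ac"): 0.5,
    frozenset("bc"): 0.5,
    frozenset("abc"): 0.75,
}
PROMPTS = ["a", "b", "c"]


def loo_values(prompts, utility):
    """Leave-one-out value: utility drop when a prompt is removed from the full set."""
    full = frozenset(prompts)
    return {p: utility[full] - utility[full - {p}] for p in prompts}


def utility_curve(scores, utility):
    """Add prompts in descending score order and record the utility after each step."""
    ranking = sorted(scores, key=lambda p: -scores[p])  # stable for tied scores
    curve, chosen = [], set()
    for p in ranking:
        chosen.add(p)
        curve.append(utility[frozenset(chosen)])
    return ranking, curve


loo = loo_values(PROMPTS, UTILITY)
loo_ranking, loo_curve = utility_curve(loo, UTILITY)
print(loo, loo_ranking, loo_curve)
```

Any other per-prompt score, such as a Shapley value, can be passed to `utility_curve` in place of `loo` to produce the corresponding curve. In this toy table LOO assigns `a` and `b` the same value even though their standalone utilities differ, because LOO only looks at the full set.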

Theorems & Definitions (6)

Definition 1: Lipschitz Continuity
Definition 2: Beta Distribution
Theorem 1
Lemma 1
proof
proof
