Fair Document Valuation in LLM Summaries via Shapley Values

Zikun Ye, Hema Yoganarasimhan

TL;DR

This work formalizes a Shapley-value framework to fairly credit individual source documents used in LLM-generated summaries, addressing attribution and revenue-sharing challenges. To scale to real platforms, it introduces Cluster Shapley, a structure-aware approximation that uses embeddings to group semantically similar documents into clusters of tunable diameter ε; theoretical bounds show that the approximation error shrinks as ε → 0. Empirically, on Amazon product reviews, Cluster Shapley outperforms standard Shapley approximations and simple attribution rules, offering a favorable efficiency-accuracy trade-off and robust performance across LLMs and evaluation setups. The findings highlight the value of leveraging embedding-based structure in attribution and provide a scalable pathway for fair content monetization in AI-powered search and summarization systems.
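
The clustering idea can be illustrated with a small Python sketch. It is an assumption-laden toy, not the authors' implementation: it greedily groups documents so that every cluster has Euclidean diameter at most ε, computes exact Shapley values in a reduced game whose players are clusters, and splits each cluster's value equally among its members. The function names (`exact_shapley`, `greedy_clusters`, `cluster_shapley`), the greedy clustering rule, the equal within-cluster split, and the toy `coverage` value function (a stand-in for an LLM-judged summary-quality score) are all hypothetical.

```python
import itertools
import math
from typing import Callable, Dict, FrozenSet, List, Sequence

# A value function maps a set of document indices to a score (hypothetical stand-in
# for an LLM-judged summary-quality metric).
Value = Callable[[FrozenSet[int]], float]


def exact_shapley(players: Sequence[int], value: Value) -> Dict[int, float]:
    """Exact Shapley values by enumerating all coalitions (exponential in len(players))."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for a coalition S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for coalition in itertools.combinations(others, size):
                s = frozenset(coalition)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def greedy_clusters(emb: Sequence[Sequence[float]], eps: float) -> List[List[int]]:
    """Put each document in the first cluster whose members are all within eps
    (Euclidean distance), else open a new cluster. Every cluster has diameter <= eps."""
    clusters: List[List[int]] = []
    for i, x in enumerate(emb):
        for members in clusters:
            if all(math.dist(x, emb[j]) <= eps for j in members):
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def cluster_shapley(emb: Sequence[Sequence[float]], value: Value, eps: float) -> Dict[int, float]:
    """Hypothetical sketch: Shapley values over clusters, split equally within each cluster."""
    clusters = greedy_clusters(emb, eps)

    def cluster_value(ks: FrozenSet[int]) -> float:
        # A coalition of clusters brings in every member document of those clusters.
        return value(frozenset(d for k in ks for d in clusters[k]))

    psi = exact_shapley(range(len(clusters)), cluster_value)
    phi: Dict[int, float] = {}
    for k, members in enumerate(clusters):
        for d in members:
            phi[d] = psi[k] / len(members)
    return phi


# Toy example: 2-D "embeddings" and a coverage score (number of distinct aspects covered).
emb = [(0.0, 0.0), (0.05, 0.0), (1.0, 1.0), (2.0, 0.0)]
aspect = ["battery", "battery", "grip", "price"]


def coverage(s: FrozenSet[int]) -> float:
    return float(len({aspect[d] for d in s}))


print(exact_shapley(range(4), coverage))
print(cluster_shapley(emb, coverage, eps=0.1))
```

With eps=0.1 the two near-duplicate "battery" reviews share a cluster, so the reduced game has 3 players (8 coalitions instead of 16), and in this toy game the result matches the exact values. When ε is smaller than every pairwise distance, each cluster is a singleton and the sketch reduces to exact Shapley computation; with larger ε, the equal split can diverge from the exact values when clustered documents are not interchangeable.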

Abstract

Large Language Models (LLMs) are increasingly used in systems that retrieve and summarize content from multiple sources, such as search engines and AI assistants. While these systems enhance user experience through coherent summaries, they obscure the individual contributions of original content creators, raising concerns about credit attribution and compensation. We address the challenge of valuing individual documents used in LLM-generated summaries by proposing a Shapley value-based framework for fair document valuation. Although theoretically appealing, exact Shapley value computation is prohibitively expensive at scale. To improve efficiency, we develop Cluster Shapley, a simple approximation algorithm that leverages semantic similarity among documents to reduce computation while maintaining attribution accuracy. Using Amazon product review data, we empirically show that off-the-shelf Shapley approximations, such as Monte Carlo sampling and Kernel SHAP, perform suboptimally in LLM settings, whereas Cluster Shapley substantially improves the efficiency-accuracy frontier. Moreover, simple attribution rules (e.g., equal or relevance-based allocation), though computationally cheap, lead to highly unfair outcomes. Together, our findings highlight the potential of structure-aware Shapley approximations tailored to LLM summarization and offer guidance for platforms seeking scalable and fair content attribution mechanisms.
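
To make the abstract's comparison concrete, the following Python sketch contrasts exact Shapley values, a Monte Carlo permutation-sampling estimate, and equal allocation on a toy coverage game. Everything here is hypothetical: the `aspects` data and `coverage` function stand in for an LLM-based summary-quality score, and the exact values use the closed form that holds for coverage games (each aspect's unit of value is split equally among the documents that cover it), not a general-purpose solver.

```python
import random
from typing import Callable, Dict, FrozenSet, List, Set

Value = Callable[[FrozenSet[int]], float]

# Hypothetical toy data: the aspects each retrieved review covers.
aspects: List[Set[str]] = [{"battery"}, {"battery"}, {"battery", "grip"}, {"price"}, {"grip"}]


def coverage(s: FrozenSet[int]) -> float:
    """Toy value function: number of distinct aspects covered by the documents in s."""
    return float(len(set().union(*(aspects[d] for d in s))))


def coverage_exact(n: int) -> Dict[int, float]:
    """Exact Shapley values for a coverage game: each aspect's unit value is shared
    equally by the documents that cover it."""
    phi = {i: 0.0 for i in range(n)}
    for a in set().union(*aspects):
        holders = [d for d in range(n) if a in aspects[d]]
        for d in holders:
            phi[d] += 1.0 / len(holders)
    return phi


def monte_carlo_shapley(n: int, value: Value, num_perms: int, seed: int = 0) -> Dict[int, float]:
    """Permutation sampling: average each document's marginal gain over random orderings."""
    rng = random.Random(seed)
    totals = {i: 0.0 for i in range(n)}
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        coalition: FrozenSet[int] = frozenset()
        prev = value(coalition)
        for i in order:
            coalition = coalition | {i}
            cur = value(coalition)
            totals[i] += cur - prev
            prev = cur
    return {i: t / num_perms for i, t in totals.items()}


def mae(a: Dict[int, float], b: Dict[int, float]) -> float:
    """Mean absolute error between two attributions over the same documents."""
    return sum(abs(a[i] - b[i]) for i in a) / len(a)


n = len(aspects)
exact = coverage_exact(n)
mc = monte_carlo_shapley(n, coverage, num_perms=2000)
equal = {i: coverage(frozenset(range(n))) / n for i in range(n)}  # equal-split baseline
print("exact:", exact)
print("MAE Monte Carlo:", mae(mc, exact))
print("MAE equal split:", mae(equal, exact))
```

Within each sampled permutation the marginal gains sum to v(N) − v(∅), so the Monte Carlo estimates are efficient by construction. In this toy game the equal split gives every review 0.6, even though review 3 (the only one mentioning price) has exact value 1.0 and reviews 0 and 1 have 1/3 each.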

Paper Structure

This paper contains 46 sections, 4 theorems, 39 equations, 22 figures, 3 tables, 3 algorithms.

Key Result

Theorem 1

Under the Lipschitz assumption, the approximate Shapley values $\hat{\phi}$ output by the Cluster Shapley algorithm converge to the exact Shapley values $\phi$ as the clustering diameter $\epsilon$ approaches zero (i.e., as every cluster comes to contain only identical documents). In particular,

Figures (22)

  • Figure 1: Amazon's AI-generated customer review summary for the DualShock 4 Wireless Controller (https://www.amazon.com/DualShock-Wireless-Controller-PlayStation-Black-4/dp/B01LWVX2RG; snapshot taken on Dec 12, 2024). The left image shows the controller's product page on Amazon. The center image displays Amazon's AI-generated summary of the product's reviews; users can click "Select to learn more" to focus on specific aspects of interest. The right image shows the AI-generated summary for a selected aspect, together with the source customer reviews, with key information highlighted in bold.
  • Figure 2: ChatGPT-4o with RAG-Enhanced Web Search
  • Figure 3: Architecture of our LLM-based search and summarization tool for Amazon product reviews. This flowchart illustrates an AI-powered search engine that processes and summarizes reviews about the quality of the DualShock 4 Wireless Controller. The process starts with a user query asking about product quality. In the retrieval phase, the query's key semantic information ("the quality of the wireless controller") is embedded and compared with filtered Amazon product reviews using cosine similarity, and the system retrieves the eight most relevant reviews. In the augmentation phase, the retrieved reviews are combined with the original user query and our designed prompt to guide generation. Finally, in the generation phase, OpenAI's GPT-4o model summarizes the augmented input into a concise response that cites the specific product reviews, ensuring traceability and relevance to the user's query.
  • Figure 4: Clustering result for the top 8 relevant Amazon reviews for the query "How is the quality of the wireless controller?" Clustering uses 3072-dimensional OpenAI embeddings; for visualization only, PCA reduces the embeddings to two dimensions. Dots represent reviews and squares represent clusters. ${\phi}_i$ is the exact Shapley value and $\hat{\phi}_i$ is the Shapley value approximated by the Cluster Shapley algorithm.
  • Figure 5: Efficient frontiers of Shapley approximation algorithms. The $x$-axis shows the number of unique subsets used by each algorithm, and the $y$-axis shows the Mean Absolute Error (MAE) of the Shapley values; both are averaged across all test queries and reviews. Points on the Cluster Shapley curve correspond to different clustering diameters $\epsilon$. For scale, the average Shapley value over all test samples is 1.084, so an MAE of 0.2 corresponds to roughly a 20% error. 95% CIs for Monte Carlo, Truncated Monte Carlo, and Kernel SHAP are computed from 10 replications of each algorithm.
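
The retrieval and augmentation phases described for Figure 3 can be sketched in Python as follows. This is a hypothetical illustration: the toy 3-dimensional vectors stand in for the 3072-dimensional OpenAI embeddings mentioned for Figure 4, the embedding call itself is omitted, and the prompt wording in `build_prompt` is invented rather than the authors' designed prompt. Vectors are assumed to be nonzero.

```python
import math
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def top_k(query: Sequence[float], docs: Sequence[Sequence[float]], k: int = 8) -> List[Tuple[int, float]]:
    """Return (index, similarity) for the k documents most similar to the query."""
    scored = [(i, cosine(query, d)) for i, d in enumerate(docs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, reviews: Sequence[str], hits: Sequence[Tuple[int, float]]) -> str:
    """Augmentation step (hypothetical wording): number the retrieved reviews so the model can cite them as [n]."""
    numbered = "\n".join(f"[{rank}] {reviews[i]}" for rank, (i, _) in enumerate(hits, start=1))
    return (
        "Answer the question using only the reviews below, citing them as [n].\n"
        f"Question: {question}\nReviews:\n{numbered}"
    )


reviews = ["Sturdy build, great grip.", "Arrived late.", "Buttons feel solid but battery is weak.", "Returned it."]
review_embs = [(0.9, 0.1, 0.0), (0.0, 1.0, 0.0), (0.7, 0.7, 0.0), (-1.0, 0.0, 0.0)]
query_emb = (1.0, 0.0, 0.0)  # stand-in for the embedded "the quality of the wireless controller"

hits = top_k(query_emb, review_embs, k=2)
print(hits)
print(build_prompt("How is the quality of the wireless controller?", reviews, hits))
```

The sketch sorts every candidate, which is fine for a filtered review set; for a large corpus, `heapq.nlargest` or an approximate nearest-neighbor index would avoid the full sort.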

Theorems & Definitions (8)

  • Theorem 1: Convergence and Approximation Error Bound
  • Corollary 1: Accuracy in homogeneous clusters
  • Theorem 2: General Cluster Shapley Approximation
  • Corollary 2: Monte Carlo Cluster Shapley Error and Complexity
  • proof
  • proof
  • proof
  • proof