The Shapley Value in Database Management
Leopoldo Bertossi, Benny Kimelfeld, Ester Livshits, Mikaël Monet
TL;DR
This article surveys the use of the Shapley value as a principled attribution mechanism for database tasks, focusing on the contribution of individual data items (tuples) to query results and to database inconsistency. It examines exact and approximate computation, including reductions to probabilistic query evaluation and to knowledge compilation, and presents dichotomy-style complexity results for classes of queries and integrity constraints. It also analyzes inconsistency measures and how the Shapley value distributes them among individual tuples, and discusses how these insights can guide data cleaning, along with practical considerations regarding tooling and extensions. Overall, the article clarifies both the theoretical limits and the practical strategies for deploying Shapley-based explanations in databases, highlighting open problems and possible avenues for integration into database systems.
Abstract
Attribution scores can be applied in data management to quantify the contribution of individual items to conclusions drawn from the data, as part of explaining what led to those conclusions. In Artificial Intelligence, Machine Learning, and Data Management, several common scores are instantiations of the Shapley value, a formula for profit sharing in cooperative game theory. Since its introduction in the 1950s, the Shapley value has been used to measure contribution in many fields, from economics to law, with its most recent applications in modern machine learning. Recent studies have investigated the application of the Shapley value to database management. This article gives an overview of recent results on the computational complexity of the Shapley value for measuring the contribution of tuples to query answers and to the degree of inconsistency with respect to integrity constraints. More specifically, the article highlights lower and upper bounds on the complexity of calculating the Shapley value, either exactly or approximately, as well as solutions for realizing the calculation in practice.
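To make the two kinds of contribution mentioned above concrete, the following is a minimal brute-force sketch, not one of the algorithms surveyed in the article. It applies the textbook Shapley formula, phi(f) = sum over E ⊆ D \ {f} of |E|!(n - |E| - 1)!/n! * (v(E ∪ {f}) - v(E)), to a hypothetical toy database with relations R and S. It assumes every fact is a player (no exogenous facts). The first game, v(E), is 1 when the hypothetical Boolean query q() :- R(x, y), S(y) holds in E. The second game is a hypothetical inconsistency measure: the number of pairs of R-facts violating the functional dependency R: 1 → 2. The names `shapley`, `query_game`, and `fd_violations` are illustrative only. Enumerating all coalitions takes exponential time, which is exactly why the complexity results discussed in the article matter.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley(players, game):
    """Exact Shapley value of each player by enumerating all coalitions.

    Exponential in len(players); intended only to illustrate the definition.
    """
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = Fraction(0)
        for k in range(n):  # k = size of the coalition that p joins
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (game(s | {p}) - game(s))
        values[p] = total
    return values


# Hypothetical toy database: each fact is a tuple (relation, arg1, arg2, ...).
facts = [("R", 1, 2), ("R", 1, 3), ("R", 4, 5), ("S", 2), ("S", 3)]


def query_game(coalition):
    """1 if the Boolean query q() :- R(x, y), S(y) holds in the coalition, else 0."""
    # f[2] is only read for R-facts thanks to short-circuit evaluation.
    return int(any(f[0] == "R" and ("S", f[2]) in coalition for f in coalition))


def fd_violations(coalition):
    """Number of pairs of R-facts that agree on column 1 but differ on column 2."""
    r_facts = [f for f in coalition if f[0] == "R"]
    return sum(1 for f, g in combinations(r_facts, 2) if f[1] == g[1] and f[2] != g[2])


if __name__ == "__main__":
    for name, game in [("query", query_game), ("inconsistency", fd_violations)]:
        print(name)
        for fact, value in shapley(facts, game).items():
            print(f"  {fact}: {value}")
```

On this toy database, the four facts that can take part in a match for q, namely R(1, 2), R(1, 3), S(2), and S(3), each receive 1/4, while R(4, 5) never affects the answer and receives 0. For the inconsistency game, the single violating pair R(1, 2), R(1, 3) splits the total value 1 evenly, 1/2 each. In both cases the values sum to the value of the whole database, as the efficiency property of the Shapley value requires.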
