When is Shapley Value Computation a Matter of Counting?

Meghyn Bienvenu, Diego Figueira, Pierre Lafourcade

TL;DR

This work links Shapley value computation (SVC) in databases to counting problems via Fixed-size Generalized Model Counting (FGMC), establishing polynomial-time reductions that indicate SVC is fundamentally a counting problem. It proves three key reduction lemmas (on leaks, pseudo-connectedness, and decomposability) that yield FP/#P-hard dichotomies for broad query classes, including constant-free connected UCQs and homomorphism-closed connected graph queries, by transferring known dichotomies from probabilistic query evaluation (PQE) and generalized model counting (GMC). The results extend to variants without exogenous facts, to queries with negation, and to Shapley values of constants, and they leave open questions about purely endogenous settings and constant-based Shapley values. Overall, the paper provides strong evidence that Shapley value computation in databases is governed by counting complexity, and it enables tractability and hardness results to be transferred across related formalisms.

Abstract

The Shapley value provides a natural means of quantifying the contributions of facts to database query answers. In this work, we seek to broaden our understanding of Shapley value computation (SVC) in the database setting by revealing how it relates to Fixed-size Generalized Model Counting (FGMC), which is the problem of computing the number of sub-databases of a given size and containing a given set of assumed facts that satisfy a fixed query. Our focus will be on explaining the difficulty of SVC via FGMC, and to this end, we identify general conditions on queries which enable reductions from FGMC to SVC. As a byproduct, we not only obtain alternative explanations for most existing results on SVC, but also new complexity results. In particular, we establish FP-#P complexity dichotomies for constant-free connected UCQs and homomorphism-closed connected graph queries. We further explore variants of SVC, either in the absence of assumed facts, or where we measure the contribution of constants rather than facts.
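
To make the link between SVC and FGMC concrete, here is a minimal brute-force sketch in Python. It goes in the easy direction (computing a Shapley value from fixed-size counts), whereas the paper focuses on reductions from FGMC to SVC. It assumes a Boolean query given as a predicate over sets of facts, counts only the endogenous facts toward the size `k` (the paper's exact size convention may differ), and uses the standard subset-weight form of the Shapley value. The function names, the fact encoding, and the toy query are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def fgmc(query, endogenous, exogenous, k):
    """Count size-k subsets S of `endogenous` such that query(S | exogenous) holds.

    Assumption: the size k counts endogenous facts only.
    """
    exo = frozenset(exogenous)
    return sum(
        1
        for subset in combinations(list(endogenous), k)
        if query(frozenset(subset) | exo)
    )


def shapley_via_counting(query, endogenous, exogenous, fact):
    """Shapley value of an endogenous `fact`, written as a weighted sum of counts."""
    others = set(endogenous) - {fact}
    n = len(endogenous)
    total = Fraction(0)
    for k in range(n):
        # Probability that exactly k specific facts precede `fact` in a random order.
        weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        # Satisfying size-k sub-databases when `fact` is assumed present ...
        with_fact = fgmc(query, others, set(exogenous) | {fact}, k)
        # ... and when it is absent.
        without_fact = fgmc(query, others, exogenous, k)
        total += weight * (with_fact - without_fact)
    return total


def toy_query(facts):
    """Hypothetical Boolean CQ: exists x, y such that R(x) and S(x, y)."""
    return any(f[0] == "S" and ("R", f[1]) in facts for f in facts)


# Toy database: three endogenous facts, no exogenous facts.
endogenous = {("R", "a"), ("S", "a", "b"), ("S", "a", "c")}
values = {f: shapley_via_counting(toy_query, endogenous, set(), f) for f in endogenous}
```

On this toy database the three values sum to 1, the value of the query on the full database, as expected for a Shapley value. The enumeration is exponential in the number of facts and is meant only to illustrate the definitions.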

Paper Structure (27 sections, 22 theorems, 19 equations, 3 figures)


Key Result

proposition 1

Let $q$ be a Boolean UCQ. If $q$ is safe, then both $\mathrm{PQE}_q$ and $\mathrm{GMC}_q$ are in FP; otherwise, both are #P-hard.
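
As a rough illustration of the two problems named in this proposition, the following Python sketch computes both by brute force on a toy instance. It assumes GMC counts the subsets of endogenous facts that satisfy the query together with the exogenous facts, and that PQE is evaluated over a tuple-independent probabilistic database; all names and the toy query are hypothetical. With probability 1/2 on endogenous facts and 1 on exogenous facts, the PQE value equals the GMC count divided by 2^n, where n is the number of endogenous facts.

```python
from fractions import Fraction
from itertools import combinations, product


def toy_query(facts):
    """Hypothetical Boolean CQ: exists x, y such that R(x) and S(x, y)."""
    return any(f[0] == "S" and ("R", f[1]) in facts for f in facts)


def gmc(query, endogenous, exogenous):
    """Count subsets S of `endogenous` such that query(S | exogenous) holds."""
    endo = list(endogenous)
    exo = frozenset(exogenous)
    return sum(
        1
        for k in range(len(endo) + 1)
        for subset in combinations(endo, k)
        if query(frozenset(subset) | exo)
    )


def pqe(query, prob):
    """Probability that `query` holds when each fact f is kept independently
    with probability prob[f] (brute force over all possible worlds)."""
    facts = list(prob)
    total = Fraction(0)
    for choice in product([False, True], repeat=len(facts)):
        world = frozenset(f for f, keep in zip(facts, choice) if keep)
        weight = Fraction(1)
        for f, keep in zip(facts, choice):
            weight *= prob[f] if keep else 1 - prob[f]
        if query(world):
            total += weight
    return total


# Toy instance: R(a) is exogenous, the two S-facts are endogenous.
exogenous = {("R", "a")}
endogenous = {("S", "a", "b"), ("S", "a", "c")}
prob = {f: Fraction(1, 2) for f in endogenous}
prob.update({f: Fraction(1) for f in exogenous})

count = gmc(toy_query, endogenous, exogenous)
probability = pqe(toy_query, prob)
```

Both functions enumerate exponentially many sub-databases regardless of the query; the proposition concerns whether a polynomial-time algorithm exists, which this sketch does not address.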

Figures (3)

  • Figure 1: (a) A summary of the reductions. An arrow from $A$ to $B$ means a polynomial-time Turing reduction from $A$ to $B$. Red arrows indicate our contributions. (b) A summary of the reductions from FGMC for the classes of queries captured by our results, and the FP/#P-hard dichotomies that follow. The colors indicate the lemma(s) used for the proof: alternating diagonal stripes indicate that both lemmas are needed for the proof, while the vertical separation of sjf-CQ indicates two alternative ways to obtain the result.
  • Figure 2: Illustration of the construction of $\Ai$ on a graph database, where constants and facts are depicted by vertices and arrows, respectively. Graphically disconnected parts do not share any constant except for those that appear in $\aC$. New exogenous facts are thick arrows.
  • Figure 3: Reproductions of patterns from Fink and Olteanu (2016).

Theorems & Definitions (43)

  • Remark 3.1: Non-Boolean case
  • proposition 1: Dalvi and Suciu (2012) and Kenig and Suciu (2021)
  • proposition 2: Amarilli (2023)
  • Claim 3.1
  • proof
  • Claim 3.2
  • proof
  • Claim 3.3
  • proof
  • lemma 1
  • ...and 33 more