Expected Shapley-Like Scores of Boolean Functions: Complexity and Applications to Probabilistic Databases

Pratik Karmakar, Mikaël Monet, Pierre Senellart, Stéphane Bressan

TL;DR

This paper introduces expected Shapley-like scores for Boolean functions whose variables carry independent probabilities. It proves that computing these scores reduces in polynomial time to computing the expected value of the function, and that the two problems are equivalent for many natural function classes. It gives a concrete polynomial-time algorithm for the tractable case of deterministic decomposable circuits (dd-circuits) and extends the approach to probabilistic databases via provenance, with a ProvSQL implementation and experiments on the TPC-H benchmark. The result is a practical framework for fair contribution scoring and provenance-aware explanations in probabilistic query answering, one that links power indices to standard probabilistic evaluation and lays groundwork for further approximation and randomized algorithms.

Abstract

Shapley values, originating in game theory and increasingly prominent in explainable AI, have been proposed to assess the contribution of facts in query answering over databases, along with other similar power indices such as Banzhaf values. In this work we adapt these Shapley-like scores to probabilistic settings, the objective being to compute their expected value. We show that the computations of expected Shapley values and of the expected values of Boolean functions are interreducible in polynomial time, thus obtaining the same tractability landscape. We investigate the specific tractable case where Boolean functions are represented as deterministic decomposable circuits, designing a polynomial-time algorithm for this setting. We present applications to probabilistic databases through database provenance, and an effective implementation of this algorithm within the ProvSQL system, which experimentally validates its feasibility over a standard benchmark.
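The tractability of deterministic decomposable circuits rests on a standard fact: when the children of every AND gate share no variables and the children of every OR gate are mutually exclusive, the expected value of the circuit under independent variable probabilities can be computed in one bottom-up pass. The sketch below illustrates only that expected-value computation, not the paper's algorithm for expected Shapley-like scores. The node classes and the `expected_value` function are hypothetical names introduced here, and the code assumes the input circuit really is deterministic and decomposable (it does not check).

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Not:
    child: "Node"


@dataclass(frozen=True)
class And:
    # Assumed decomposable: children mention disjoint sets of variables.
    children: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    # Assumed deterministic: at most one child is true in any assignment.
    children: Tuple["Node", ...]


Node = Union[Var, Not, And, Or]


def expected_value(node: Node, prob: Dict[str, float],
                   cache: Optional[Dict[int, float]] = None) -> float:
    """Probability that the circuit is true, with independent variables."""
    if cache is None:
        cache = {}
    key = id(node)  # memoize shared gates so a DAG is visited once per gate
    if key in cache:
        return cache[key]
    if isinstance(node, Var):
        value = prob[node.name]
    elif isinstance(node, Not):
        value = 1.0 - expected_value(node.child, prob, cache)
    elif isinstance(node, And):
        value = 1.0
        for child in node.children:  # independence: probabilities multiply
            value *= expected_value(child, prob, cache)
    elif isinstance(node, Or):
        # mutual exclusivity: probabilities add
        value = sum(expected_value(c, prob, cache) for c in node.children)
    else:
        raise TypeError(f"unknown node type: {type(node).__name__}")
    cache[key] = value
    return value


# Example: x XOR y written as (x AND NOT y) OR (NOT x AND y).
x, y = Var("x"), Var("y")
xor = Or((And((x, Not(y))), And((Not(x), y))))
probs = {"x": 0.5, "y": 0.2}
print(expected_value(xor, probs))  # 0.5 * 0.8 + 0.5 * 0.2, about 0.5
```

The two branches of the XOR are mutually exclusive and each AND gate combines distinct variables, so the sum-and-product rule applies. On an arbitrary circuit the same code would run but could return a wrong probability.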

Paper Structure

This paper contains 14 sections, 15 theorems, 8 equations, 3 tables, 1 algorithm.

Key Result

Theorem 1

We have $\mathsf{EScore}_c(\mathcal{F}) \leqslant_{\mathsf{P}} \mathsf{EV}(\mathcal{F})$ for any tractable coefficient function $c$ and any class $\mathcal{F}$ of Boolean functions.
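To make the quantity in Theorem 1 concrete, the following brute-force sketch computes an expected score on a tiny function. It is not the polynomial-time reduction of the theorem: it enumerates all subsets and is exponential in the number of variables. This page does not state the definitions, so the code assumes the following formulation: $\mathsf{Score}_c(\varphi, V, x) = \sum_{E \subseteq V \setminus \{x\}} c(|V|, |E|)\,[\varphi(E \cup \{x\}) - \varphi(E)]$, and the expected score is $\sum_{Z \subseteq V,\, x \in Z} \Pr(Z)\,\mathsf{Score}_c(\varphi, Z, x)$ with variables present independently. The Shapley coefficient is taken as $c(k, \ell) = \ell!\,(k-\ell-1)!/k!$ and the Banzhaf coefficient as $1$. All function names are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_coeff(k, l):
    # Assumed Shapley coefficient: l! (k - l - 1)! / k!
    return factorial(l) * factorial(k - l - 1) / factorial(k)


def banzhaf_coeff(k, l):
    # Assumed (unnormalized) Banzhaf coefficient.
    return 1.0


def subsets(items):
    """Yield every subset of items as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def score(phi, variables, x, coeff):
    """Score of x when the game is played over the given variables."""
    others = [v for v in variables if v != x]
    k = len(others) + 1
    total = 0.0
    for e in subsets(others):
        total += coeff(k, len(e)) * (phi(e | {x}) - phi(e))
    return total


def expected_score(phi, prob, x, coeff):
    """Expectation of score(...) over random variable sets Z containing x."""
    others = [v for v in prob if v != x]
    total = 0.0
    for rest in subsets(others):
        p = prob[x]  # x must be present for its score to be defined
        for v in others:
            p *= prob[v] if v in rest else 1.0 - prob[v]
        total += p * score(phi, rest | {x}, x, coeff)
    return total


def phi_and(true_vars):
    # phi maps the set of true variables to 0 or 1; here x AND y.
    return int("x" in true_vars and "y" in true_vars)


half = {"x": 0.5, "y": 0.5}
print(expected_score(phi_and, half, "x", shapley_coeff))
print(expected_score(phi_and, half, "x", banzhaf_coeff))
```

Under these assumptions, setting every probability to 1 recovers the ordinary (non-probabilistic) score, which gives a quick sanity check on the implementation.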

Theorems & Definitions (19)

  • Definition 1
  • Theorem 1
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Proposition 1
  • Corollary 2
  • Definition 2
  • Proposition 2
  • Corollary 3
  • ...and 9 more