Faithful Group Shapley Value

Kiljae Lee, Ziqi Liu, Weijing Tang, Yuan Zhang

TL;DR

This work tackles reliable group-level data valuation by addressing shell-company manipulation in existing Group Shapley Value (GSV) methods. It defines the Faithful Group Shapley Value (FGSV) as the sum of the individual Shapley values within a group, and proves that it uniquely satisfies a coherent set of faithfulness axioms that forbid value inflation via regrouping. The authors develop a fast, provably accurate approximation algorithm for FGSV that leverages a structured decomposition and variance-reduction techniques, offering better scalability than aggregating individual Shapley values (SVs). Empirically, the FGSV approximation algorithm yields faster convergence and lower approximation error across synthetic benchmarks and real-world tasks, including faithful copyright attribution for generative AI and faithful explainable AI on the Diabetes dataset. The results demonstrate FGSV’s practical impact in fair data compensation and robust interpretation, with open-source code to reproduce the experiments.
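To make the definition concrete, here is a minimal sketch. It assumes a hypothetical toy utility, the square root of the summed per-point "quality" scores, and a tiny $n$ so that exact enumeration is feasible. The sketch computes each point's Shapley value exactly and then sums within a group. Because FGSV is a plain sum, splitting a group into subgroups leaves the group's total unchanged. The names `shapley_values`, `fgsv`, `toy_utility` and `quality` are illustrative and do not come from the paper's code.

```python
from itertools import combinations
from math import factorial, isclose, sqrt


def shapley_values(n, utility):
    """Exact Shapley value of each of n data points by subset enumeration (exponential in n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for subsets S that exclude i
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (utility(s | {i}) - utility(s))
    return phi


def fgsv(phi, group):
    """FGSV of a group: the sum of its members' individual Shapley values."""
    return sum(phi[i] for i in group)


# Hypothetical toy utility with diminishing returns in total data "quality".
quality = [1.0, 2.0, 0.5, 1.5, 1.0]


def toy_utility(subset):
    return sqrt(sum(quality[j] for j in subset))


phi = shapley_values(len(quality), toy_utility)
whole = fgsv(phi, [2, 3, 4])                  # provider B as one group
split = fgsv(phi, [2]) + fgsv(phi, [3, 4])   # same data, split into two shells
print(isclose(sum(phi), toy_utility(range(len(quality)))))  # efficiency: True
print(isclose(whole, split))                                 # regrouping: True
```

Exact enumeration needs on the order of $n 2^{n-1}$ utility evaluations. This cost is why a fast approximation algorithm matters in practice.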

Abstract

Data Shapley is an important tool for data valuation, which quantifies the contribution of individual data points to machine learning models. In practice, group-level data valuation is desirable when data providers contribute data in batches. However, we identify that existing group-level extensions of Data Shapley are vulnerable to shell company attacks, where strategic group splitting can unfairly inflate valuations. We propose the Faithful Group Shapley Value (FGSV), which uniquely defends against such attacks. Building on original mathematical insights, we develop a provably fast and accurate approximation algorithm for computing FGSV. Empirical experiments demonstrate that our algorithm significantly outperforms state-of-the-art methods in computational efficiency and approximation accuracy, while ensuring faithful group-level valuation.

Paper Structure

This paper contains 27 sections, 7 theorems, 22 equations, 6 figures, 2 algorithms.

Key Result

Proposition 1 (Shell company attack)

Suppose $\mathcal{D} = \{z_1, \dots, z_n\}$ is an i.i.d. sample from $\mathcal{P}$. Let $\bar{U}(s) = \mathbb{E}_{S \sim \mathcal{P}^s} [ U(S) ]$ denote the expected utility of a data set $S$, which depends only on $s = |S|$. Now split a group $S_k$ into two non-empty subgroups $S_k'$ and $S_k''$ (i.e., $S_k' \cup S_k'' = S_k$ and $S_k' \cap S_k'' = \emptyset$); then …
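The inflation that Proposition 1 describes can be reproduced numerically in a toy setting. This is a toy illustration only; the proposition's exact conditions and conclusion are not reproduced here. The sketch below assumes a hypothetical size-only utility $\bar{U}(s) = \sqrt{s}$, chosen for illustration. It computes GSV by treating whole groups as players, then compares group $B$'s total before and after $B$ is split into two shell subgroups. The helper names `group_shapley` and `size_utility` are illustrative.

```python
from itertools import combinations
from math import factorial, sqrt


def group_shapley(sizes, utility):
    """Shapley values in the game whose players are whole groups (the GSV view)."""
    m = len(sizes)
    values = [0.0] * m
    for k in range(m):
        others = [j for j in range(m) if j != k]
        for r in range(m):
            # Shapley weight r! (m - r - 1)! / m! for coalitions of r other groups
            weight = factorial(r) * factorial(m - r - 1) / factorial(m)
            for coalition in combinations(others, r):
                base = sum(sizes[j] for j in coalition)
                values[k] += weight * (utility(base + sizes[k]) - utility(base))
    return values


# Hypothetical utility that depends only on the number of data points,
# mirroring the size-only expected utility in Proposition 1.
def size_utility(s):
    return sqrt(s)


honest = group_shapley([4, 4], size_utility)        # groups A, B
attacked = group_shapley([4, 2, 2], size_utility)   # A, then B split into B', B''

print(f"B honest:   {honest[1]:.4f}")                  # 1.4142
print(f"B attacked: {attacked[1] + attacked[2]:.4f}")  # 1.5405
```

With this concave utility, splitting $B$ into two size-2 subgroups raises its combined GSV from about 1.4142 to about 1.5405, at $A$'s expense. Under the same utility all eight points are symmetric. Each therefore has individual Shapley value $\sqrt{8}/8$, and the FGSV of $B$ stays at $4\sqrt{8}/8 \approx 1.4142$ however $B$ is split.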

Figures (6)

  • Figure 1: Left: GSV; right: FGSV (our method). Vertical span: valuation. Group $A$ is fixed; group $B$ engages in increasing degrees of shell company attack (left$\to$right). The detailed experimental set-up is given in the appendix.
  • Figure 2: Performance comparison in the SOU game. Top: our method (FGSV) achieves the lowest AUCC across all problem sizes. Bottom: our method has the lowest runtime per iteration.
  • Figure 3: Comparison of SRS and FSRS for copyright attribution. (a) Example images generated using brand prompts. (b) Shapley Royalty Share (SRS; Wang et al., 2024) based on GSV. (c) Faithful SRS (FSRS, our method) based on $\operatorname{FGSV}$. Blue bars: valuation under Scenario 1 (single group per brand); orange bars: valuation under Scenario 2 (Google/Sprite data each split into size-20/10 subgroups, colored in dark and light orange).
  • Figure 4: Comparison of GSV (top row) and FGSV (bottom row) in a regression task for explainable AI. Each column aggregates category-level values for a specific variable: sex (left), age (middle), and BMI (right). Shaded areas represent $\pm1$ standard deviation across 30 replications.
  • Figure 5: Empirical average runtime (in seconds) of $U(S)$ evaluation as a function of subset size $s=|S|$. Each curve represents the mean over 50 randomly sampled subsets of size $s$; shaded areas indicate $\pm1$ standard deviation.

Theorems & Definitions (11)

  • Proposition 1: Shell company attack
  • Definition 1: Group data valuation
  • Definition 2: Axioms for faithful group data valuation
  • Theorem 1
  • Lemma 1
  • Theorem 2
  • Definition 3: $(\epsilon, \delta)$-approximation
  • Definition 4: Deletion Stability
  • Theorem 3
  • Proposition 2
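Definition 3 introduces an $(\epsilon, \delta)$-approximation, but this summary does not state its exact form. The sketch below assumes the common absolute-error reading: $|\hat{\phi} - \phi| \le \epsilon$ with probability at least $1 - \delta$ for one group. It also assumes utilities in $[0, 1]$. It implements the generic permutation-sampling baseline of aggregating individual SVs, not the authors' algorithm, whose structured decomposition and variance reduction are not detailed here. A Hoeffding bound then gives a sufficient number of permutations. The function names and the toy utility are hypothetical.

```python
import math
import random


def hoeffding_permutations(eps, delta, group_size):
    """Permutations sufficient for |estimate - FGSV| <= eps with prob. >= 1 - delta.

    Assumes U takes values in [0, 1], so each per-permutation sample (a sum of
    group_size marginal contributions) lies in [-group_size, group_size].
    Hoeffding: P(|mean - mu| >= eps) <= 2 exp(-2 m eps^2 / (2 g)^2).
    """
    return math.ceil(2 * group_size**2 * math.log(2 / delta) / eps**2)


def fgsv_permutation_estimate(n, utility, group, num_perms, seed=0):
    """Monte Carlo FGSV: average, over random permutations, of the summed
    marginal contributions of the group's members (generic baseline)."""
    rng = random.Random(seed)
    members = set(group)
    total = 0.0
    for _ in range(num_perms):
        order = list(range(n))
        rng.shuffle(order)
        prefix = set()
        prev = utility(frozenset(prefix))
        sample = 0.0
        for i in order:
            prefix.add(i)
            cur = utility(frozenset(prefix))
            if i in members:
                sample += cur - prev  # marginal contribution of member i
            prev = cur
        total += sample
    return total / num_perms


# Hypothetical size-only utility with values in [0, 1].
n = 8


def toy_utility(subset):
    return math.sqrt(len(subset) / n)


group = [0, 1, 2]
m = hoeffding_permutations(eps=0.1, delta=0.05, group_size=len(group))
estimate = fgsv_permutation_estimate(n, toy_utility, group, m)
print(m, round(estimate, 3))  # by symmetry, the exact FGSV here is 3/8 = 0.375
```

Each permutation costs $n + 1$ utility evaluations. The bound also grows with the square of the group size, so this baseline scales poorly for large groups, which is the gap the paper's algorithm targets. The Hoeffding count is loose here because the actual per-permutation samples occupy a much narrower range than $[-g, g]$.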