One Sample Fits All: Approximating All Probabilistic Values Simultaneously and Efficiently

Weida Li, Yaoliang Yu

TL;DR

This work proposes a one-sample-fits-all framework parameterized by a sampling vector to approximate intermediate terms that can be converted to any probabilistic value without amplifying scalars, and theoretically identifies a key formula that effectively determines the convergence rate of the framework.

Abstract

The concept of probabilistic values, such as Beta Shapley values and weighted Banzhaf values, has gained recent attention in applications like feature attribution and data valuation. However, exact computation of these values is often exponentially expensive, necessitating approximation techniques. Prior research has shown that the choice of probabilistic values significantly impacts downstream performance, with no universally superior option. Consequently, one may have to approximate multiple candidates and select the best-performing one. Although there have been many efforts to develop efficient estimators, none are intended to approximate all probabilistic values both simultaneously and efficiently. In this work, we embark on the first exploration of achieving this goal. Adhering to the principle of maximum sample reuse, we propose a one-sample-fits-all framework parameterized by a sampling vector to approximate intermediate terms that can be converted to any probabilistic value without amplifying scalars. Leveraging the concept of $(\epsilon, \delta)$-approximation, we theoretically identify a key formula that effectively determines the convergence rate of our framework. By optimizing the sampling vector using this formula, we obtain i) a one-for-all estimator that achieves the currently best time complexity for all probabilistic values on average, and ii) a faster generic estimator with the sampling vector optimally tuned for each probabilistic value. Particularly, our one-for-all estimator achieves the fastest convergence rate on Beta Shapley values, including the well-known Shapley value, both theoretically and empirically. Finally, we establish a connection between probabilistic values and the least squares regression used in (regularized) datamodels, showing that our one-for-all estimator can solve a family of datamodels simultaneously.
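
To make the idea of shared intermediate terms concrete, here is a minimal sketch that computes several probabilistic values exactly for a tiny game by brute-force enumeration. It is not the estimator proposed in the paper: it only illustrates that one table of size-averaged marginal contributions can be converted into the Shapley value, a weighted Banzhaf value, or a Beta Shapley value by a different weight vector, and why exact computation scales exponentially in the number of players. All function names are hypothetical, and the Beta Shapley parameterization used here is an assumption (the abstract does not spell it out); with $\alpha = \beta = 1$ it reduces to the Shapley value.

```python
from itertools import combinations
from math import comb, exp, lgamma


def size_averaged_marginals(n, utility):
    """table[i][s] = mean of U(S + {i}) - U(S) over all S of size s that exclude i.

    Brute force: visits every subset, so the cost grows like 2^n.
    """
    table = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for s in range(n):
            total = 0.0
            for subset in combinations(others, s):
                base = frozenset(subset)
                total += utility(base | {i}) - utility(base)
            table[i][s] = total / comb(n - 1, s)
    return table


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


# Each weight vector is a probability distribution over the subset size s = 0..n-1.
def shapley_weights(n):
    return [1.0 / n] * n


def weighted_banzhaf_weights(n, w):
    return [comb(n - 1, s) * w**s * (1 - w) ** (n - 1 - s) for s in range(n)]


def beta_shapley_weights(n, alpha, beta):
    # Assumed parameterization: Beta(1, 1) recovers the Shapley value.
    return [
        comb(n - 1, s) * exp(log_beta(beta + s, alpha + n - 1 - s) - log_beta(alpha, beta))
        for s in range(n)
    ]


def probabilistic_value(table, weights):
    """Convert the shared table into one probabilistic value per player."""
    return [sum(w * a for w, a in zip(weights, row)) for row in table]


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0]  # toy per-player contributions (hypothetical game)
    game = lambda S: sum(values[i] for i in S) ** 2
    table = size_averaged_marginals(len(values), game)  # computed once, reused below
    print("Shapley      ", probabilistic_value(table, shapley_weights(4)))
    print("WB-0.8       ", probabilistic_value(table, weighted_banzhaf_weights(4, 0.8)))
    print("Beta(4, 1)   ", probabilistic_value(table, beta_shapley_weights(4, 4.0, 1.0)))
```

The table is built once and every value is a cheap weighted sum over it, which is the reuse pattern the framework aims for when the table entries are estimated from samples rather than enumerated.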

Paper Structure

This paper contains 33 sections, 17 theorems, 111 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Theorem 1

Assume i) $\| U \|_{\infty} \leq u$ and ii) $0<\epsilon\leq \sqrt{2D(\mathbf{m}, \mathbf{q})\gamma(\mathbf{q})^{2}u^{2}}$. For $\hat{\boldsymbol\phi}$ in Algorithm 1, it requires $\frac{4nu^{2}D(\mathbf{m}, \mathbf{q})}{\epsilon^{2}}\log\frac{8n^{2}}{\delta}$ evaluations of $U$ to achieve $P(\ldots$
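
As a rough aid for reading the evaluation count in Theorem 1, the sketch below simply plugs numbers into the stated expression. The quantities $D(\mathbf{m}, \mathbf{q})$ and $\gamma(\mathbf{q})$ are not defined in this excerpt, so they are treated here as plain numbers supplied by the caller; the function name is hypothetical, and the logarithm is assumed to be natural.

```python
from math import log, sqrt


def ofa_evaluation_bound(n, u, D, epsilon, delta, gamma=None):
    """Evaluate 4 n u^2 D / epsilon^2 * log(8 n^2 / delta).

    D stands in for D(m, q) and gamma for gamma(q); both are caller-supplied
    numbers because their definitions are outside this excerpt.
    """
    if epsilon <= 0 or not 0 < delta < 1:
        raise ValueError("need epsilon > 0 and 0 < delta < 1")
    # Optional check of assumption ii): epsilon <= sqrt(2 D gamma^2 u^2).
    if gamma is not None and epsilon > sqrt(2 * D * gamma**2 * u**2):
        raise ValueError("epsilon exceeds the range assumed by the theorem")
    return 4 * n * u**2 * D / epsilon**2 * log(8 * n**2 / delta)


if __name__ == "__main__":
    # Illustrative numbers only; D = 1.0 is a placeholder, not a value from the paper.
    for eps in (0.2, 0.1, 0.05):
        print(eps, ofa_evaluation_bound(n=100, u=1.0, D=1.0, epsilon=eps, delta=0.05))
```

With the other quantities held fixed, the count grows as $1/\epsilon^{2}$ and only logarithmically in $1/\delta$, so halving the target error quadruples the number of utility evaluations.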

Figures (4)

  • Figure 1: Comparison of ten one-for-all estimators. Beta$(\alpha, \beta)$ denotes Beta Shapley values, whereas WB-$a$ refers to weighted Banzhaf values. Our OFA-S estimator is equal to the OFA-A estimator for the Shapley value. The suffix "Shapley" indicates that there is no reweighting for the Shapley value, while "Banzhaf" stands for the Banzhaf value. The permutation estimator was originally proposed for the Shapley value. The utility function $U$ is the cross-entropy loss of LeNet trained on $24$ data points from FMNIST. All results are averaged over $30$ random seeds.
  • Figure 2: Comparison of one-for-all estimators using six utility functions. All AUCCs are reported with standard deviation over $30$ random seeds. A smaller AUCC indicates a faster convergence rate.
  • Figure 3: Comparison of twelve estimators using six utility functions. All AUCCs are reported with standard deviation over $30$ random seeds. A smaller AUCC indicates a faster convergence rate.
  • Figure: The One-Sample-Fits-All (OFA) Framework

Theorems & Definitions (20)

  • Definition 1
  • Theorem 1
  • Proposition 1
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Theorem 2
  • Proposition 5 (Marichal and Mathonet, 2011)
  • Corollary 1
  • Theorem 2
  • ...and 10 more