A Unified Framework for Provably Efficient Algorithms to Estimate Shapley Values

Tyler Chen, Akshay Seshadri, Mattia J. Villani, Pradeep Niroula, Shouvanik Chakrabarti, Archan Ray, Pranav Deshpande, Romina Yalovetzky, Marco Pistoia, Niraj Kumar

TL;DR

The paper tackles the challenge of efficiently estimating Shapley values for model explanations by introducing a unified, provable framework that reframes Shapley estimation as sketched regression or approximate matrix-vector multiplication. It provides non-asymptotic guarantees for a broad class of estimators, including KernelSHAP and LeverageSHAP, and analyzes multiple sampling schemes (kernel weights, leverage scores, and their interpolations) to derive explicit sample-complexity bounds. Key theoretical contributions include a change-of-variables approach that yields unconstrained regression formulations with universal guarantees and a comparison of estimators in terms of the common quantity $\

Abstract

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring an exponential (in the feature dimension) number of model evaluations. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with-replacement and without-replacement sampling strategies. We then prove strong non-asymptotic theoretical guarantees that apply to all estimators from our framework. This provides, to the best of our knowledge, the first theoretical guarantees for KernelSHAP and sheds further light on tradeoffs between existing estimators. Through comprehensive benchmarking on small- and medium-dimensional datasets for decision-tree models, we validate our approach against exact Shapley values, consistently achieving low mean squared error with modest sample sizes. Furthermore, we make specific implementation improvements to enable scalability of our methods to high-dimensional datasets. Our methods, tested on datasets such as MNIST and CIFAR10, provide consistently better results compared to the KernelSHAP library.
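
To make the exponential cost mentioned in the abstract concrete, here is a minimal sketch of exact Shapley computation using the standard subset-weighted definition. It is not code from the paper: the function `exact_shapley` and the toy value function `toy_value` are hypothetical names introduced only for illustration, and the loop visits all $2^{d-1}$ subsets per feature, which is the cost that randomized estimators try to avoid.

```python
from itertools import combinations
from math import factorial


def exact_shapley(value, d):
    """Exact Shapley values of a set function `value` over features 0..d-1.

    Enumerates every subset of the other d-1 features for each feature,
    so the number of evaluations grows exponentially in d.
    """
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for size in range(d):
            # Standard Shapley weight |S|! (d - |S| - 1)! / d!
            weight = factorial(size) * factorial(d - size - 1) / factorial(d)
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    # Hypothetical value function: additive terms plus one interaction
    # between features 0 and 1 (an assumption for illustration only).
    weights = [1.0, 2.0, 3.0]
    total = sum(weights[j] for j in coalition)
    if 0 in coalition and 1 in coalition:
        total += 4.0
    return total


phi = exact_shapley(toy_value, 3)
print(phi)
```

In this toy case the interaction term is split evenly between features 0 and 1, and the values sum to the difference between the value of the full coalition and that of the empty one (the efficiency property).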

Paper Structure

This paper contains 49 sections, 17 theorems, 151 equations, 7 figures, 7 tables, 2 algorithms.

Key Result

Theorem 2.1

Let $\bm{Q}$ be any fixed $d\times (d-1)$ matrix whose columns form an orthonormal basis for the space of vectors orthogonal to the all-ones vector (i.e. $\bm{Q}^\mathsf{T}\bm{Q} = \bm{I}$, $\bm{Q}^\mathsf{T}\bm{1} = \bm{0}$). Given $\lambda\in\mathbb{R}$, define Then, $\bm{U}^\mathsf{T}\bm{U} = \bm{I}$ and
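
As a small illustration of the matrix $\bm{Q}$ in the theorem statement, the sketch below builds one valid choice numerically and checks the two stated properties. The construction via a QR factorization and the function name `orthonormal_complement_of_ones` are assumptions made for this example; the theorem allows any $\bm{Q}$ with these properties, and the sketch does not cover the rest of the theorem.

```python
import numpy as np


def orthonormal_complement_of_ones(d):
    """Return a d x (d-1) matrix Q with Q^T Q = I and Q^T 1 = 0.

    One possible construction (an assumption, not the paper's): take the
    QR factorization of a full-rank matrix whose first column is the
    all-ones vector, then drop the first orthonormal column.
    """
    a = np.eye(d)
    a[:, 0] = 1.0                     # first column spans the all-ones direction
    q_full, _ = np.linalg.qr(a)       # columns of q_full are orthonormal
    return q_full[:, 1:]              # remaining columns are orthogonal to 1


d = 6
Q = orthonormal_complement_of_ones(d)
ones = np.ones(d)

ortho_ok = np.allclose(Q.T @ Q, np.eye(d - 1))
centered_ok = np.allclose(Q.T @ ones, 0.0)
# Q Q^T is the projector onto the subspace orthogonal to the all-ones vector.
projector_ok = np.allclose(Q @ Q.T, np.eye(d) - np.outer(ones, ones) / d)
print(ortho_ok, centered_ok, projector_ok)
```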

Figures (7)

  • Figure 1: Comparison of the sampling probabilities described in \ref{sec:importance}. Kernel weights (dashed), leverage scores (dash-dot), and our proposed modified $\ell_2$-weights (solid), which are the geometric mean of the kernel weights and leverage scores.
  • Figure 2: Comparison of performance across different estimators. (1, top row) Estimators use with-replacement sampling strategies. (2, 3, middle and bottom rows) $\bm{SZ}$ is sampled without replacement. In the legends, MV refers to the matrix-vector multiplication estimator and LS to the regression (least squares) estimator. The dimension of each dataset is reported in the panel titles.
  • Figure 3: Comparison of estimators on image datasets: MNIST (top row) and CIFAR (bottom row). (1, left column) Performance of the estimators, measured by normalized mean squared error from the true Shapley values and by time (in seconds). (2, center column) Area under the curve (AUC) for the insertion curve (x-axis) and the deletion curve (y-axis), computed on the top 100 features and reported as the percentage under the curve. (3, right column) Spearman rank correlation for an increasing number of samples.
  • Figure 4: Ratio of mean-squared errors (MSE) as a function of the dimension for different sampling strategies for the adversarial model in \ref{app:adversarial_example} (computed analytically from expressions for $\gamma$). The matrix-vector multiplication estimator and regression estimator have (almost) the same MSE ratio for this model (see \ref{cor:adversarial_mse_ratio}). For $\ell_2$-squared vs. kernel (solid) and modified $\ell_2$ vs. kernel (dashed), kernel weights give an advantage by a factor of $\tilde{O}(d)$ and $\tilde{O}(\sqrt{d})$ respectively. On the other hand, for modified $\ell_2$ vs. $\ell_2$-squared (long dashed), modified $\ell_2$ outperforms $\ell_2$-squared by a factor of $O(\sqrt{d})$.
  • Figure 5: The unified framework for estimating Shapley values with the proposed class of estimators. First, we define a distribution over buckets, which governs the selection of the bit vector: $p_i$ is the probability of sampling an item from the bucket of coalitions of size $i\in[d]$ (equivalently, bit vectors with Hamming weight $i$). Then we select a sampling strategy (with or without replacement). Finally, we select the estimation strategy. If we limit ourselves to the $\ell_2$-squared, modified $\ell_2$, and kernel distributions, this provides a total of $3 \times 2 \times 2 = 12$ estimators.
  • ...and 2 more figures
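
The three sampling distributions compared in Figure 1 can be sketched per coalition size as follows. This is a rough illustration under stated assumptions rather than the paper's code: it assumes the usual Shapley kernel, whose total mass on coalitions of size $s$ is proportional to $1/(s(d-s))$, and it assumes leverage-score sampling places equal total mass on every size $1 \le s \le d-1$, as commonly described for LeverageSHAP. The geometric mean follows the Figure 1 caption. The function name `size_distributions` is hypothetical.

```python
from math import sqrt


def size_distributions(d):
    """Probability of drawing a coalition of each size s = 1..d-1.

    Assumptions (not taken from the paper's code):
      - kernel weights: total mass on size s proportional to 1 / (s (d - s))
      - leverage scores: equal total mass on every size
      - modified weights: geometric mean of the two, then renormalized
    """
    sizes = range(1, d)
    kernel = [1.0 / (s * (d - s)) for s in sizes]
    leverage = [1.0 for _ in sizes]
    modified = [sqrt(k * l) for k, l in zip(kernel, leverage)]

    def normalize(weights):
        total = sum(weights)
        return [w / total for w in weights]

    return {
        "kernel": normalize(kernel),
        "leverage": normalize(leverage),
        "modified": normalize(modified),
    }


dists = size_distributions(10)
for name, probs in dists.items():
    print(name, [round(p, 3) for p in probs])
```

Under these assumptions the kernel distribution concentrates most heavily on very small and very large coalitions, the leverage distribution is flat across sizes, and the geometric mean sits between the two.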

Theorems & Definitions (38)

  • Theorem 2.1
  • Theorem 2.2
  • Theorem A.1: Matrix-Vector multiplication
  • proof
  • Remark A.2
  • Theorem A.4: Subspace embedding
  • proof
  • Theorem A.5: Sketched Regression
  • proof
  • Lemma A.6
  • ...and 28 more