Fast Shapley Value Estimation: A Unified Approach

Borui Zhang, Baotong Tian, Wenzhao Zheng, Jie Zhou, Jiwen Lu

TL;DR

This paper tackles the computational intractability of Shapley value explanations for high-dimensional inputs by introducing a unified view of stochastic estimators as a linear transformation and proposing SimSHAP, a simple and fast amortized estimator. By framing semivalue, random order value, least squares value, and amortized methods within a single matrix-based, subset-sampling paradigm, the authors derive unbiased targets and an efficient training objective that enables a single forward pass to estimate explanations. Extensive experiments on tabular and image data demonstrate that SimSHAP achieves substantial speedups with accuracy comparable to or better than existing methods (e.g., KernelSHAP and FastSHAP), including qualitative and quantitative evaluations on CIFAR-10. The work provides a practical, scalable approach to Shapley-value explanations and offers a unified theoretical lens for understanding the connections among diverse estimation strategies, with limitations around sampling stability and choice of metric matrices.

Abstract

Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for addressing challenges posed by black-box models like deep neural networks. However, computing Shapley values encounters exponential complexity as the number of features increases. Various approaches, including ApproSemivalue, KernelSHAP, and FastSHAP, have been explored to expedite the computation. In our analysis of existing approaches, we observe that stochastic estimators can be unified as a linear transformation of randomly summed values from feature subsets. Based on this, we investigate the possibility of designing simple amortized estimators and propose a straightforward and efficient one, SimSHAP, by eliminating redundant techniques. Extensive experiments conducted on tabular and image datasets validate the effectiveness of our SimSHAP, which significantly accelerates the computation of accurate Shapley values.
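To make the cost argument concrete, the sketch below contrasts exact Shapley computation by subset enumeration (exponential in the number of features) with a classic random-order Monte Carlo estimator that averages marginal contributions over sampled permutations. It is an illustration only and is not the SimSHAP method; the function names, the toy value function `toy_value`, and its `WEIGHTS` are hypothetical and chosen for this example.

```python
import itertools
import math
import random


def exact_shapley(value, d):
    """Exact Shapley values by enumerating every subset (exponential in d)."""
    phi = [0.0] * d
    for i in range(d):
        others = [j for j in range(d) if j != i]
        for k in range(d):
            # Shapley weight for a coalition of size k that excludes feature i.
            w = math.factorial(k) * math.factorial(d - k - 1) / math.factorial(d)
            for subset in itertools.combinations(others, k):
                S = frozenset(subset)
                phi[i] += w * (value(S | {i}) - value(S))
    return phi


def permutation_shapley(value, d, n_perms, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * d
    for _ in range(n_perms):
        order = list(range(d))
        rng.shuffle(order)
        S = frozenset()
        prev = value(S)
        for i in order:
            S = S | {i}
            cur = value(S)
            phi[i] += cur - prev  # marginal contribution of i in this ordering
            prev = cur
    return [p / n_perms for p in phi]


# Hypothetical toy game: additive terms plus one interaction between features 0 and 1.
WEIGHTS = [1.0, 2.0, 3.0, 4.0]


def toy_value(S):
    total = sum(WEIGHTS[i] for i in S)
    if 0 in S and 1 in S:
        total += 2.0
    return total


exact = exact_shapley(toy_value, 4)
approx = permutation_shapley(toy_value, 4, n_perms=2000)
print("exact: ", exact)
print("approx:", approx)
```

In this toy game the interaction term is split equally between features 0 and 1, and each sampled ordering's contributions sum to the value of the full feature set, so the estimate satisfies efficiency by construction. The exact routine needs on the order of 2^d value evaluations per feature, which is the bottleneck that stochastic and amortized estimators aim to avoid.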

Paper Structure

This paper contains 49 sections, 1 theorem, 30 equations, 6 figures, 12 tables, 1 algorithm.

Key Result

Proposition 1

The least squares value in equ:lsv is equivalent to the semivalue in equ:semivalue.
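For orientation, the standard textbook forms of the two objects, specialized to the Shapley value, are sketched below. The notation (feature set $N$ with $d = |N|$, value function $v$, attribution $\phi$) is assumed here and may differ from the paper's equ:semivalue and equ:lsv, which are not reproduced on this page.

```latex
% Semivalue form: a weighted sum of marginal contributions.
\phi_i = \sum_{S \subseteq N \setminus \{i\}} p_{|S|}\,\bigl[v(S \cup \{i\}) - v(S)\bigr],
\qquad \sum_{k=0}^{d-1} \binom{d-1}{k}\, p_k = 1 .

% The Shapley value is the semivalue with
p_k = \frac{k!\,(d-k-1)!}{d!} = \frac{1}{d\binom{d-1}{k}} .

% Least-squares form (the weighted regression that KernelSHAP builds on):
\phi = \arg\min_{\eta \in \mathbb{R}^d}
  \sum_{\emptyset \neq S \subsetneq N} \mu(|S|)
  \Bigl(v(S) - v(\emptyset) - \sum_{i \in S} \eta_i\Bigr)^{2}
\quad \text{s.t.} \quad \sum_{i \in N} \eta_i = v(N) - v(\emptyset),
\qquad \mu(s) = \frac{d-1}{\binom{d}{s}\, s\,(d-s)} .
```

With these weights, both forms yield the same attribution vector, which is the Shapley instance of the kind of equivalence the proposition states.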

Figures (6)

  • Figure 1: (a) Existing stochastic estimators for Shapley values can be unified as a linear transformation of the values obtained from sampled subsets. (b) We propose SimSHAP, which achieves high efficiency and maintains competitive approximation accuracy.
  • Figure 2: Accuracy of SimSHAP estimation across tabular datasets.
  • Figure 3: Comparison of different methods on randomly chosen images in CIFAR-10.
  • Figure 4: Mean Insertion and Deletion score curves for different methods on the CIFAR-10 dataset.
  • Figure 5: SimSHAP accuracy as a function of the number of training samples, with and without pair sampling, on the (a) Census, (b) News, and (c) Bank datasets.
  • ...and 1 more figure

Theorems & Definitions (8)

  • Definition 1
  • Remark
  • Proposition 1
  • Remark
  • Definition 2
  • Remark
  • Definition 3
  • proof