Fast Shapley Value Estimation: A Unified Approach
Borui Zhang, Baotong Tian, Wenzhao Zheng, Jie Zhou, Jiwen Lu
TL;DR
This paper tackles the computational intractability of Shapley value explanations for high-dimensional inputs by introducing a unified view of stochastic estimators as a linear transformation of values summed over randomly sampled feature subsets, and by proposing SimSHAP, a simple and fast amortized estimator. By framing semivalue, random order value, least squares value, and amortized methods within a single matrix-based, subset-sampling paradigm, the authors derive unbiased targets and an efficient training objective that lets a single forward pass estimate explanations. Extensive experiments on tabular and image data, including qualitative and quantitative evaluations on CIFAR-10, show that SimSHAP achieves substantial speedups with accuracy comparable to or better than existing methods (e.g., KernelSHAP and FastSHAP). The work provides a practical, scalable approach to Shapley value explanations and a unified theoretical lens on the connections among diverse estimation strategies. Its limitations concern sampling stability and the choice of metric matrices.
Abstract
Shapley values have emerged as a widely accepted and trustworthy tool, grounded in theoretical axioms, for addressing challenges posed by black-box models such as deep neural networks. However, the cost of computing Shapley values grows exponentially with the number of features. Various approaches, including ApproSemivalue, KernelSHAP, and FastSHAP, have been explored to expedite the computation. In our analysis of existing approaches, we observe that stochastic estimators can be unified as a linear transformation of randomly summed values from feature subsets. Based on this observation, we investigate the possibility of designing simple amortized estimators and propose a straightforward and efficient one, SimSHAP, by eliminating redundant techniques. Extensive experiments conducted on tabular and image datasets validate the effectiveness of SimSHAP, which significantly accelerates the computation of accurate Shapley values.
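To make the exponential cost and the sampling idea concrete, the following is a minimal Python sketch on a toy three-feature game. It is not SimSHAP and not code from the paper: it contrasts exact enumeration over all feature subsets with a plain permutation-sampling estimator, one member of the random order value family mentioned above. The names exact_shapley, sampled_shapley, and toy_value, as well as the toy game itself, are assumptions introduced only for illustration.

```python
import itertools
import math
import random


def exact_shapley(value_fn, n):
    """Exact Shapley values by enumerating every subset (cost grows as 2^n)."""
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Shapley weight for a coalition of size k that excludes feature i.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def sampled_shapley(value_fn, n, num_permutations, seed=0):
    """Monte Carlo estimate: average marginal contributions over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n
    for _ in range(num_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = frozenset()
        prev = value_fn(coalition)
        for i in order:
            coalition = coalition | {i}
            cur = value_fn(coalition)
            phi[i] += cur - prev  # marginal contribution of feature i in this ordering
            prev = cur
    return [p / num_permutations for p in phi]


def toy_value(subset):
    """Hypothetical game: additive weights plus an interaction between features 0 and 1."""
    weights = [1.0, 2.0, 3.0]
    total = sum(weights[j] for j in subset)
    if 0 in subset and 1 in subset:
        total += 4.0
    return total


if __name__ == "__main__":
    print("exact:  ", exact_shapley(toy_value, 3))
    print("sampled:", sampled_shapley(toy_value, 3, num_permutations=2000))
```

In this toy game the interaction bonus is split evenly between features 0 and 1, so the exact values are 3, 4, and 3. The sampled estimate is a weighted sum of value-function evaluations on randomly drawn subsets, which is the kind of structure the unified view describes. An amortized estimator such as SimSHAP would instead train a model to output such estimates in a single forward pass.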
