Stabilizing Estimates of Shapley Values with Control Variates
Jeremy Goldwasser, Giles Hooker
TL;DR
This work tackles the instability of Shapley-value explanations caused by Monte Carlo sampling by introducing ControlSHAP, a general, model-agnostic variance-reduction technique based on control variates. ControlSHAP uses a tractable Taylor approximation of the model, whose Shapley estimates are correlated with those of the model itself, as a control variate, and chooses the control-variate coefficient that minimizes variance. This yields substantial reductions in the variability of Shapley estimates (up to $90\%$ in some cases) with minimal extra computation. The approach applies to both independent and correlated feature settings, to differentiable models and (via finite-difference gradients) non-differentiable ones, and it can be combined with either Shapley Sampling or KernelSHAP. Empirical results on five high-dimensional datasets show more stable Shapley estimates and feature rankings. The anticipated variance reduction can also be estimated from the observed correlations, which supports faster convergence and more trustworthy explanations in practice.
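The control-variate idea itself can be illustrated on a toy problem. The sketch below is not ControlSHAP and does not compute Shapley values: it only estimates the mean of a hypothetical function `toy_model` under standard normal inputs, using its second-order Taylor expansion `taylor_surrogate` (whose mean is known in closed form) as the control variate. All names are assumptions made for illustration. With the variance-minimizing coefficient $c = \mathrm{Cov}(f, g) / \mathrm{Var}(g)$, the variance shrinks by a fraction of roughly $\rho^2$, where $\rho$ is the correlation between the two quantities, so the observed correlation gives a prediction of the reduction.

```python
import numpy as np


def toy_model(x):
    """Hypothetical stand-in for an expensive model output."""
    return np.exp(x)


def taylor_surrogate(x):
    """Second-order Taylor expansion of toy_model around 0."""
    return 1.0 + x + 0.5 * x**2


# For X ~ N(0, 1): E[1 + X + X^2 / 2] = 1 + 0 + 1/2.
SURROGATE_MEAN = 1.5


def control_variate_estimate(f_vals, g_vals, g_mean):
    """Return (plain mean, control-variate-adjusted mean, sample correlation)."""
    f_vals = np.asarray(f_vals, dtype=float)
    g_vals = np.asarray(g_vals, dtype=float)
    cov = np.cov(f_vals, g_vals)  # 2x2 sample covariance matrix
    c_hat = cov[0, 1] / cov[1, 1]  # estimated variance-minimizing coefficient
    rho = cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])
    plain = f_vals.mean()
    # Subtract the surrogate's sampling error, scaled by c_hat.
    adjusted = plain - c_hat * (g_vals.mean() - g_mean)
    return plain, adjusted, rho


def run_trials(n_trials=500, n_samples=200, seed=0):
    """Repeat the estimate many times to compare Monte Carlo variability."""
    rng = np.random.default_rng(seed)
    results = np.array([
        control_variate_estimate(toy_model(x), taylor_surrogate(x), SURROGATE_MEAN)
        for x in rng.standard_normal((n_trials, n_samples))
    ])
    plain, adjusted, rho = results.T
    return {
        "plain_mean": plain.mean(),
        "adjusted_mean": adjusted.mean(),
        "plain_var": plain.var(),
        "adjusted_var": adjusted.var(),
        "observed_reduction": 1.0 - adjusted.var() / plain.var(),
        "predicted_reduction": np.mean(rho**2),  # from observed correlations
    }


if __name__ == "__main__":
    for name, value in run_trials().items():
        print(f"{name}: {value:.4f}")
```

Comparing `observed_reduction` with `predicted_reduction` shows how closely the correlation-based prediction tracks the reduction actually achieved across repeated runs. Estimating the coefficient from the same samples introduces a small bias, which is one reason this should be read as a sketch rather than a reference implementation.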
Abstract
Shapley values are among the most popular tools for explaining predictions of black-box machine learning models. However, their high computational cost motivates the use of sampling approximations, inducing a considerable degree of uncertainty. To stabilize these model explanations, we propose ControlSHAP, an approach based on the Monte Carlo technique of control variates. Our methodology is applicable to any machine learning model and requires virtually no extra computation or modeling effort. On several high-dimensional datasets, we find it can produce dramatic reductions in the Monte Carlo variability of Shapley estimates.
