Improving the Weighting Strategy in KernelSHAP

Lars Henry Berge Olsen, Martin Jullum

TL;DR

The paper tackles the high computational cost of conditional Shapley-value explanations with KernelSHAP by introducing deterministic weights through the paired c-kernel strategy and by refining the PySHAP approach (PySHAP*), which can also be combined with the deterministic weights (PySHAP* c-kernel). It presents a detailed taxonomy of sampling and weighting strategies, derives deterministic weight corrections, and demonstrates improved efficiency (5–50% fewer coalitions, up to ~95% in some regimes) with preserved accuracy on simulated Gaussian data and a real-world Red Wine dataset. Across XGBoost and linear-model scenarios, the paired c-kernel and PySHAP* c-kernel methods consistently outperform existing strategies, reducing variance and stabilizing the weights, especially when many features are present. The work provides practical guidance for scalable Shapley-value explanations of high-dimensional tabular data and suggests extending these ideas to other Shapley-based explanations and to theoretical analyses.

Abstract

In Explainable AI (XAI), Shapley values are a popular model-agnostic framework for explaining predictions made by complex machine learning models. The computation of Shapley values requires estimating non-trivial contribution functions representing predictions with only a subset of the features present. As the number of these terms grows exponentially with the number of features, computational costs escalate rapidly, creating a pressing need for efficient and accurate approximation methods. For tabular data, the KernelSHAP framework is considered the state-of-the-art model-agnostic approximation framework. KernelSHAP approximates the Shapley values using a weighted sample of the contribution functions for different feature subsets. We propose a novel modification of KernelSHAP which replaces the stochastic weights with deterministic ones to reduce the variance of the resulting Shapley value approximations. This may also be combined with our simple, yet effective modification to the KernelSHAP variant implemented in the popular Python library SHAP. Additionally, we provide an overview of established methods. Numerical experiments demonstrate that our methods can reduce the required number of contribution function evaluations by $5\%$ to $50\%$ while preserving the same accuracy of the approximated Shapley values -- essentially reducing the running time by up to $50\%$. These computational advancements push the boundaries of the feature dimensionality and number of predictions that can be accurately explained with Shapley values within a feasible runtime.
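
To make the weighted-sample idea in the abstract concrete, the following is a minimal sketch of the standard KernelSHAP weighted least squares formulation, not the paper's proposed strategies. The toy contribution function `toy_value`, the helper names, the large weight `BIG` on the empty and grand coalitions, and the sample size of 200 are all assumptions made for illustration. The sketch first solves the problem with all coalitions and the Shapley kernel weights, and then with coalitions sampled with replacement, using the sampling frequencies as (stochastic) weights.

```python
import itertools
import math

import numpy as np

M = 4        # number of features in the toy example (assumption)
BIG = 1e6    # large weight standing in for the infinite weight on the
             # empty and grand coalitions (assumption)


def toy_value(S):
    """Hypothetical contribution function v(S): additive effects plus one interaction."""
    base = {0: 1.0, 1: -2.0, 2: 0.5, 3: 3.0}
    v = 10.0 + sum(base[j] for j in S)
    if 0 in S and 1 in S:
        v += 2.0  # interaction shared equally by features 0 and 1
    return v


def shapley_kernel_weight(size):
    """Shapley kernel weight of a coalition with `size` features."""
    if size == 0 or size == M:
        return BIG
    return (M - 1) / (math.comb(M, size) * size * (M - size))


def solve_wls(coalitions, weights):
    """Weighted least squares estimate of (phi_0, phi_1, ..., phi_M)."""
    Z = np.zeros((len(coalitions), M + 1))
    Z[:, 0] = 1.0
    for row, S in enumerate(coalitions):
        for j in S:
            Z[row, j + 1] = 1.0
    v = np.array([toy_value(S) for S in coalitions])
    W = np.diag(weights)
    return np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ v)


# All 2^M coalitions with deterministic Shapley kernel weights.
all_coalitions = [
    S for size in range(M + 1) for S in itertools.combinations(range(M), size)
]
phi_exact = solve_wls(
    all_coalitions, [shapley_kernel_weight(len(S)) for S in all_coalitions]
)

# Sampled version: draw coalitions with replacement from the Shapley kernel
# distribution and weight each unique coalition by its sampling frequency.
rng = np.random.default_rng(0)
inner = [S for S in all_coalitions if 0 < len(S) < M]
probs = np.array([shapley_kernel_weight(len(S)) for S in inner])
probs /= probs.sum()
draws = rng.choice(len(inner), size=200, p=probs)
unique_idx, counts = np.unique(draws, return_counts=True)
sampled_coalitions = [(), tuple(range(M))] + [inner[i] for i in unique_idx]
sampled_weights = [BIG, BIG] + list(counts / counts.sum())
phi_sampled = solve_wls(sampled_coalitions, sampled_weights)

print("all coalitions:", np.round(phi_exact, 4))
print("sampled       :", np.round(phi_sampled, 4))
```

In the sampled variant the weights depend on how often each coalition happens to be drawn, which is the source of randomness that the abstract proposes to remove by using deterministic weights.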

Paper Structure (19 sections, 15 equations, 8 figures, 1 table)

This paper contains 19 sections, 15 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: The normalized weights $w_{\mathcal{S}}$ used in the approximate weighted least squares solution for the Shapley values by the strategies described in the section on sampling strategies, for different numbers of unique coalitions $N_\text{coal}$ in an $M = 10$-dimensional setting. The paired-based strategies are symmetric around the vertical line. The paired c-kernel and PySHAP* c-kernel strategies both have identical weights within each coalition size, but their weights are slightly different from each other.
  • Figure 2: XGBoost experiment: $\operatorname{MAE} = \overline{\operatorname{MAE}}_{500}({\boldsymbol{\phi}}, {\boldsymbol{\phi}}_{\mathcal{D}})$ for different numbers of coalitions $N_\text{coal}$ and dependence levels $\rho$ on a log scale together with $95\%$ confidence bands.
  • Figure 3: XGBoost experiment: histograms of the predicted responses $f(\boldsymbol{x}^*)$ for the $1000$ explicands together with $\phi_0 = \mathbb{E}[f(\boldsymbol{x})] = \overline{y}_\text{train}$ for each dependence level.
  • Figure 4: The reduction in $N_\text{coal}$ needed by the PySHAP* c-kernel strategy to obtain the same $\operatorname{MAE}$ as the other strategies. E.g., for the XGBoost simulation experiment with $\rho = 0.2$ and $N_\text{coal} = 500$, the PySHAP* c-kernel strategy obtains the same $\operatorname{MAE}$ score as the PySHAP strategy using only $0.5\times500 = 250$ coalitions, i.e., a $50\%$ reduction. In general, we see similar curves for the different experiments, and all strategies perform significantly worse than the PySHAP* c-kernel strategy. The exception is the paired c-kernel strategy in a small region in the top left figure, where it has a fraction above one, indicating that PySHAP* c-kernel requires a larger $N_\text{coal}$ than paired c-kernel.
  • Figure 5: Linear experiment: $\operatorname{MAE} = \overline{\operatorname{MAE}}_{150}({\boldsymbol{\phi}}, {\boldsymbol{\phi}}_{\mathcal{D}})$ curves for each strategy and dependence level $\rho$ on a log scale together with $95\%$ confidence bands, which are very narrow.
  • ...and 3 more figures