KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions

Fabian Fumagalli, Maximilian Muschalik, Patrick Kolpaczki, Eyke Hüllermeier, Barbara Hammer

TL;DR

This work extends the Shapley framework to higher-order interactions by showing that the Shapley Interaction Index (SII) can be characterized as the solution to a weighted least-squares (WLS) problem, which yields optimal $k$-additive approximations via $k$-SII. The characterization is proven for the SV and pairwise SII, and supported by empirically validated conjectures for higher orders. Building on it, the authors introduce KernelSHAP-IQ, a KernelSHAP-inspired method for estimating interactions with state-of-the-art performance. They present a consistent variant that converges to SII and an inconsistent variant that can outperform baselines in low-budget settings. The approach yields more informative local explanations by incorporating interactions, with potential applications in both model interpretability and data valuation in complex ML systems.

Abstract

The Shapley value (SV) is a prevalent approach to allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least squares (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.

Paper Structure

This paper contains 50 sections, 8 theorems, 82 equations, 12 figures, 3 tables, 7 algorithms.

Key Result

Proposition 3.2

If $k\geq 2$, then with weights $\lambda(k,\ell) := \sum_{r=1}^{\ell}\binom{\ell}{r} B_{k-r}$ and $\lambda(k,0) := 0$.
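The weights $\lambda(k,\ell)$ can be evaluated directly from the sum above. The following is a minimal sketch, not code from the paper: it assumes that $B_m$ denotes the Bernoulli numbers with the convention $B_1 = -1/2$, and that $\ell$ ranges over $0,\dots,k$ so that every index $k-r$ is non-negative. The function names `bernoulli` and `lam` are hypothetical.

```python
from fractions import Fraction
from math import comb


def bernoulli(m: int) -> Fraction:
    """Bernoulli number B_m, assuming the convention B_1 = -1/2."""
    B = [Fraction(1)]  # B_0 = 1
    for i in range(1, m + 1):
        # Standard recurrence: sum_{j=0}^{i} C(i+1, j) * B_j = 0
        acc = sum((comb(i + 1, j) * B[j] for j in range(i)), Fraction(0))
        B.append(-acc / (i + 1))
    return B[m]


def lam(k: int, ell: int) -> Fraction:
    """lambda(k, ell) = sum_{r=1}^{ell} C(ell, r) * B_{k-r}, with lambda(k, 0) = 0."""
    if not 0 <= ell <= k:
        # Assumption: ell is restricted to 0..k so that k - r >= 0
        raise ValueError("expected 0 <= ell <= k")
    return sum(
        (comb(ell, r) * bernoulli(k - r) for r in range(1, ell + 1)),
        Fraction(0),
    )


if __name__ == "__main__":
    for k in (2, 3):
        print(k, [str(lam(k, ell)) for ell in range(k + 1)])
```

Exact rational arithmetic via `fractions.Fraction` avoids rounding in the alternating Bernoulli terms. Under the stated convention, for example, $\lambda(2,1) = B_1 = -1/2$ and $\lambda(3,1) = B_2 = 1/6$.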

Figures (12)

  • Figure 1: Positive (red) and negative (blue) feature attributions (vertices) and interactions (edges) for a movie review excerpt provided to a sentiment analysis language model. The interaction of "never" and "forget" contributes strongly to the positive sentiment.
  • Figure 2: Force plots as first and second order explanations showing positive (red) and negative (blue) feature attributions for a data point of the California Housing regression dataset Kelley.1997. The SVs show that a longitude of $-122.44$ and a latitude of $37.8$ positively influence the predicted property's price. Considering 2-SII feature interactions reveals that the positive influence of latitude vanishes and that only the combination of longitude and latitude, which pinpoints the exact location (San Francisco), impacts the property's price.
  • Figure 3: Links between SII, $k$-SII and the $k$-additive approximation $\hat{\nu}_k$. The SII captures the average contribution of $S$ to $\nu$, which constructs the $k$-additive interaction index $k$-SII, which is used for interpretation. Both are linked to $\hat{\nu}_k$, where SII yields an optimal approximation, which iteratively constructs $\hat{\nu}_k$.
  • Figure 4: Approximation quality of KernelSHAP-IQ (orange) and inconsistent KernelSHAP-IQ (yellow) compared to the permutation sampling (purple), SHAP-IQ (pink) and SVARM-IQ (blue) baselines for estimating SII values for the LM (left; $n=14,l\in\{1,2,3\}$), the bike rental dataset (center left; $n=12,l=2$), the California housing dataset (center right; $n=8,l=2$), and the SOUM (right; $n=20,l=2$). The shaded bands represent the standard error of the mean (SEM).
  • Figure 5: Runtime analysis of KernelSHAP-IQ and baseline algorithms for calculating $l=2$ SII scores for an example sentence with $n=14$ words and the LM. Each approximator is evaluated over 10 independent runs. The shaded bands correspond to the SEM.
  • ...and 7 more figures

Theorems & Definitions (25)

  • Definition 2.1: Shapley Interaction Index Grabisch.1999
  • Definition 2.2: Efficiency
  • Definition 2.3: $k$-Shapley Values Bord.2023
  • Definition 3.1: $k$-additive Approximation
  • Proposition 3.2: Iterative Approximation
  • Corollary 3.3
  • Corollary 3.4
  • Remark 3.5
  • Theorem 3.6: KernelSHAP
  • Theorem 3.7: KernelSHAP-IQ, $k=2$
  • ...and 15 more
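
Definition 2.1 is listed above without its formula. As an illustration, the sketch below evaluates the standard SII of Grabisch and Roubens by brute force, $I^{\text{SII}}(S) = \sum_{T \subseteq N \setminus S} \frac{(n-|T|-|S|)!\,|T|!}{(n-|S|+1)!}\,\Delta_S(T)$ with the discrete derivative $\Delta_S(T) = \sum_{L \subseteq S} (-1)^{|S|-|L|}\,\nu(T \cup L)$. It is exponential in $n$ and is not KernelSHAP-IQ; the helper names and the toy game are assumptions made for this example only. For $|S|=1$ the formula reduces to the SV.

```python
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset."""
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def discrete_derivative(game, S, T):
    """Delta_S(T) = sum over L subset of S of (-1)^(|S|-|L|) * game(T | L)."""
    return sum((-1) ** (len(S) - len(L)) * game(T | L) for L in subsets(S))


def sii(game, n, S):
    """Exact Shapley Interaction Index of S for a game on players 0..n-1."""
    S = frozenset(S)
    s = len(S)
    total = 0.0
    for T in subsets(set(range(n)) - S):
        t = len(T)
        weight = factorial(n - t - s) * factorial(t) / factorial(n - s + 1)
        total += weight * discrete_derivative(game, S, T)
    return total


def toy_game(T):
    """Hypothetical 3-player game: two main effects and one pairwise synergy."""
    return 1.0 * (0 in T) + 2.0 * (1 in T) + 3.0 * (0 in T and 1 in T)


if __name__ == "__main__":
    n = 3
    print("SV:", [sii(toy_game, n, {i}) for i in range(n)])
    print("pairwise SII:", {p: sii(toy_game, n, p) for p in combinations(range(n), 2)})
```

In this toy game the synergy of 3 between players 0 and 1 is split equally between their SVs (2.5 and 3.5), while the pairwise SII attributes it to the pair $\{0,1\}$ directly. Player 2 has no effect and receives zero in both views.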