Disentangling Interactions and Dependencies in Feature Attribution

Gunnar König, Eric Günther, Ulrike von Luxburg

TL;DR

This work derives DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies, and proposes a new visualization of feature importance scores that clearly illustrates the different contributions.

Abstract

In explainable machine learning, global feature importance methods try to determine how much each individual feature contributes to predicting the target variable, resulting in one importance score for each feature. But often, predicting the target variable requires interactions between several features (such as in the XOR function), and features might have complex statistical dependencies that allow one feature to be partially replaced by another. In commonly used feature importance scores, these cooperative effects are conflated with the features' individual contributions, making the scores prone to misinterpretation. In this work, we derive DIP, a new mathematical decomposition of individual feature importance scores that disentangles three components: the standalone contribution and the contributions stemming from interactions and dependencies. We prove that the DIP decomposition is unique and show how it can be estimated in practice. Based on these results, we propose a new visualization of feature importance scores that clearly illustrates the different contributions.
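The XOR case mentioned in the abstract can be checked by hand on a tiny discrete distribution. The sketch below is an illustration only and is not taken from the paper's code: it assumes that the value of a feature subset $S$, written $v(S)$ in the figure captions, is the variance explained by the $\mathcal{L}^2$-optimal predictor on $X_S$, that is $\mathrm{Var}(E[Y \mid X_S])$. The helper name `explained_variance` is hypothetical.

```python
import itertools
import numpy as np

def explained_variance(rows, probs, y, subset):
    """Var(E[Y | X_subset]) for a finite joint distribution.

    Assumption: this plays the role of v(S); the paper's exact
    definition of v is not reproduced in this summary.
    """
    mean_y = float(np.sum(probs * y))
    groups = {}  # value of X_subset -> (probability mass, mass-weighted sum of y)
    for x, p, target in zip(rows, probs, y):
        key = tuple(x[j] for j in subset)
        mass, weighted_sum = groups.get(key, (0.0, 0.0))
        groups[key] = (mass + p, weighted_sum + p * target)
    return sum(m * (s / m - mean_y) ** 2 for m, s in groups.values())

# Two independent fair coin flips; the target is their XOR.
rows = list(itertools.product([0, 1], repeat=2))
probs = np.full(4, 0.25)
y = np.array([x1 ^ x2 for x1, x2 in rows], dtype=float)

v1 = explained_variance(rows, probs, y, [0])      # feature 1 alone
v2 = explained_variance(rows, probs, y, [1])      # feature 2 alone
v12 = explained_variance(rows, probs, y, [0, 1])  # both features jointly

print(v1, v2, v12)
```

Under these assumptions each feature explains nothing on its own, while the pair explains all of $\mathrm{Var}(Y) = 0.25$, so the entire joint value comes from the interaction.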

Paper Structure

This paper contains 61 sections, 8 theorems, 58 equations, 8 figures, 2 algorithms.

Key Result

Proposition 2

Let $(X,Y)\sim P$ be a DGP and $J\subseteq D$ a subset of features. If $X_J$ and $X_{\bar{J}}$ are independent and the $\mathcal{L}^2$-optimal predictor is a GGAM $g^*=g^*_J+g^*_{\bar{J}}$ in $X_J$ and $X_{\bar{J}}$, then $\Psi\left(J,\bar{J}\right) = 0.$
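Proposition 2 can be illustrated numerically on small discrete distributions. The sketch below rests on two assumptions that are not stated in this summary: that $v(S) = \mathrm{Var}(E[Y \mid X_S])$, and that the cooperative impact is $\Psi(J,\bar{J}) = v(J \cup \bar{J}) - v(J) - v(\bar{J})$, which matches the Figure 2 caption saying that the standalone, dependence, and interaction components sum to $v(1,2)$. The function names are hypothetical.

```python
import numpy as np

def explained_variance(points, probs, y, subset):
    """Var(E[Y | X_subset]) for a finite joint distribution (assumed v(S))."""
    points = np.asarray(points)
    probs = np.asarray(probs, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_y = probs @ y
    keys = [tuple(row[list(subset)]) for row in points]
    total = 0.0
    for key in set(keys):
        mask = np.array([k == key for k in keys])
        mass = probs[mask].sum()
        if mass == 0:
            continue  # skip values of X_subset that never occur
        cond_mean = (probs[mask] @ y[mask]) / mass
        total += mass * (cond_mean - mean_y) ** 2
    return total

def cooperative_impact(points, probs, y):
    """Assumed form: Psi({1},{2}) = v(1,2) - v(1) - v(2)."""
    v1 = explained_variance(points, probs, y, [0])
    v2 = explained_variance(points, probs, y, [1])
    v12 = explained_variance(points, probs, y, [0, 1])
    return v12 - v1 - v2

# Case 1: independent features, additive target g1(x1) + g2(x2).
x1_vals, p1 = [0, 1, 2], [0.2, 0.5, 0.3]
x2_vals, p2 = [0, 1], [0.4, 0.6]
points = [(a, b) for a in x1_vals for b in x2_vals]
probs = [pa * pb for pa in p1 for pb in p2]
y = [a**2 + 3 * b for a, b in points]
psi_indep = cooperative_impact(points, probs, y)

# Case 2: same additive form, but X2 is an exact copy of X1.
points_dep = [(0, 0), (1, 1)]
probs_dep = [0.5, 0.5]
y_dep = [a + b for a, b in points_dep]
psi_dep = cooperative_impact(points_dep, probs_dep, y_dep)

print(psi_indep, psi_dep)
```

In the first case the result is zero up to floating-point error, as the proposition predicts. In the second case the features are fully redundant, so each one alone already explains everything and the cooperative impact is negative.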

Figures (8)

  • Figure 1: Feature importance, old vs. new. Consider a model that predicts house prices using the features longitude, latitude, and ocean proximity. Left: Leave-One-Covariate-Out (LOCO) scores. Right: Our decomposition of the same scores (black) into each feature's standalone contribution (gray) and the contributions of interactions (green) and dependencies (purple). The arrows of the bars indicate whether the contribution is positive or negative; their values sum up to the LOCO scores.
  • Figure 2: Examples. For each example, we show a forceplot visualizing the DIP decomposition into standalone contributions ($v(1)$ and $v(2)$), main effect dependencies ($\mathrm{Dep}(1,2)$) and interaction surplus, where the direction of each bar (upward or downward) represents the sign. They sum up to $v(1,2)$ (black horizontal line). The slim bars (right) show the decomposition of $\mathrm{Dep}(1,2)$ (purple horizontal line) into covariance and cross-predictability. For Examples 7-9, we additionally show heatmaps visualizing the distribution (top) and $g^*$ or $h^*$ (bottom).
  • Figure 3: Applications. We decompose the LOCO scores on the wine quality dataset (left) and the California Housing dataset (right) into each feature's standalone contribution, the interaction surplus, and the contribution of main effect dependencies.
  • Figure 4: For each example, we show a forceplot visualizing the DIP decomposition into standalone contributions ($v(1)$ and $v(2)$), main effect dependencies ($\mathrm{Dep}(1,2)$) and interaction surplus, where the direction of each bar (upward or downward) represents the sign. They sum up to $v(1,2)$ (black horizontal line). The slim bars (right) show the decomposition of $\mathrm{Dep}(1,2)$ (purple horizontal line) into covariance and cross-predictability.
  • Figure 5: Visualizing the data generating process of the three illustrative student examples.
  • ...and 3 more figures

Theorems & Definitions (20)

  • Definition 1: Cooperative Impact
  • Proposition 2: Without Interactions or Dependencies, the Cooperative Effect is Zero
  • Example 3: Contributions of Interactions and Dependencies Cancel Out
  • Definition 4: Pure Interaction
  • Theorem 5: Equivalent Characterization of Pure Interactions
  • Theorem 6: Cooperative Impact Decomposition
  • Example 7: Negative Cooperative Impact via Dependence (figure: student redundancy)
  • Example 8: Positive Cooperative Impact via Dependence (figure: student enhancement)
  • Example 9: Interactions (figure: binary interaction)
  • Lemma 10: Equivalence of Orthogonality and Non-approximability
  • ...and 10 more