Decomposing Global Feature Effects Based on Feature Interactions

Julia Herbinger; Marvin N. Wright; Thomas Nagler; Bernd Bischl; Giuseppe Casalicchio

TL;DR

The paper addresses aggregation bias in global feature effect plots, which arises when feature interactions are present, by introducing GADGET, a framework that partitions the feature space into regions in which interaction-related heterogeneity is minimized. It establishes a local-decomposability axiom and shows that partial dependence (PD), accumulated local effects (ALE), and SHAP dependence (SD) satisfy it, which enables regional explanations (GADGET-PD, GADGET-ALE, GADGET-SD) with additive regional main effects. A new permutation-based procedure, PINT, detects global feature interactions in a model-agnostic way and guides the selection of interacting feature subsets. Empirical studies on simulations and real-world data sets (including COMPAS and bikesharing) show that GADGET reduces interaction-related heterogeneity ($R^2_{Tot}$ of roughly 0.9 to 0.99 in the examples) and produces more faithful regional explanations; filtering extends the approach to high-dimensional settings. The work highlights practical implications for model auditing, bias detection, and interpretable ML, and notes limitations such as detecting higher-order interactions with SD without recalculation and the Rashomon effect across models.

Abstract

Global feature effect methods, such as partial dependence plots, provide an intelligible visualization of the expected marginal feature effect. However, such global feature effect methods can be misleading, as they do not represent local feature effects of single observations well when feature interactions are present. We formally introduce generalized additive decomposition of global effects (GADGET), which is a new framework based on recursive partitioning to find interpretable regions in the feature space such that the interaction-related heterogeneity of local feature effects is minimized. We provide a mathematical foundation of the framework and show that it is applicable to the most popular methods to visualize marginal feature effects, namely partial dependence, accumulated local effects, and Shapley additive explanations (SHAP) dependence. Furthermore, we introduce and validate a new permutation-based interaction detection procedure that is applicable to any feature effect method that fits into our proposed framework. We empirically evaluate the theoretical characteristics of the proposed methods based on various feature effect methods in different experimental settings. Moreover, we apply our introduced methodology to three real-world examples to showcase their usefulness.

Paper Structure (95 sections, 7 theorems, 57 equations, 23 figures, 6 tables, 2 algorithms)

Introduction
Contributions.
Background
General Notation
Definition of Interactions and the Functional ANOVA Decomposition
Feature Effect Methods
Partial Dependence.
ALE.
SHAP Dependence.
Quantification of Interaction Effects
Related Work
Related Work on Regional Effects.
Related Work on Detecting Feature Interactions.
Generalized Additive Decomposition of Global EffecTs (GADGET)
Measuring Interaction-Related Heterogeneity
...and 80 more sections

Key Result

Theorem 2

If the local feature effect function $h(x_j, \mathbf{x}_{-j}^{(i)})$ satisfies Axiom \ref{axiom:feat_rel}, then the loss function $\mathcal{L}_j\left(\mathcal{A}_g, x_j\right)$ defined in Eq. \ref{eq:loss} depends only on feature interactions between the feature $\mathbf{x}_j$ at $x_j$ and the features in $-j$. The proof can be found in Appendix \ref{app:proof_theorem2}.
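The statement of Theorem 2 can be checked numerically in the PD case, where the local effect is the mean-centered ICE curve. The following is a minimal sketch, not the authors' implementation: the helper names `centered_ice` and `risk`, the uniform sampling, the grid, and the two toy prediction functions are assumptions made for illustration, and every grid point is treated as valid. For a purely additive function the risk of $\mathbf{x}_1$ should vanish up to floating-point error, while an interaction between $\mathbf{x}_1$ and $\mathbf{x}_3$ should make it clearly positive.

```python
import numpy as np

def centered_ice(predict, X, j, grid):
    """Mean-centered ICE curves of feature j (rows: observations, columns: grid points)."""
    curves = np.empty((X.shape[0], grid.size))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value          # set feature j to the grid value for all observations
        curves[:, k] = predict(X_mod)
    # Center each curve by its own mean over the grid.
    return curves - curves.mean(axis=1, keepdims=True)

def risk(curves):
    """Sum over grid points of the squared deviations from the average curve."""
    loss_per_grid_point = ((curves - curves.mean(axis=0)) ** 2).sum(axis=0)
    return loss_per_grid_point.sum()

# Toy prediction functions (assumptions, standing in for a fitted model).
def f_additive(X):
    return 3 * X[:, 0] + X[:, 2]

def f_interaction(X):
    return np.where(X[:, 2] > 0, 3 * X[:, 0], -3 * X[:, 0]) + X[:, 2]

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 3))
grid = np.linspace(-1, 1, 20)

risk_additive = risk(centered_ice(f_additive, X, 0, grid))
risk_interaction = risk(centered_ice(f_interaction, X, 0, grid))
print(f"additive: {risk_additive:.3e}, interaction: {risk_interaction:.3e}")
```

Centering each curve removes the additive contribution of the other features, so any remaining spread between curves can only come from interactions with $\mathbf{x}_1$.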

Figures (23)

Figure 1: Left: ICE and global PD curves of feature hr (hour of the day) of the bikesharing data set (James et al., 2022). Right: ICE and regional PD curves of hr depending on feature workingday. The effect of hr on predicted bike rentals differs between working days and non-working days; because of aggregation, this difference is not visible in the global feature effect plot (white curve on the left).
Figure 2: Mean-centered ICE and PD curves of feature $\mathbf{x}_1$ of the described simulation example. Left: Illustrates the distance $d^{(1)}$ that is calculated within Eq. \ref{eq:loss} between the local feature effect (here: $h(x_j,\mathbf{x}_{-j}^{(i)}) := \hat{f}^c(x_j,\mathbf{x}_{-j}^{(i)})$, i.e., the mean-centered ICE curve) and the expected effect within region $\mathcal{A}_g$ (here: $\mathbb{E}[h (x_j, X_{-j})|\mathcal{X}] = \mathbb{E}[\hat{f}^c (x_j, X_{-j})|\mathcal{X}]$, i.e., the global mean-centered PD) at the first grid point $\tilde{x}_1 = -1$ for the first observation. Middle: The distances $d$ are calculated for all mean-centered ICE curves at the first grid point and the squared values are summed up to obtain the loss function value for the first grid point as defined in Eq. \ref{eq:loss}. Right: The risk function value is calculated according to Eq. \ref{eq:risk} by aggregating the loss function values over the valid grid points. This measures the heterogeneity of the mean-centered ICE curves of feature $\mathbf{x}_1$, which quantifies the interaction-related heterogeneity between $\mathbf{x}_1$ and $\mathbf{x}_{-1}$ based on Theorem \ref{theorem:interact_rel}.
Figure 3: Visualization of applying GADGET with $S = Z = \{1,2,3\}$ to mean-centered ICE curves of the uncorrelated simulation example with $Y = 3X_1\mathbbm{1}_{X_3>0} - 3X_1\mathbbm{1}_{X_3\leq 0} + X_3 + \epsilon$ where $\epsilon \sim \mathcal{N}(0,0.09)$. Plots show mean-centered ICE and PD curves on the entire feature space (upper) and within regions after partitioning the feature space at the split point $\mathbf{x}_3 = -0.003$ (lower).
Figure 4: Local, global (grey), and regional effects for uncorrelated (left) and correlated (right) simulation settings. Local effects are colored w.r.t. the first split when GADGET is applied. The split feature is $\mathbf{x}_3$ in all cases. Blue represents the left region and orange the right region according to the split point. The thicker lines represent the respective regional PD curves. The rug plot and the black points in the upper plots show the distribution of $\mathbf{x}_1$ according to the split point and the underlying observational values, respectively. The two upper ALE plots visualize the heterogeneity of local feature effects (derivatives), while the two lower plots show the mean-centered global and regional ALE curves.
Figure 5: Boxplots showing the interaction-related heterogeneity reduction $I_z$ per split feature over $30$ repetitions when PD, ALE, or SD is used in GADGET. Columns refer to correlation $\rho_{13}$, rows refer to fitted ML model.
...and 18 more figures
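
The single split shown in Figure 3 can be approximated with a short search over candidate split points. This is a simplified sketch under stated assumptions, not the GADGET algorithm itself: it uses one feature of interest ($\mathbf{x}_1$) and one split feature ($\mathbf{x}_3$), performs a single split instead of recursive partitioning, uses the noise-free data-generating function in place of a fitted model, and relies on hypothetical helpers (`centered_ice`, `risk`, `best_split`) and a hypothetical `min_node_size` parameter.

```python
import numpy as np

def centered_ice(predict, X, j, grid):
    """Mean-centered ICE curves of feature j (rows: observations, columns: grid points)."""
    curves = np.empty((X.shape[0], grid.size))
    for k, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, j] = value
        curves[:, k] = predict(X_mod)
    return curves - curves.mean(axis=1, keepdims=True)

def risk(curves):
    """Squared deviations from the average curve, summed over observations and grid points."""
    return ((curves - curves.mean(axis=0)) ** 2).sum()

def best_split(curves, z, min_node_size=10):
    """Return the split point on z that minimizes the summed risk of both child regions."""
    candidates = np.sort(np.unique(z))[min_node_size:-min_node_size]
    best_t, best_r = None, np.inf
    for t in candidates:
        left = z <= t
        r = risk(curves[left]) + risk(curves[~left])
        if r < best_r:
            best_t, best_r = t, r
    return best_t, best_r

# Noise-free version of the simulation function from Figure 3 (stand-in for a fitted model).
def predict(X):
    return np.where(X[:, 2] > 0, 3 * X[:, 0], -3 * X[:, 0]) + X[:, 2]

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 3))
grid = np.linspace(-1, 1, 20)

curves = centered_ice(predict, X, 0, grid)
parent_risk = risk(curves)
split_point, child_risk = best_split(curves, X[:, 2])
reduction = 1 - child_risk / parent_risk  # relative heterogeneity reduction of this split
print(f"split at x3 = {split_point:.3f}, relative reduction = {reduction:.4f}")
```

Because the interaction in this example is a pure sign switch at $\mathbf{x}_3 = 0$, a split close to zero should remove almost all heterogeneity of the $\mathbf{x}_1$ curves; with a fitted model and noisy data the reduction would generally be smaller.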

Theorems & Definitions (7)

Theorem 2
Theorem 3
Theorem 4
Theorem 5
Theorem 6
Theorem 7
Theorem 8
