Model agnostic local variable importance for locally dependent relationships

Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

TL;DR

CLIQUE addresses the limitation that traditional local-importance methods often misattribute importance in locally dependent regions and struggle with multi-class problems. It converts global permutation ideas into a local, model-agnostic framework by quantifying changes in cross-validated prediction errors under quantile-grid perturbations, yielding a local importance $V_{ij}$ for each observation and feature, computed from $M$ perturbations. The approach captures conditional, location-dependent relationships and extends naturally to multi-class classification, with experiments showing reduced bias in regions where a variable has no effect and clear, interpretable signals where interactions exist. Across simulated and real datasets (including lichen ecology and MNIST), CLIQUE outperforms SHAP and LIME in detecting locally dependent information and demonstrates favorable scaling and practical interpretability for downstream decision-making.
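The perturbation idea summarized above can be illustrated with a minimal sketch. It is not the authors' algorithm: the function name `local_importance`, the toy model `toy_predict`, the squared-error loss, and the mid-point quantile grid are all assumptions made for illustration, and cross-validation is omitted for brevity (the predictions here come from a fixed function rather than out-of-fold fits). The toy model only mimics the kind of AND-gate dependence described in the Figure 1 caption, where $v_1$ has no effect when $v_2<-1/3$.

```python
import numpy as np

def local_importance(predict, X, y, M=10):
    """Hypothetical sketch: V[i, j] is the mean increase in squared error
    for observation i when feature j is replaced by each of M quantile-grid
    values of that feature. No cross-validation is performed here."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    base_err = (y - predict(X)) ** 2          # per-observation baseline error
    V = np.zeros((n, p))
    probs = (np.arange(M) + 0.5) / M          # assumed mid-point quantile levels
    for j in range(p):
        grid = np.quantile(X[:, j], probs)    # M grid values for feature j
        pert_err = np.zeros(n)
        for q in grid:
            Xp = X.copy()
            Xp[:, j] = q                      # set feature j to the grid value for all rows
            pert_err += (y - predict(Xp)) ** 2
        V[:, j] = pert_err / M - base_err     # average error increase
    return V

def toy_predict(X):
    # Hypothetical stand-in for a trained model with AND-gate behaviour.
    return ((X[:, 0] > 0) & (X[:, 1] > -1 / 3)).astype(float)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(500, 2))
y = toy_predict(X)
V = local_importance(toy_predict, X, y, M=10)
inactive = X[:, 1] < -1 / 3                   # region where v1 cannot matter
```

In this toy setting, the importance assigned to $v_1$ is zero for every observation in the `inactive` region and positive elsewhere, which is the qualitative behaviour the summary attributes to CLIQUE.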

Abstract

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current methods typically fail to accurately reflect locally dependent relationships between variables and instead focus on marginal importance values. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that captures locally dependent relationships, improves over permutation-based methods, and can be directly applied to multi-category classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information and properly reduces bias in regions where variables do not affect the response.

Paper Structure

This paper contains 15 sections, 4 equations, 23 figures, 1 algorithm.

Figures (23)

  • Figure 1: Scatterplot of the AND gate data colored by the label $y$ as computed in Equation \ref{eq:xor2}. When $v_2<-1/3$, the value of $v_1$ should have no impact on the output of the trained model and should thus have no importance.
  • Figure 2: Scatterplots of local variable importances vs. $v_1$ values from the AND gate data experiment. Each plot is colored by whether $v_2>-1/3$. CLIQUE and CLIP values output an importance of zero when $v_2<-1/3$ and a nonzero importance otherwise. LIME values fail to distinguish between these two regions and SHAP values are nonzero when $v_2<-1/3$.
  • Figure 3: Scatterplot of the Corners data colored by the label $y$ (see Equation \ref{eq:2corn}). $v_2$ is not important when $v_1<0$. $v_1$ is not important when $|v_2|<1/4$.
  • Figure 4: Scatterplots of local variable importances vs. $v_1$ values from the Corners data experiment. Each plot is colored by whether $|v_2|>1/4$. As in Figure \ref{fig:xor2patch}, the CLIQUE and CLIP values capture the known conditional relationships between $v_1$ and $v_2$, while SHAP and LIME largely fail.
  • Figure 5: Scatterplots of local variable importances vs. $v_2$ values from the Corners data experiment. Each plot is colored by whether $v_1>0$. As in Figure \ref{fig:corner2gbm1}, the CLIQUE and CLIP values capture the known conditional relationships between $v_1$ and $v_2$, while SHAP and LIME largely fail.
  • ...and 18 more figures