Accurate estimation of feature importance faithfulness for tree models

Mateusz Gajewski, Adam Karczmarz, Mateusz Rapicki, Piotr Sankowski

TL;DR

This work introduces $\mathrm{PGI}^2$, a perturbation-based faithfulness metric for feature rankings in tree ensembles, and shows that the squared prediction gap $\mathrm{PG}^2$ can be computed exactly in $O(n^2)$ time for a given perturbation distribution, where $n$ is the total number of tree nodes. It also proposes a greedy, $\mathrm{PG}^2$-based feature ordering and compares its faithfulness to that of SHAP across multiple datasets. In the experiments, the exact $\mathrm{PG}^2$ computation is numerically stable and can outperform Monte Carlo estimation under tight time budgets. The $\mathrm{PG}^2$-based ranking often yields higher $\overline{\mathrm{PGI}^2}$ than SHAP on the bigger models, while SHAP sometimes does better on remove-and-retrain metrics. The results suggest that $\mathrm{PG}^2$ offers a principled, efficient alternative for measuring and using feature importance in tree-based models, with potential impact on explainability benchmarks and practical deployments.

Abstract

In this paper, we consider a perturbation-based metric of predictive faithfulness of feature rankings (or attributions) that we call PGI squared. When applied to decision tree-based regression models, the metric can be computed accurately and efficiently for arbitrary independent feature perturbation distributions. In particular, the computation does not involve Monte Carlo sampling, which has typically been used for computing similar metrics and is inherently prone to inaccuracies. Moreover, we propose a method of ranking features by their importance for the tree model's predictions based on PGI squared. Our experiments indicate that, in some respects, the method may identify the globally important features better than the state-of-the-art SHAP explainer.

Paper Structure

This paper contains 27 sections, 1 theorem, 9 equations, 7 figures, 6 tables, 1 algorithm.

Key Result

Theorem 3.1

Let $f$ be a tree ensemble model whose trees have $n$ nodes in total. Let $x\in \mathbb{R}^d$. Then for any $S\subseteq [d]$, $\mathrm{PG}^2(x,S)$ can be computed in $O(n^2)$ time.
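
To make the statement more concrete, here is a minimal sketch of how a sampling-free value can arise. It is not the algorithm behind Theorem 3.1. It assumes that $\mathrm{PG}^2(x,S)$ means $\mathbb{E}\big[(f(x)-f(x'))^2\big]$, where $x'$ equals $x$ except that each feature in $S$ receives independent Gaussian noise with standard deviation `sigma`; both the definition and the noise model are assumptions made for illustration. Under them, the square expands into $f(x)^2 - 2f(x)\,\mathbb{E}[f(x')] + \mathbb{E}[f(x')^2]$, and the last term is a sum over pairs of leaves weighted by the probability that $x'$ lands in both. The dictionary tree format, the function names, and the toy ensemble `TREES` are hypothetical. This naive enumeration also pays an extra factor of $d$ per leaf pair.

```python
import math
import random


def phi(z):
    """Standard normal CDF; math.erf handles +/-inf, so unbounded boxes work."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predict(trees, x):
    """Sum of tree outputs; a split sends x left when x[feature] <= threshold."""
    total = 0.0
    for node in trees:
        while "value" not in node:
            go_left = x[node["feature"]] <= node["threshold"]
            node = node["left"] if go_left else node["right"]
        total += node["value"]
    return total


def leaf_boxes(node, d, box=None):
    """List leaves as (box, value); box[j] = (lo, hi) means lo < x_j <= hi."""
    if box is None:
        box = [(-math.inf, math.inf)] * d
    if "value" in node:
        return [(box, node["value"])]
    j, t = node["feature"], node["threshold"]
    lo, hi = box[j]
    left = box[:j] + [(lo, min(hi, t))] + box[j + 1:]
    right = box[:j] + [(max(lo, t), hi)] + box[j + 1:]
    return leaf_boxes(node["left"], d, left) + leaf_boxes(node["right"], d, right)


def box_prob(box, x, S, sigma):
    """P(x' in box): Gaussian mass for perturbed features, indicator otherwise."""
    p = 1.0
    for j, (lo, hi) in enumerate(box):
        if lo >= hi:  # empty interval, e.g. two different leaves of one tree
            return 0.0
        if j in S:
            p *= phi((hi - x[j]) / sigma) - phi((lo - x[j]) / sigma)
        elif not (lo < x[j] <= hi):
            return 0.0
    return p


def pg2_exact(trees, x, S, sigma):
    """E[(f(x) - f(x'))^2] via first and second moments over leaf pairs."""
    leaves = [leaf for tree in trees for leaf in leaf_boxes(tree, len(x))]
    fx = predict(trees, x)
    m1 = sum(v * box_prob(b, x, S, sigma) for b, v in leaves)
    m2 = 0.0
    for ba, va in leaves:
        for bb, vb in leaves:
            inter = [(max(la, lb), min(ha, hb))
                     for (la, ha), (lb, hb) in zip(ba, bb)]
            m2 += va * vb * box_prob(inter, x, S, sigma)
    return fx * fx - 2.0 * fx * m1 + m2


def pg2_monte_carlo(trees, x, S, sigma, samples, seed=0):
    """Sampling baseline for the same quantity, used only as a sanity check."""
    rng = random.Random(seed)
    fx = predict(trees, x)
    total = 0.0
    for _ in range(samples):
        xp = [xj + rng.gauss(0.0, sigma) if j in S else xj
              for j, xj in enumerate(x)]
        total += (fx - predict(trees, xp)) ** 2
    return total / samples


# Hypothetical two-tree ensemble over two features.
TREES = [
    {"feature": 0, "threshold": 0.5,
     "left": {"value": 1.0},
     "right": {"feature": 1, "threshold": -0.2,
               "left": {"value": 2.0}, "right": {"value": 3.0}}},
    {"feature": 1, "threshold": 0.3,
     "left": {"value": -1.0}, "right": {"value": 0.5}},
]
X = [0.4, 0.1]
```

Calling `pg2_exact(TREES, X, {0, 1}, 0.3)` and `pg2_monte_carlo(TREES, X, {0, 1}, 0.3, 20000)` should give nearby values, with only the second depending on the random seed and the sample count.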

Figures (7)

  • Figure 1: NMAE for $\sigma = 0.3$ for the bigger model on the Red Wine Quality dataset.
  • Figure 2: Single tree model for Wine Quality dataset
  • Figure 3: Bigger model for Wine Quality dataset
  • Figure 4: Single tree model for Californian Housing dataset
  • Figure 5: Bigger model for Californian Housing dataset
  • ...and 2 more figures

Theorems & Definitions (1)

  • Theorem 3.1