Implementing local-explainability in Gradient Boosting Trees: Feature Contribution

Ángel Delgado-Panadero; Beatriz Hernández-Lorca; María Teresa García-Ordás; José Alberto Benítez-Andrades

TL;DR

This work addresses the lack of intrinsic local explainability for Gradient Boosting Decision Trees (GBDT) by introducing Decision Contribution and Feature Contribution, which decompose a prediction into the sequence of per-node residues along the tree paths. The authors prove that node contributions correspond to changes in the conditional distribution of the target and show that ensemble predictions can be written as a sum of these per-decision contributions, $F^t(x) = \sum_{l=0}^{t} \sum_{j=0}^{i(l)} \alpha g^l(s_j)$. Empirically, the method yields explanations that reflect the internal tree decisions, remains stable under feature correlation and noise, and exhibits behavior comparable to SHAP with better sensitivity to outliers, while LIME diverges because of its linearity assumptions. The approach provides a practical, intrinsic, local explainability mechanism for GBDT that supports GDPR-like explainability needs and offers a pathway to extending similar explanations to other tree-based models.
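
The decomposition in the TL;DR can be read as a telescoping sum. The derivation below is a sketch under assumed notation that is not taken from the paper: tree $l$ stores a value $v^l(s)$ at every node $s$, $x$ follows the path $s_0$ (root) to $s_{i(l)}$ (leaf), $h^l(x)$ is the leaf value returned by tree $l$, $g^l(s_j)$ is the change in node value produced by the decision that leads to $s_j$, and $k(s)$ is the feature tested at node $s$. Following the formula's convention that every stage $l$ is scaled by $\alpha$, the per-tree sum collapses to the leaf value, and grouping the split terms by feature gives a per-feature contribution $\mathrm{FC}_k(x)$ (a hypothetical name used only here).

```latex
% Assumed notation: v^l(s) = value stored at node s of tree l,
% s_0 = root, s_{i(l)} = leaf reached by x, k(s) = feature tested at s.
g^l(s_0) = v^l(s_0), \qquad
g^l(s_j) = v^l(s_j) - v^l(s_{j-1}) \quad (1 \le j \le i(l))

\sum_{j=0}^{i(l)} g^l(s_j)
  = v^l(s_0) + \sum_{j=1}^{i(l)} \bigl( v^l(s_j) - v^l(s_{j-1}) \bigr)
  = v^l(s_{i(l)}) = h^l(x)

F^t(x) = \sum_{l=0}^{t} \alpha\, h^l(x)
       = \sum_{l=0}^{t} \sum_{j=0}^{i(l)} \alpha\, g^l(s_j)
       = \sum_{l=0}^{t} \alpha\, g^l(s_0) + \sum_{k} \mathrm{FC}_k(x),
\qquad
\mathrm{FC}_k(x) = \sum_{l=0}^{t}
  \sum_{\substack{1 \le j \le i(l) \\ k(s_{j-1}) = k}} \alpha\, g^l(s_j)
```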

Abstract

Gradient Boosting Decision Trees (GBDT) is a powerful additive model based on tree ensembles. Its nature makes GBDT a black-box model, although multiple explainable artificial intelligence (XAI) methods extract information from it by reinterpreting the model globally and locally. Each tree in the ensemble is a transparent model in itself, but the final outcome is the sum of all these trees, which is hard to interpret. In this paper, a feature contribution method for GBDT is developed. The proposed method exploits the GBDT architecture to calculate the contribution of each feature from the residue of each node. The algorithm makes it possible to compute the sequence of node decisions that leads to a given prediction. Theoretical proofs and multiple experiments are provided to demonstrate the performance of the method, which is not only a local explainability model for the GBDT algorithm but also a unique option that reflects the internal behavior of GBDT. The proposal connects feature contributions to artificial intelligence problems where they matter, such as the ethical analysis of artificial intelligence (AI), and to compliance with new European laws such as the General Data Protection Regulation (GDPR) on the right to explanation and non-discrimination.
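
The node-residue idea can be illustrated with a minimal sketch. It assumes scikit-learn's `GradientBoostingRegressor` with its default squared-error loss (whose initial prediction is the mean of the training target) and the bundled diabetes dataset; the helper `feature_contributions` is hypothetical and is not the authors' reference implementation. For every tree it walks each sample's decision path, takes the change in stored node value (the mean residual) at every split, scales it by the learning rate, and credits it to the feature tested at that split. Root values, which every sample shares, go into a bias term.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor


def feature_contributions(model, X, y_train_mean):
    """Hypothetical helper: split a GBDT regression prediction into a bias
    plus one contribution per feature, using the change in node value at
    every split along each sample's decision path."""
    n_samples, n_features = X.shape
    lr = model.learning_rate
    contrib = np.zeros((n_samples, n_features))
    # Assumption: default squared-error loss, so the initial prediction
    # is the mean of the training target.
    bias = y_train_mean
    for tree in model.estimators_[:, 0]:
        t = tree.tree_
        values = t.value[:, 0, 0]            # mean residual stored at every node
        bias += lr * values[0]               # root value is shared by all samples
        paths = tree.decision_path(X).tocsr()  # sparse (n_samples, n_nodes) indicator
        for i in range(n_samples):
            nodes = np.sort(paths.indices[paths.indptr[i]:paths.indptr[i + 1]])
            # Children always get larger ids than their parent, so sorted ids
            # run from the root down to the leaf.
            for parent, child in zip(nodes[:-1], nodes[1:]):
                contrib[i, t.feature[parent]] += lr * (values[child] - values[parent])
    return bias, contrib


X, y = load_diabetes(return_X_y=True)
model = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=0).fit(X, y)
bias, contrib = feature_contributions(model, X[:5], y.mean())
print(np.allclose(bias + contrib.sum(axis=1), model.predict(X[:5])))  # True
```

Because each tree's leaf value equals its root value plus the changes accumulated along the path, the bias plus the per-feature contributions reproduces `model.predict` up to floating-point error, which is what the last line checks.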

Paper Structure (21 sections, 17 equations, 9 figures, 1 table, 1 algorithm)

Introduction
Methodology
Background and motivation
Mathematical Proof
Reinterpreting Gradient Boosting Decision Trees
Implementation
Decision Space
Feature Contribution
Experiments and results
Datasets
Diabetes dataset
Concrete dataset
Experimental Setup
Model Consistency Test
Behaviour under correlation
...and 6 more sections

Figures (9)

Figure 1: Example of a node split in the tree structure. Every node has an average value of the target and a decision based on a certain feature and threshold that splits the feature space into two child nodes.
Figure 2: Contribution of the new correlated feature vs. contribution of the original base feature, BMI. Only features that contribute to the final prediction are shown.
Figure 3: Contribution of the new correlated feature vs. contribution of the original base feature, age. Only features that contribute to the final prediction are shown.
Figure 4: Feature contribution representation under different noise levels (100%, 200%, 300% and 400%) induced in BMI. As BMI loses influence on the prediction, the rest of the variables contribute more.
Figure 5: Feature contribution representation under different noise levels (100%, 200%, 300% and 400%) induced in age. As age loses influence on the prediction, the rest of the variables contribute more.
...and 4 more figures
