Fast Calculation of Feature Contributions in Boosting Trees
Zhongli Jiang, Min Zhang, Dabao Zhang
TL;DR
The paper tackles global interpretability of tree ensembles by decomposing the explained variance via Shapley values of $R^2$ and introduces Q-SHAP, a polynomial-time algorithm for Shapley values of quadratic losses. It develops a single-tree core and then extends to boosting ensembles, using leaf-based reweighting and polynomial identities to compute feature contributions efficiently. Across simulations and real data, Q-SHAP achieves accurate, stable feature-specific $R^2$ estimates and dramatically outperforms SAGE and SPVIM in runtime. This work provides a scalable framework for global feature attribution in high-dimensional tree models and can generalize to broader quadratic loss functions.
Abstract
Recently, several fast algorithms have been proposed to decompose predicted values into Shapley values, enabling individualized feature contribution analysis in tree models. While such local decompositions offer valuable insights, they also underscore the need for a global evaluation of feature contributions. Although coefficients of determination ($R^2$) allow for comparative assessment of individual features, attributing $R^2$ to individual features is complicated by the underlying quadratic losses. To address this, we propose Q-SHAP, an efficient algorithm that reduces the computational complexity of calculating Shapley values for quadratic losses to polynomial time. Our simulations show that Q-SHAP not only improves computational efficiency but also enhances the accuracy of feature-specific $R^2$ estimates.
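To make the idea of a feature-specific $R^2$ concrete, the sketch below computes exact Shapley values of $R^2$ by brute-force enumeration of feature subsets. It is not Q-SHAP: it illustrates the exponential-time baseline that a polynomial-time algorithm would avoid. The value function used here (the $R^2$ of an ordinary least-squares model refit on each subset) and the helper names `r2_of_subset` and `shapley_r2` are assumptions chosen for illustration, not the paper's tree-based formulation.

```python
from itertools import combinations
from math import factorial

import numpy as np


def r2_of_subset(X, y, subset):
    """R^2 of a least-squares fit (with intercept) on the columns in `subset`.

    Assumed value function for illustration only; the paper works with
    tree ensembles rather than refitted linear models.
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def shapley_r2(X, y):
    """Exact Shapley decomposition of R^2; cost grows as 2^p in the number of features p."""
    p = X.shape[1]
    phi = np.zeros(p)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        for size in range(p):
            # Shapley weight for coalitions of this size that exclude feature j.
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                gain = r2_of_subset(X, y, S + (j,)) - r2_of_subset(X, y, S)
                phi[j] += weight * gain
    return phi
```

By the efficiency property of Shapley values, the entries of `shapley_r2(X, y)` sum to the $R^2$ of the full model, so each entry can be read as that feature's share of the explained variance. The nested loops visit every subset of the remaining features, which is why this approach becomes infeasible in high dimensions.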
