Group Shapley with Robust Significance Testing and Its Application to Bond Recovery Rate Prediction
Jingyi Wang, Ying Chen, Paolo Giudici
TL;DR
This work extends Shapley-based explanations to feature groups, addressing interpretability in high-dimensional economic data by introducing Group Shapley values and a Tree Group SHAP computation. It develops a robust significance-testing framework based on a three-cumulant chi-square approximation, enabling reliable inference on group-level contributions under non-normal, sparse, and small-sample conditions. Comprehensive simulations show strong size control and improved power. In an application to bond recovery rate prediction, the market-related group emerges as the most influential, and Lorenz and Gini analyses indicate that Group Shapley attributes importance more equitably than individual Shapley values. The methodology offers a practical, statistically principled tool for explainable AI in finance and can be generalized to other domains that require interpretable, group-level feature contributions.
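To make the group-level idea concrete, the following Python sketch computes exact Group Shapley values by treating each feature group as a single player in the Shapley formula. It assumes a value function `v(S)` defined on sets of feature indices (for instance, the expected model prediction when only the features in `S` are known). The names `group_shapley` and `toy_value`, the group labels, and the toy coefficients are all hypothetical. This brute-force enumeration grows exponentially with the number of groups and is not the paper's Tree Group SHAP computation.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, List

ValueFn = Callable[[FrozenSet[int]], float]


def group_shapley(groups: Dict[str, List[int]], value: ValueFn) -> Dict[str, float]:
    """Exact Shapley values in which each feature group acts as one player.

    value(S) scores a set S of feature indices; a coalition of groups is
    scored on the union of its groups' features.
    """
    names = list(groups)
    g = len(names)
    phi: Dict[str, float] = {}
    for name in names:
        others = [n for n in names if n != name]
        own = frozenset(groups[name])
        total = 0.0
        for k in range(g):  # size of the coalition formed by the *other* groups
            weight = factorial(k) * factorial(g - k - 1) / factorial(g)
            for coalition in combinations(others, k):
                feats = frozenset(i for n in coalition for i in groups[n])
                # marginal contribution of adding this whole group to the coalition
                total += weight * (value(feats | own) - value(feats))
        phi[name] = total
    return phi


# Hypothetical toy value function: additive effects plus one interaction
# between feature 0 (group "A") and feature 2 (group "B").
coef = [1.0, 2.0, 3.0]


def toy_value(S: FrozenSet[int]) -> float:
    return sum(coef[i] for i in S) + (1.0 if {0, 2} <= S else 0.0)


print(group_shapley({"A": [0, 1], "B": [2]}, toy_value))  # {'A': 3.5, 'B': 3.5}
```

Here v(A) = 3, v(B) = 3 and v(A and B together) = 7, so the cross-group interaction of size 1 is split equally between the two groups, and the group values sum to v(all features) - v(empty set) = 7, as the Shapley efficiency property requires.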
Abstract
We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach effectively handles challenging scenarios, including sparse or skewed distributions and small sample sizes, and outperforms alternative tests such as the Wald test. Simulations confirm that the proposed test maintains well-controlled empirical size and enhanced power under diverse conditions. To illustrate the method's practical relevance for Explainable AI, we apply our framework to bond recovery rate prediction using a global dataset (1996–2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably than individual Shapley values.
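The abstract does not spell out the test statistic, so the sketch below only illustrates the generic three-cumulant idea under an assumed setting; it is not necessarily the authors' construction. It assumes a statistic Q that behaves like a weighted sum of independent squared standard normals, Q = sum_j w_j Z_j^2, whose first three cumulants are kappa_r = 2^(r-1) (r-1)! sum_j w_j^r. The classical Pearson-type approximation replaces Q by a * chi2_nu + b and matches the three cumulants, which gives a = kappa_3 / (4 kappa_2), nu = 8 kappa_2^3 / kappa_3^2 and b = kappa_1 - a * nu. All function names are hypothetical. The chi-square tail is computed from the regularized incomplete gamma function in pure Python to keep the sketch dependency-free; `scipy.stats.chi2.sf` computes the same quantity.

```python
import math
from typing import Sequence, Tuple


def _upper_gamma_reg(s: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(s, x) (series / continued fraction)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + s * math.log(x) - math.lgamma(s)
    if x < s + 1.0:
        # Series for the lower function P(s, x); return Q = 1 - P.
        term = total = 1.0 / s
        n = 0
        while abs(term) > abs(total) * 1e-15:
            n += 1
            term *= x / (s + n)
            total += term
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(s, x).
    tiny = 1e-300
    b = x + 1.0 - s
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - s)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(x: float, df: float) -> float:
    """Upper tail P(chi2_df >= x) for real-valued df > 0."""
    return _upper_gamma_reg(df / 2.0, x / 2.0)


def weighted_chi2_cumulants(weights: Sequence[float]) -> Tuple[float, float, float]:
    """First three cumulants of Q = sum_j w_j * Z_j**2 with Z_j iid N(0, 1)."""
    k1 = sum(weights)
    k2 = 2.0 * sum(w ** 2 for w in weights)
    k3 = 8.0 * sum(w ** 3 for w in weights)
    return k1, k2, k3


def three_cumulant_pvalue(q: float, k1: float, k2: float, k3: float) -> float:
    """Approximate P(Q >= q) by a * chi2_nu + b matching three cumulants."""
    if k3 <= 0.0:
        raise ValueError("this sketch assumes a positive third cumulant")
    a = k3 / (4.0 * k2)           # scale
    nu = 8.0 * k2 ** 3 / k3 ** 2  # degrees of freedom (need not be an integer)
    b = k1 - a * nu               # shift
    return chi2_sf((q - b) / a, nu)


# Equal weights: the approximation reduces to chi2_3; 7.81 is near its 95% quantile,
# so the p-value is approximately 0.05.
print(three_cumulant_pvalue(7.81, *weighted_chi2_cumulants([1.0, 1.0, 1.0])))
```

With equal weights the matching recovers a = 1, b = 0 and nu equal to the number of terms, so the approximation is exact. With unequal or skewed weights it adapts the (generally non-integer) degrees of freedom to the skewness of Q instead of fixing them in advance.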
