Group Shapley with Robust Significance Testing and Its Application to Bond Recovery Rate Prediction

Jingyi Wang, Ying Chen, Paolo Giudici

TL;DR

This work extends Shapley-based explanations to feature groups, addressing interpretability in high-dimensional economic data by introducing Group Shapley values and a Tree Group SHAP computation. It develops a robust significance-testing framework based on a three-cumulant chi-square approximation, enabling reliable inference for group-level contributions under non-normal, sparse, and small-sample conditions. The approach is validated through comprehensive simulations showing strong size control and improved power, and is demonstrated on bond recovery rate prediction where the market-related group emerges as the most influential, with Lorenz and Gini analyses indicating more equitable attribution than with individual Shapley values. The methodology offers a practical, statistically principled tool for explainable AI in finance and can be generalized to other domains requiring interpretable, group-level feature contributions.
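As a rough illustration of the group-level idea, the sketch below computes exact Shapley values when each feature group is treated as a single player. This is an assumption about how a group attribution can be set up, not the paper's definition or its Tree Group SHAP algorithm, neither of which is reproduced in this summary. The function `group_shapley`, the value function `toy_value`, and the payoff numbers are all hypothetical.

```python
from itertools import combinations
from math import factorial


def group_shapley(groups, value):
    """Exact Shapley values with feature groups as the players.

    groups: list of hashable group names.
    value:  callable mapping a frozenset of group names to a float
            (hypothetical, e.g. model performance using only those groups).
    """
    n = len(groups)
    phi = {}
    for g in groups:
        others = [h for h in groups if h != g]
        total = 0.0
        for size in range(n):
            # Classical Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of group g when added to coalition s.
                total += weight * (value(s | {g}) - value(s))
        phi[g] = total
    return phi


# Toy value function with made-up numbers: additive payoffs plus one
# interaction bonus when both "firm" and "market" are present.
PAYOFF = {"bond": 1.0, "firm": 2.0, "market": 4.0}


def toy_value(coalition):
    base = sum(PAYOFF[g] for g in coalition)
    bonus = 1.0 if {"firm", "market"} <= coalition else 0.0
    return base + bonus


phi = group_shapley(list(PAYOFF), toy_value)
```

In this toy game the interaction bonus is split equally between the two groups that create it, and the attributions sum to the value of the full coalition. Exact enumeration costs on the order of 2^n value evaluations for n groups, which is feasible for a handful of groups but not for dozens of individual features.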

Abstract

We propose Group Shapley, a metric that extends the classical individual-level Shapley value framework to evaluate the importance of feature groups, addressing the structured nature of predictors commonly found in business and economic data. More importantly, we develop a significance testing procedure based on a three-cumulant chi-square approximation and establish the asymptotic properties of the test statistics for Group Shapley values. Our approach can effectively handle challenging scenarios, including sparse or skewed distributions and small sample sizes, outperforming alternative tests such as the Wald test. Simulations confirm that the proposed test maintains robust empirical size and demonstrates enhanced power under diverse conditions. To illustrate the method's practical relevance in advancing Explainable AI, we apply our framework to bond recovery rate predictions using a global dataset (1996–2023) comprising 2,094 observations and 98 features, grouped into 16 subgroups and five broader categories: bond characteristics, firm fundamentals, industry-specific factors, market-related variables, and macroeconomic indicators. Our results identify the market-related variables group as the most influential. Furthermore, Lorenz curves and Gini indices reveal that Group Shapley assigns feature importance more equitably than individual Shapley values do.
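The generic idea behind a three-cumulant chi-square approximation can be sketched as follows: a statistic with first three cumulants $K_1, K_2, K_3$ is approximated by $\beta_0 + \beta_1 \chi^2_d$, with the three parameters chosen so that the cumulants match. The paper's specific test statistic and cumulant estimators are not given in this summary, so the example assumes a simple weighted sum of independent $\chi^2_1$ variables. The function names and weights are hypothetical.

```python
def mixture_cumulants(weights):
    """First three cumulants of sum_i w_i * chi2(1) with independent terms.

    The r-th cumulant of w * chi2(1) is 2**(r-1) * (r-1)! * w**r.
    """
    k1 = sum(weights)
    k2 = 2.0 * sum(w ** 2 for w in weights)
    k3 = 8.0 * sum(w ** 3 for w in weights)
    return k1, k2, k3


def three_cumulant_match(k1, k2, k3):
    """Return (beta0, beta1, d) so that beta0 + beta1 * chi2(d) has
    cumulants (k1, k2, k3).

    Derived from k1 = beta0 + beta1 * d, k2 = 2 * beta1**2 * d and
    k3 = 8 * beta1**3 * d.
    """
    if k2 <= 0 or k3 <= 0:
        raise ValueError("k2 and k3 must be positive for this matching")
    beta1 = k3 / (4.0 * k2)
    d = 8.0 * k2 ** 3 / k3 ** 2
    beta0 = k1 - 2.0 * k2 ** 2 / k3
    return beta0, beta1, d


# Sanity check: five unit weights is exactly chi2(5), so the match
# should give beta0 = 0, beta1 = 1, d = 5.
exact = three_cumulant_match(*mixture_cumulants([1.0] * 5))

# Unequal (made-up) weights give a shifted, scaled chi-square with
# non-integer degrees of freedom.
k1, k2, k3 = mixture_cumulants([3.0, 1.0, 0.5])
beta0, beta1, d = three_cumulant_match(k1, k2, k3)
```

Given an observed statistic `t`, an approximate p-value would then be the upper-tail probability of a chi-square distribution with `d` degrees of freedom evaluated at `(t - beta0) / beta1`. Matching the third cumulant is what lets the approximation track skewness, which a normal approximation ignores.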

Paper Structure

This paper contains 11 sections, 2 theorems, 40 equations, 7 figures, and 4 tables.

Key Result

Theorem 1

Under the null hypothesis (jointtest) and Condition C1, as $S,K\to\infty$,

Figures (7)

  • Figure 1: Recovery rates of UP5 from 1996 to 2023, covering 576 firms and 1,638 bonds globally
  • Figure 2: Some explanatory features of UP5, representing (from top to bottom): (a) bond characteristics, (b) firm fundamentals, (c) industry-specific factors, (d) market-related variables, and (e) macroeconomic conditions
  • Figure 3: Recovery rates vs. 12-month exchange-sector aggregated PD and DTD
  • Figure 4: SHAP values for the top 20 individual features, among the 98 available ones.
  • Figure 5: Group SHAP values for 16 subgroups (top) and 5 groups (bottom)
  • ...and 2 more figures

Theorems & Definitions (6)

  • Theorem 1
  • Remark 1
  • Remark 2
  • Lemma 1
  • Proof of Lemma 1
  • Proof of Theorem 1