Multivariate Bernoulli Hoeffding Decomposition: From Theory to Sensitivity Analysis

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

TL;DR

The paper develops a complete Generalized Hoeffding Decomposition (GHD) for multivariate Bernoulli inputs. It shows that the resulting Multivariate Bernoulli Hoeffding Decomposition (MBHD) has one-dimensional, explicitly computable subspaces, described by the basis $e_A(X_A)$ and a Gram system $\Gamma$, which enables exact sensitivity analysis under dependence. It provides constructive representations, including an oblique dual framework, and derives generalized Sobol' indices and Shapley effects that remain valid when inputs are correlated. Numerical experiments on synthetic perceptron-like models, a two-dimensional dependence case, and a Mushrooms classifier demonstrate faithful variance attribution and interpretable decompositions despite input dependencies. The results support interpretable, reverse-engineerable models with binary features and point to scalable high-dimensional extensions and to inputs with finite, non-binary support.

Abstract

Understanding the behavior of predictive models with random inputs can be achieved through functional decompositions into sub-models that capture interpretable effects of input groups. Building on recent advances in uncertainty quantification, the existence and uniqueness of a generalized Hoeffding decomposition have been established for correlated input variables, using oblique projections onto suitable functional subspaces. This work focuses on the case of Bernoulli inputs and provides a complete analytical characterization of the decomposition. We show that, in this discrete setting, the associated subspaces are one-dimensional and that the decomposition admits a closed-form representation. One of the main contributions of this study is to generalize the classical Fourier-Walsh-Hadamard decomposition for pseudo-Boolean functions to the correlated case, yielding an oblique version when the underlying distribution is not a product measure, and recovering the standard orthogonal form when independence holds. This explicit structure offers a fully interpretable framework, clarifying the contribution of each input combination and theoretically enabling model reverse engineering. From this formulation, explicit sensitivity measures, such as Sobol' indices and Shapley effects, can be directly derived. Numerical experiments illustrate the practical interest of the approach for decision-support problems involving binary features. The paper concludes with perspectives on extending the methodology to high-dimensional settings and to models involving inputs with finite, non-binary support.
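As a point of reference for the independent case mentioned in the abstract, the following minimal sketch computes the classical (orthogonal) Fourier-Walsh-Hadamard coefficients of a pseudo-Boolean function under independent Bernoulli(1/2) inputs and the Sobol' indices they induce. It is not the paper's oblique MBHD: the function names `walsh_coefficients` and `sobol_indices` and the toy model `g` are hypothetical, chosen only for illustration.

```python
from itertools import combinations, product


def walsh_coefficients(f, d):
    """Walsh coefficients of f: {0,1}^d -> R under the uniform product
    measure (assumption: independent Bernoulli(1/2) inputs)."""
    points = list(product((0, 1), repeat=d))
    coeffs = {}
    for k in range(d + 1):
        for A in combinations(range(d), k):
            # Character chi_A(x) = (-1)^(sum of x_i for i in A).
            total = sum(f(x) * (-1) ** sum(x[i] for i in A) for x in points)
            coeffs[A] = total / len(points)
    return coeffs


def sobol_indices(coeffs):
    """Under independence the characters are orthonormal, so each squared
    coefficient is the share of variance carried by the subset A."""
    variance = sum(c * c for A, c in coeffs.items() if A)
    return {A: c * c / variance for A, c in coeffs.items() if A}


def g(x):
    # Hypothetical toy model with one interaction term.
    return 1.0 + 2.0 * x[0] + x[1] + 4.0 * x[0] * x[1]


coeffs = walsh_coefficients(g, 2)
indices = sobol_indices(coeffs)
print(coeffs[()])              # mean of g: 3.5
print(sum(indices.values()))   # the indices sum to one under independence
```

The empty subset carries the mean, and every non-empty subset carries one coefficient, which mirrors the one-dimensional subspaces described above. When the inputs are dependent, the characters are no longer orthogonal and this simple squared-coefficient rule does not apply; that is the case the paper addresses.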

Paper Structure

This paper contains 26 sections, 10 theorems, 83 equations, 7 figures, 6 tables, 1 algorithm.

Key Result

Theorem 2.1

Let $G : \mathbb{R}^d \rightarrow \mathbb{R}$ be a square-integrable function of a random input vector $X = (X_1, \dots, X_d)$ whose components may be arbitrarily dependent and satisfy the paper's GHD assumption. Then $G(X)$ can be expressed as a unique finite sum in which each component $G_A(X_A)$ is a $\nu_A$-measurable function of $X_A$ satisfying a hierarchical orthogonality condition.
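For orientation, the generalized Hoeffding decomposition is usually written in the following form in the literature on dependent inputs (Chastaing et al., 2012). The displayed equations of the theorem are not reproduced on this page, so the index set and the inner-product notation below are assumptions rather than the paper's exact statement:

$$
G(X) = \sum_{A \subseteq \{1, \dots, d\}} G_A(X_A),
\qquad
\mathbb{E}\left[ G_A(X_A)\, G_B(X_B) \right] = 0 \quad \text{for all } B \subsetneq A .
$$

Orthogonality is required only between a component and the components indexed by its strict subsets, not between arbitrary pairs. This is what distinguishes the hierarchical condition from the fully orthogonal decomposition available under independence.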

Figures (7)

  • Figure 1: Shapley value bar plots for the three study cases.
  • Figure 2: Behavior of Sobol' indices (left) and variance components (right) as functions of the dependence parameter $\rho$. The orange curve corresponds to the marginal contributions of $X_1$ and $X_2$, the blue curve to the joint contribution $X_{1,2}$, and the black dotted curve to the total variance.
  • Figure 3: Sensitivity analysis for mushroom toxicity classification. (a) Generalized Sobol' indices showing dominance of main effects and negligible interactions. (b) Shapley effects confirming the importance hierarchy $X_1 \gg X_2 > \{X_3, X_4, X_5\}$.
  • Figure 4: Example of representation in $\mathbb{R}^{2}$ for $3$ dependence levels.
  • Figure 5: Representation of $G(X)$ in $\mathrm{span}\left( e_1(X_1) , e_2(X_2) \right)$ for $3$ dependence levels.
  • ...and 2 more figures

Theorems & Definitions (25)

  • Theorem 2.1: GHD expression (Chastaing et al., 2012)
  • Remark
  • Corollary 2.1
  • Remark
  • Theorem 2.2: Multivariate Bernoulli Hoeffding Decomposition (MBHD)
  • Corollary 2.2: Exclusion property
  • Corollary 2.3: Geometric MBHD
  • Definition 3.1: Generalized Sobol' indices
  • Definition 3.2: Sobol' matrix
  • Definition 3.3: Shapley values via Harsanyi dividends
  • ...and 15 more
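
Definition 3.3 is not reproduced on this page, but the standard game-theoretic identity behind "Shapley values via Harsanyi dividends" can be sketched as follows. The dividend of a coalition is the Möbius transform of the value function, and each player receives an equal share of the dividend of every coalition it belongs to. The helper names and the toy value function are hypothetical; in a sensitivity-analysis setting the value function would be a variance-based quantity, which is an assumption here.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of items as a tuple, from empty to full."""
    items = tuple(items)
    for k in range(len(items) + 1):
        yield from combinations(items, k)


def harsanyi_dividends(value, players):
    """Moebius transform: d(A) = sum over B subset of A of (-1)^(|A|-|B|) v(B)."""
    return {
        A: sum((-1) ** (len(A) - len(B)) * value(frozenset(B)) for B in subsets(A))
        for A in subsets(players)
    }


def shapley_from_dividends(dividends, players):
    """Each coalition's dividend is split equally among its members."""
    return {
        i: sum(d / len(A) for A, d in dividends.items() if i in A)
        for i in players
    }


# Hypothetical toy value function on two players (not data from the paper).
toy_value = {
    frozenset(): 0.0,
    frozenset({1}): 4.0,
    frozenset({2}): 2.0,
    frozenset({1, 2}): 7.0,
}

players = (1, 2)
dividends = harsanyi_dividends(toy_value.get, players)
shapley = shapley_from_dividends(dividends, players)
print(shapley)  # {1: 4.5, 2: 2.5}
```

The Shapley values sum to the value of the full coalition (7.0 here), which is the efficiency property that makes them usable as a variance attribution. The enumeration is exponential in the number of players, so this sketch is only practical for a small number of inputs.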