Multivariate Bernoulli Hoeffding Decomposition: From Theory to Sensitivity Analysis
Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré
TL;DR
The paper develops a complete generalized Hoeffding decomposition (GHD) for multivariate Bernoulli inputs, the multivariate Bernoulli Hoeffding decomposition (MBHD). It shows that the MBHD subspaces are one-dimensional and explicitly computable through the basis functions $e_A(X_A)$ and an associated Gram system $\Gamma$, which enables exact sensitivity analysis under dependence. It provides constructive representations, including an oblique dual framework, and derives generalized Sobol' indices and Shapley effects that remain valid when inputs are correlated. Numerical experiments on synthetic perceptron-like models, a two-dimensional dependent case, and a Mushrooms classifier demonstrate faithful variance attribution and interpretable decompositions despite input dependencies. The results pave the way for interpretable, reverse-engineerable models with binary features. They also open avenues toward scalable high-dimensional settings and toward non-binary inputs with finite support.
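The TL;DR refers to a basis $e_A(X_A)$ and a Gram system $\Gamma$ without defining them. The sketch below shows one way such objects can be built for correlated binary inputs. It assumes a hierarchical-orthogonality construction, which is an illustrative choice and not necessarily the authors' exact definition. Under that construction, $e_A$ is the monomial $m_A(x)=\prod_{i\in A}x_i$ minus its $L^2(P)$ projection onto all functions of strict subsets of $A$, leaving one direction per subset. The coefficients then solve $\Gamma c = b$ with $\Gamma_{AB}=\mathbb{E}[e_A e_B]$ and $b_A=\mathbb{E}[f\,e_A]$. The helper names (`subsets`, `monomial`, `hierarchical_basis`, `decompose`) and the joint law `p` are hypothetical.

```python
import itertools
import numpy as np


def subsets(d):
    """All subsets of {0, ..., d-1} as sorted tuples, by increasing size."""
    return [A for k in range(d + 1) for A in itertools.combinations(range(d), k)]


def monomial(X, A):
    """m_A(x) = prod_{i in A} x_i on every row of X (equals 1 when A is empty)."""
    return np.prod(X[:, list(A)], axis=1)


def hierarchical_basis(X, p):
    """Hypothetical construction of e_A: the monomial m_A minus its L2(P)
    projection onto all functions of strict subsets of A, which are spanned
    by the monomials m_B with B strictly contained in A. One direction per A
    remains, i.e. each subspace is one-dimensional."""
    d = X.shape[1]
    w = np.sqrt(p)
    basis = {}
    for A in subsets(d):
        m = monomial(X, A)
        lower = [monomial(X, B) for B in subsets(d) if set(B) < set(A)]
        if lower:
            M = np.column_stack(lower)
            # Weighted least squares is the orthogonal projection in L2(P)
            beta, *_ = np.linalg.lstsq(w[:, None] * M, w * m, rcond=None)
            m = m - M @ beta
        basis[A] = m
    return basis


def decompose(f_vals, X, p):
    """Solve the Gram system Gamma c = b, with Gamma_AB = E[e_A e_B] and
    b_A = E[f e_A], so that f = sum_A c_A e_A on the support of P."""
    basis = hierarchical_basis(X, p)
    keys = list(basis)
    E = np.column_stack([basis[A] for A in keys])
    gram = E.T @ (p[:, None] * E)
    b = E.T @ (p * f_vals)
    c = np.linalg.solve(gram, b)
    return keys, E, gram, b, c


# Two correlated Bernoulli(1/2) inputs; rows ordered (0,0), (0,1), (1,0), (1,1)
X = np.array(list(itertools.product([0, 1], repeat=2)), dtype=float)
p = np.array([0.4, 0.1, 0.1, 0.4])
f_vals = X[:, 0] + 2 * X[:, 1] + 3 * X[:, 0] * X[:, 1]

keys, E, gram, b, c = decompose(f_vals, X, p)
mean = p @ f_vals
var = p @ f_vals**2 - mean**2
# Covariance share Cov(f_A, f) / Var(f); since E[e_A] = 0 for nonempty A,
# this equals c_A * b_A / Var(f), and the shares sum to one
shares = {A: c[k] * b[k] / var for k, A in enumerate(keys) if A}
print({A: round(float(ck), 4) for A, ck in zip(keys, c)})
print({A: round(float(s), 4) for A, s in shares.items()})
```

With this `p`, $e_{\{0\}}$ and $e_{\{1\}}$ are not orthogonal ($\Gamma$ has a nonzero off-diagonal entry), which is why the coefficients come from the full Gram system rather than from separate projections. The `shares` dictionary is one covariance-based attribution chosen for illustration; it is not necessarily the paper's generalized Sobol' index.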
Abstract
Understanding the behavior of predictive models with random inputs can be achieved through functional decompositions into sub-models that capture interpretable effects of input groups. Building on recent advances in uncertainty quantification, the existence and uniqueness of a generalized Hoeffding decomposition have been established for correlated input variables, using oblique projections onto suitable functional subspaces. This work focuses on the case of Bernoulli inputs and provides a complete analytical characterization of the decomposition. We show that, in this discrete setting, the associated subspaces are one-dimensional and that the decomposition admits a closed-form representation. One of the main contributions of this study is to generalize the classical Fourier–Walsh–Hadamard decomposition for pseudo-Boolean functions to the correlated case, yielding an oblique version when the underlying distribution is not a product measure and recovering the standard orthogonal form when independence holds. This explicit structure offers a fully interpretable framework, clarifying the contribution of each input combination and, in principle, enabling model reverse engineering. From this formulation, explicit sensitivity measures, such as Sobol' indices and Shapley effects, can be derived directly. Numerical experiments illustrate the practical interest of the approach for decision-support problems involving binary features. The paper concludes with perspectives on extending the methodology to high-dimensional settings and to models involving inputs with finite, non-binary support.
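When the inputs are independent, the decomposition reduces to the orthogonal Fourier–Walsh expansion in the standardized basis $\phi_A(x)=\prod_{i\in A}(x_i-q_i)/\sqrt{q_i(1-q_i)}$, and the sensitivity measures follow directly from its coefficients $c_A=\mathbb{E}[f\,\phi_A]$. The Sobol' index of the interaction term $A$ is $c_A^2/\mathrm{Var}(f)$. The Shapley effect of input $i$ is $\sum_{A\ni i} c_A^2/(|A|\,\mathrm{Var}(f))$, a known identity for independent inputs. The sketch below covers only this independent special case and does not reproduce the oblique correlated version; `walsh_sensitivity` is a hypothetical helper name.

```python
import itertools
import numpy as np


def walsh_sensitivity(f, q):
    """Sobol' indices and Shapley effects of f: {0,1}^d -> R when the inputs
    are independent Bernoulli(q_i), computed from Fourier-Walsh coefficients."""
    q = np.asarray(q, dtype=float)
    d = len(q)
    X = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)
    p = np.prod(np.where(X == 1, q, 1 - q), axis=1)  # product-measure weights
    Z = (X - q) / np.sqrt(q * (1 - q))               # zero-mean, unit-variance inputs
    fx = np.array([f(x) for x in X])
    var = p @ fx**2 - (p @ fx) ** 2
    sobol, shapley = {}, np.zeros(d)
    for k in range(1, d + 1):
        for A in itertools.combinations(range(d), k):
            phi_A = np.prod(Z[:, list(A)], axis=1)   # orthonormal Walsh function
            c_A = p @ (fx * phi_A)                    # Fourier-Walsh coefficient
            sobol[A] = c_A**2 / var                   # Sobol' index of term A
            shapley[list(A)] += c_A**2 / (k * var)    # split equally among members
    return sobol, shapley


sobol, shapley = walsh_sensitivity(lambda x: x[0] + 2 * x[1] + 3 * x[0] * x[1], [0.5, 0.5])
print({A: round(float(s), 4) for A, s in sobol.items()})
print(np.round(shapley, 4))
```

Because the basis is orthonormal under a product measure, all Sobol' indices sum to one and the interaction share is split evenly between the inputs involved. Under dependence this clean split no longer holds, which is what the oblique construction described in the abstract addresses.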
