From interpretability to inference: an estimation framework for universal approximators
Andreas Joseph
TL;DR
The paper bridges interpretability and econometric inference for universal function approximators by decomposing model predictions into Shapley values and analyzing the bias and variance of the individual components. Extending the decomposition to Shapley-Taylor interactions and combining it with cross-fitting and a training bootstrap, it shows that Shapley components are asymptotically unbiased and introduces Shapley regressions to test whether a component reflects the true data generating process (DGP), with coefficients that equal 0 or 1 in the limit. The methodology is demonstrated on simulated heterogeneous treatment effects and on a real Bank of England information-treatment experiment, showing how to identify true treatment channels, such as age, and how to quantify the associated uncertainty. The framework offers a practical, model-agnostic path from interpretability to statistically valid inference for modern ML models; the paper notes that validity is local rather than global and suggests avenues for further extensions.
Abstract
We present a novel framework for estimation and inference with the broad class of universal approximators. Estimation is based on the decomposition of model predictions into Shapley values. Inference relies on analyzing the bias and variance properties of individual Shapley components. We show that Shapley value estimation is asymptotically unbiased, and we introduce Shapley regressions as a tool to uncover the true data generating process from noisy data alone. Linear regression is recovered as the special case of our framework in which the model is linear in parameters. We present theoretical, numerical, and empirical results for the estimation of heterogeneous treatment effects as our guiding example.
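To make the decomposition of predictions into Shapley values concrete, the following is a minimal sketch and not the paper's implementation. It computes exact Shapley values by enumerating feature coalitions and checks the linear special case numerically. The names `shapley_values`, `predict`, and `background`, the coefficients, and the choice of value function (averaging predictions over a background sample with the coalition's features held fixed) are all assumptions made for illustration.

```python
import itertools
import math

import numpy as np


def shapley_values(predict, x, background):
    """Exact Shapley values of `predict` at the point `x`.

    Assumption: the value of a coalition S is the mean prediction when the
    features in S are fixed to x and the others are taken from `background`.
    The cost grows as 2^n, so this is only meant for a handful of features.
    """
    n = x.shape[0]

    def value(coalition):
        z = background.copy()
        idx = np.array(coalition, dtype=int)
        z[:, idx] = x[idx]  # fix coalition features at their observed values
        return predict(z).mean()

    phi = np.zeros(n)
    for k in range(n):
        others = [j for j in range(n) if j != k]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for s in itertools.combinations(others, size):
                phi[k] += weight * (value(s + (k,)) - value(s))
    return phi


# Hypothetical linear model with made-up coefficients.
rng = np.random.default_rng(0)
background = rng.normal(size=(200, 3))
intercept, beta = 0.3, np.array([2.0, -1.0, 0.5])


def predict(z):
    return intercept + z @ beta


x = background[0]
phi = shapley_values(predict, x, background)

# For a linear model each component reduces to beta_k * (x_k - mean(x_k)).
expected = beta * (x - background.mean(axis=0))
print(np.allclose(phi, expected))

# The components sum to the prediction minus the average prediction.
print(np.isclose(phi.sum(), predict(x) - predict(background).mean()))
```

Under this value function, each Shapley component of a linear model is the coefficient times the centred feature, which is one way to see why linear regression appears as a special case. For a nonlinear learner only `predict` would change, although exact enumeration quickly becomes infeasible as the number of features grows.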
