Robust Semiparametric Inference for Bayesian Additive Regression Trees
Christoph Breunig, Ruixuan Liu, Zhengfei Yu
TL;DR
This work addresses valid inference on the mean outcome $\chi_0=\mathbb{E}_0[Y_i]$ when outcomes are missing at random (MAR), using Bayesian Additive Regression Trees (BART) combined with the Bayesian bootstrap. The authors propose RoBART, a posterior bias correction that combines pilot propensity-score estimators with a debiasing term to obtain a semiparametric Bernstein-von Mises theorem without requiring the Donsker property. The key theoretical result is $d_{BL}(\mathcal{L}_{\Pi}(\sqrt{n}(\chi_\eta-\widehat{\chi}-\widehat{b}_{\eta})|Z^{(n)}), N(0,v_0))\to 0$, where $d_{BL}$ is the bounded Lipschitz distance and $v_0=\mathbb{E}_0[\widetilde{\chi}_0^2(Z)]$. In words, once the bias $b_{0,\eta}$ is accounted for through the correction term $\widehat{b}_{\eta}$, the posterior is asymptotically normal and attains the semiparametric efficiency bound. Monte Carlo simulations and an application to NHANES data show reduced bias and improved coverage for RoBART relative to standard BART and one-step corrections. The approach combines nonparametric Bayesian forest priors with semiparametric efficiency theory and offers a practical tool for valid inference in settings with multi-dimensional covariates.
Abstract
We develop a semiparametric framework for inference on the mean response in missing-data settings using a corrected posterior distribution. Our approach is tailored to Bayesian Additive Regression Trees (BART), which is a powerful predictive method but whose nonsmoothness complicates asymptotic theory with multi-dimensional covariates. When using BART combined with Bayesian bootstrap weights, we establish a new Bernstein-von Mises theorem and show that the limit distribution generally contains a bias term. To address this, we introduce RoBART, a posterior bias-correction that robustifies BART for valid inference on the mean response. Monte Carlo studies support our theory, demonstrating reduced bias and improved coverage relative to existing procedures using BART.
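To make the role of the Bayesian bootstrap weights and a propensity-based correction concrete, the following is a minimal sketch and not the authors' RoBART implementation. It makes several assumptions that do not come from the paper: the data are simulated, a least-squares fit stands in for a BART posterior draw of the regression function, the true propensity score stands in for a pilot estimator, and the correction term is the textbook augmented inverse-probability-weighting form, which may differ from the paper's $\widehat{b}_{\eta}$. The function name `bayesian_bootstrap_mean` is hypothetical.

```python
import numpy as np


def bayesian_bootstrap_mean(m_hat, y, d, pi_hat, n_draws=2000, seed=0):
    """Bayesian bootstrap draws of the mean response under MAR.

    m_hat  : fitted E[Y | X] evaluated at every X_i (stand-in for a BART draw)
    y      : outcomes; entries with d == 0 are ignored (may be NaN)
    d      : 1 if Y_i is observed, 0 if missing
    pi_hat : pilot propensity scores P(D = 1 | X_i)
    """
    rng = np.random.default_rng(seed)
    n = len(m_hat)
    # Dirichlet(1, ..., 1) weights over the n observations, one row per draw.
    w = rng.dirichlet(np.ones(n), size=n_draws)
    # Residuals enter only where the outcome is observed.
    resid = np.where(d == 1, y - m_hat, 0.0)
    plug_in = w @ m_hat                        # uncorrected functional
    corrected = w @ (m_hat + resid / pi_hat)   # generic AIPW-style correction
    return plug_in, corrected


# Simulated MAR data (illustrative assumption, true mean of Y is 1.0).
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
pi = 1.0 / (1.0 + np.exp(-(0.5 + x)))      # true propensity, used as the "pilot"
d = rng.binomial(1, pi)
y = 1.0 + 2.0 * x + rng.normal(size=n)
y_obs = np.where(d == 1, y, np.nan)        # outcomes are missing when d == 0

# Least-squares fit on the observed cases only (stand-in for BART).
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X[d == 1], y[d == 1], rcond=None)
m_hat = X @ beta

plug_in, corrected = bayesian_bootstrap_mean(m_hat, y_obs, d, pi)
lo, hi = np.quantile(corrected, [0.025, 0.975])
print(f"95% credible interval for the mean response: [{lo:.3f}, {hi:.3f}]")
```

In this sketch the regression fit is held fixed across draws, so the spread of the draws reflects only the Dirichlet weights. A full procedure would also draw the regression function from its posterior at each iteration.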
