Robust Semiparametric Inference for Bayesian Additive Regression Trees

Christoph Breunig; Ruixuan Liu; Zhengfei Yu

TL;DR

This work addresses valid inference on the mean outcome $\chi_0=\mathbb{E}_0[Y_i]$ when outcomes are missing at random (MAR), using Bayesian Additive Regression Trees (BART) combined with the Bayesian bootstrap. The authors propose RoBART, a posterior bias-correction that combines pilot propensity-score estimators with a debiasing term to obtain semiparametric Bernstein-von Mises limits without requiring the Donsker property. A key theoretical contribution is proving that $d_{BL}(\mathcal{L}_{\Pi}(\sqrt{n}(\chi_\eta-\widehat{\chi}-\widehat{b}_{\eta})|Z^{(n)}), N(0,v_0))\to 0$, where $v_0=\mathbb{E}_0[\widetilde{\chi}_0^2(Z)]$. Once the bias $b_{0,\eta}$ is accounted for through the correction, the posterior is asymptotically normal and semiparametrically efficient. Empirical studies, including Monte Carlo simulations and an application to NHANES data, show reduced bias and improved coverage for RoBART relative to standard BART and one-step corrections. The approach blends nonparametric Bayesian forest priors with semiparametric efficiency concepts and offers a practical toolkit for valid inference in complex, high-dimensional settings.

Abstract

We develop a semiparametric framework for inference on the mean response in missing-data settings using a corrected posterior distribution. Our approach is tailored to Bayesian Additive Regression Trees (BART), a powerful predictive method whose nonsmoothness complicates asymptotic theory with multi-dimensional covariates. When using BART combined with Bayesian bootstrap weights, we establish a new Bernstein-von Mises theorem and show that the limit distribution generally contains a bias term. To address this, we introduce RoBART, a posterior bias-correction that robustifies BART for valid inference on the mean response. Monte Carlo studies support our theory, demonstrating reduced bias and improved coverage relative to existing procedures using BART.
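
To make the ingredients named in the abstract concrete, the following is a minimal sketch of Bayesian bootstrap inference on a mean response under MAR with a propensity-weighted residual correction. It is not the authors' RoBART algorithm: a linear least-squares fit stands in for a BART regression draw, the true propensity score stands in for a pilot estimator, and the function name `bayesian_bootstrap_mean`, the simulated design, and all parameter values are assumptions made only for illustration.

```python
import numpy as np


def bayesian_bootstrap_mean(y, d, m_hat, pi_hat, n_draws=2000, seed=0):
    """Bayesian bootstrap draws of a propensity-corrected mean (illustrative only).

    y      : outcomes, NaN where missing
    d      : 1 if the outcome is observed, 0 otherwise
    m_hat  : fitted regression values m(X_i) (stand-in for a BART fit)
    pi_hat : pilot propensity scores P(D=1 | X_i)
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    # Residuals are used only where the outcome is observed; 0 elsewhere.
    resid = np.where(d == 1, y - m_hat, 0.0)
    # Regression prediction plus inverse-propensity-weighted residual.
    score = m_hat + d * resid / pi_hat
    # Dirichlet(1, ..., 1) weights are the Bayesian bootstrap weights.
    w = rng.dirichlet(np.ones(n), size=n_draws)
    return w @ score


# Simulated MAR data (assumed design; true mean of Y is 1 + 2 * 0.5 = 2).
rng = np.random.default_rng(1)
n = 500
x = rng.uniform(0.0, 1.0, size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
pi_true = 0.3 + 0.5 * x                      # missingness depends on x only
d = rng.binomial(1, pi_true)
y = np.where(d == 1, y_full, np.nan)

# Stand-in regression fit on the observed cases (a BART fit would go here).
obs = d == 1
coef = np.polyfit(x[obs], y[obs], 1)
m_hat = np.polyval(coef, x)

draws = bayesian_bootstrap_mean(y, d, m_hat, pi_true)
lo, hi = np.quantile(draws, [0.025, 0.975])
print(f"posterior mean {draws.mean():.3f}, 95% interval [{lo:.3f}, {hi:.3f}]")
```

The quantiles of `draws` give a credible interval for the mean response. In this sketch the regression fit is held fixed across draws, so the spread reflects only the bootstrap weights; a procedure based on BART would also vary the regression function across posterior draws.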
