
Robust Semiparametric Inference for Bayesian Additive Regression Trees

Christoph Breunig, Ruixuan Liu, Zhengfei Yu

TL;DR

This work addresses valid inference on the mean outcome $\chi_0=\mathbb{E}_0[Y_i]$ when outcomes are missing at random (MAR), using Bayesian Additive Regression Trees (BART) combined with the Bayesian bootstrap. The authors propose RoBART, a posterior bias correction that combines pilot propensity-score estimators with a debiasing term and yields semiparametric Bernstein-von Mises limits without requiring the Donsker property. A key theoretical result is $d_{BL}(\mathcal{L}_{\Pi}(\sqrt{n}(\chi_\eta-\widehat{\chi}-\widehat{b}_{\eta})|Z^{(n)}), N(0,v_0))\to 0$, where $v_0=\mathbb{E}_0[\widetilde{\chi}_0^2(Z)]$: once the posterior is recentered for the bias $b_{0,\eta}$ (through the bias correction $\widehat{b}_{\eta}$), it is asymptotically normal with the semiparametric efficient variance. Empirical studies, including Monte Carlo simulations and an application to NHANES data, show reduced bias and improved coverage for RoBART relative to standard BART and one-step corrections. The approach blends nonparametric Bayesian forest priors with semiparametric efficiency theory and offers a practical toolkit for valid inference in complex, high-dimensional settings.
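The Bayesian-bootstrap posterior for the mean functional can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: posterior draws of $m_\eta$ at the observed covariates are taken as given (from some BART sampler that is not shown), and the correction is a generic one-step, AIPW-type shift $\mathbb{P}_n[\widehat{\gamma}(Y-m_\eta)]$ with $\widehat{\gamma}=D/\widehat{\pi}(X)$, not RoBART itself. The function names `bb_posterior_mean` and `one_step_corrected` and the toy design are hypothetical.

```python
import numpy as np


def bb_posterior_mean(m_draws, rng):
    """Bayesian-bootstrap draws of chi_eta = sum_i W_i * m_eta(X_i).

    m_draws: (S, n) array; row s is one posterior draw of m_eta at X_1..X_n,
    assumed to come from an external BART sampler (not shown).
    """
    S, n = m_draws.shape
    # One Dirichlet(1, ..., 1) weight vector per posterior draw.
    W = rng.dirichlet(np.ones(n), size=S)
    return np.sum(W * m_draws, axis=1)


def one_step_corrected(chi_draws, m_draws, y, d, pi_hat):
    """Generic one-step (AIPW-type) shift, hypothetical illustration:
    chi_eta + P_n[gamma_hat * (Y - m_eta)] with gamma_hat = D / pi_hat(X).
    """
    gamma_hat = d / pi_hat                # zero wherever Y is missing (D = 0)
    y_obs = np.where(d == 1, y, 0.0)      # replace missing Y (e.g. NaN) by 0
    resid = y_obs[None, :] - m_draws      # (S, n) residuals, one row per draw
    return chi_draws + np.mean(gamma_hat[None, :] * resid, axis=1)


# Toy usage with simulated stand-ins for BART draws (hypothetical design).
rng = np.random.default_rng(0)
n, S = 400, 1000
x = rng.uniform(size=n)
pi0 = 0.3 + 0.5 * x                                   # true propensity
d = (rng.uniform(size=n) < pi0).astype(float)         # 1 = outcome observed
y = np.where(d == 1, 1.0 + x + 0.5 * rng.standard_normal(n), np.nan)
m_draws = (1.0 + x)[None, :] + 0.05 * rng.standard_normal((S, n))
chi = bb_posterior_mean(m_draws, rng)
# The true propensity stands in for a pilot estimate here.
chi_os = one_step_corrected(chi, m_draws, y, d, pi_hat=pi0)
ci = np.quantile(chi_os, [0.025, 0.975])              # 95% credible interval
```

Drawing a fresh Dirichlet weight vector for every posterior draw of $m_\eta$ lets the posterior of $\chi_\eta$ reflect uncertainty about the covariate distribution as well as about the regression function.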

Abstract

We develop a semiparametric framework for inference on the mean response in missing-data settings using a corrected posterior distribution. Our approach is tailored to Bayesian Additive Regression Trees (BART), which is a powerful predictive method but whose nonsmoothness complicates asymptotic theory with multi-dimensional covariates. When using BART combined with Bayesian bootstrap weights, we establish a new Bernstein-von Mises theorem and show that the limit distribution generally contains a bias term. To address this, we introduce RoBART, a posterior bias correction that robustifies BART for valid inference on the mean response. Monte Carlo studies support our theory, demonstrating reduced bias and improved coverage relative to existing procedures using BART.

Paper Structure

This paper contains 16 sections, 9 theorems, 91 equations, 2 tables, 3 algorithms.

Key Result

Theorem 3.1

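Let Assumptions Assump:ID, Assump:Rate and Assump:Donsker (i) hold. Then, with the one-step posterior correction, the corrected posterior, scaled by $\sqrt{n}$ and recentered at $\widehat{\chi}+b_{0,\eta}$, converges to $N(0,v_0)$ in bounded-Lipschitz distance, where $b_{0,\eta}= \mathbb{P}_n[(\gamma_0-1)(m_0-m_{\eta})]$ is a bias term that need not vanish.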
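To make the bias term in Theorem 3.1 concrete, the sketch below evaluates $b_{0,\eta}$ draw by draw in a simulation where the true $\pi_0$ and $m_0$ are known, together with a Monte Carlo estimate of $v_0$. It assumes the standard forms for the missing-at-random mean: the Riesz representer $\gamma_0(D,X)=D/\pi_0(X)$ and the efficient influence function $\widetilde{\chi}_0(Z)=m_0(X)+\gamma_0(D,X)(Y-m_0(X))-\chi_0$. Because $b_{0,\eta}$ depends on the unknown $\gamma_0$ and $m_0$, it is an oracle quantity that is only computable in simulations. The function names and the toy design are hypothetical.

```python
import numpy as np


def oracle_bias(d, pi0, m0, m_draws):
    """b_{0,eta} = P_n[(gamma_0 - 1)(m_0 - m_eta)], one value per draw.

    d: (n,) response indicators; pi0, m0: (n,) true propensity and
    regression function at X_1..X_n (known only in a simulation);
    m_draws: (S, n) posterior draws of m_eta at the same points.
    """
    gamma0 = d / pi0  # assumed Riesz representer D / pi_0(X)
    return np.mean((gamma0 - 1.0)[None, :] * (m0[None, :] - m_draws), axis=1)


def efficiency_bound(y, d, pi0, m0, chi0):
    """Monte Carlo estimate of v_0 = E_0[chi_tilde_0(Z)^2] using the
    assumed influence function m0 + gamma0 * (Y - m0) - chi0."""
    gamma0 = d / pi0
    y_obs = np.where(d == 1, y, 0.0)  # missing Y is multiplied by gamma0 = 0
    eif = m0 + gamma0 * (y_obs - m0) - chi0
    return np.mean(eif ** 2)


# Toy design (hypothetical): X ~ U(0,1), pi_0(X) = 0.3 + 0.5 X, m_0(X) = 1 + X,
# so chi_0 = 1.5; Y = m_0(X) + 0.5 * noise, observed only when D = 1.
rng = np.random.default_rng(0)
n, S = 100_000, 20
x = rng.uniform(size=n)
pi0, m0 = 0.3 + 0.5 * x, 1.0 + x
d = (rng.uniform(size=n) < pi0).astype(float)
y = np.where(d == 1, m0 + 0.5 * rng.standard_normal(n), np.nan)
# Stand-in for posterior draws of m_eta that miss m_0 (too flat in x).
m_draws = (1.2 + 0.6 * x)[None, :] + 0.05 * rng.standard_normal((S, n))
b0 = oracle_bias(d, pi0, m0, m_draws)          # one b_{0,eta} per draw
v0 = efficiency_bound(y, d, pi0, m0, chi0=1.5)
```

In this design $v_0 = 1/12 + 0.5\log(8/3)\approx 0.574$, which gives a simple check on the Monte Carlo estimate.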

Theorems & Definitions (22)

  • Remark 2.1: Estimation of the Riesz Representer
  • Theorem 3.1
  • Remark 3.1: Bias Equivalence for Prior Correction
  • Theorem 3.2
  • Definition 4.1
  • Remark 4.1: Discussion of Assumptions
  • Theorem 4.1
  • Remark 4.2: Donsker Class
  • Remark 4.3: Smoothed BART
  • Proof of Theorem thm:BvM
  • ...and 12 more
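Remark 2.1 concerns estimating the Riesz representer, and the TL;DR mentions pilot propensity-score estimators. As one possible pilot, the sketch below fits a logistic regression by Newton-Raphson and forms the plug-in $\widehat{\gamma}(D,X)=D/\widehat{\pi}(X)$ with clipping. The logistic model, the function names `fit_logit` and `riesz_hat`, and the trimming level `trim=0.01` are assumptions for illustration; the paper's pilot estimator may differ.

```python
import numpy as np


def fit_logit(X, d, n_iter=25):
    """Pilot propensity model: logistic regression of D on X fitted by
    Newton-Raphson. An intercept column is prepended; returns coefficients."""
    Xa = np.column_stack([np.ones(len(d)), X])
    beta = np.zeros(Xa.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xa @ beta))
        grad = Xa.T @ (d - p)                             # score
        hess = Xa.T @ (Xa * (p * (1.0 - p))[:, None])     # information matrix
        beta = beta + np.linalg.solve(hess, grad)         # Newton step
    return beta


def riesz_hat(X, d, beta, trim=0.01):
    """Plug-in Riesz representer gamma_hat(D, X) = D / pi_hat(X), with pi_hat
    clipped to [trim, 1 - trim] (the trimming level is a hypothetical choice)."""
    Xa = np.column_stack([np.ones(len(d)), X])
    pi_hat = np.clip(1.0 / (1.0 + np.exp(-Xa @ beta)), trim, 1.0 - trim)
    return d / pi_hat, pi_hat


# Toy usage (hypothetical design with two covariates).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 2))
d = (rng.uniform(size=2000) < 1.0 / (1.0 + np.exp(-(0.3 + X[:, 0])))).astype(float)
beta = fit_logit(X, d)
gamma_hat, pi_hat = riesz_hat(X, d, beta)
```

Clipping $\widehat{\pi}$ away from 0 and 1 keeps $\widehat{\gamma}$ bounded, which matters because the representer enters any correction term multiplicatively.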