
Defensive Model Expansion for Robust Bayesian Inference

Antonio R. Linero

Abstract

Some applied researchers hesitate to use nonparametric methods, worrying that they will lose power in small samples or overfit the data when simpler models are sufficient. We argue that at least some of these concerns are unfounded when nonparametric models are strongly shrunk toward parametric submodels. We consider expanding a parametric model with a nonparametric component $r(x)$ that is heavily shrunk toward zero. This construction allows the model to adapt automatically: if the parametric model is correct, the nonparametric component disappears, recovering parametric efficiency, while if it is misspecified, the flexible component activates to capture the missing signal. We show that this adaptive behavior follows from simple and general conditions. Specifically, we prove that Bayesian nonparametric models anchored to linear regression, including variants of Gaussian process regression and Bayesian additive regression trees, consistently identify the correct parametric submodel when it holds and give asymptotically efficient inference for regression coefficients. In simulations, we find that the general BART model performs identically to correctly specified linear regression when the parametric model holds, and substantially outperforms it when nonlinear effects are present. This suggests a practical paradigm: defensive model expansion as a safeguard against model misspecification.
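The adaptive mechanism described in the abstract can be illustrated with a small sketch. It assumes a Gaussian process version of the expansion, $y_i = \beta_0 + \beta_1 x_i + r(x_i) + \epsilon_i$ with $\beta \sim N(0, \tau^2 I)$ and $\epsilon_i \sim N(0, \sigma^2)$, plus a two-point spike-and-slab prior that switches $r \sim \mathrm{GP}(0, s\,k)$ on or off. Both marginal likelihoods are then Gaussian, so the posterior inclusion probability of $r$ follows from Bayes' rule. The squared-exponential kernel $k$, the known noise level, the function names, and every numerical setting (`sigma`, `tau`, `gp_scale`, `lengthscale`, the simulated data) are illustrative assumptions, not the paper's prior or experimental design.

```python
import math
import random

def rbf_kernel(xs, lengthscale):
    """Squared-exponential Gram matrix: K[i][j] = exp(-(x_i - x_j)^2 / (2 * lengthscale^2))."""
    return [[math.exp(-((a - b) ** 2) / (2.0 * lengthscale ** 2)) for b in xs] for a in xs]

def cholesky(a):
    """Return lower-triangular L with L @ L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def gaussian_log_marginal(y, cov):
    """log N(y | 0, cov), using a Cholesky factor for the quadratic form and log-determinant."""
    n = len(y)
    L = cholesky(cov)
    z = [0.0] * n  # forward substitution: L z = y, so y^T cov^{-1} y = z^T z
    for i in range(n):
        z[i] = (y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))

def logistic(t):
    """Numerically stable 1 / (1 + exp(-t))."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def inclusion_probability(xs, y, sigma=0.5, tau=3.0, gp_scale=1.0,
                          lengthscale=0.2, prior_incl=0.5):
    """Posterior probability that r(x) is switched on (hypothetical spike-and-slab on the GP scale)."""
    n = len(xs)
    # Parametric submodel: beta0 + beta1 * x with beta ~ N(0, tau^2 I) adds
    # tau^2 * (1 + x_i x_j) to the covariance of y; noise adds sigma^2 on the diagonal.
    base = [[tau ** 2 * (1.0 + xs[i] * xs[j]) + (sigma ** 2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    K = rbf_kernel(xs, lengthscale)
    # Expanded model: the same linear part plus r ~ GP(0, gp_scale * k).
    expanded = [[base[i][j] + gp_scale * K[i][j] for j in range(n)] for i in range(n)]
    log_odds = (math.log(prior_incl / (1.0 - prior_incl))
                + gaussian_log_marginal(y, expanded) - gaussian_log_marginal(y, base))
    return logistic(log_odds)

# Hypothetical simulated data on a grid in [0, 1].
rng = random.Random(0)
xs = [i / 99 for i in range(100)]
y_linear = [1.0 + 2.0 * x + rng.gauss(0.0, 0.5) for x in xs]
y_nonlinear = [1.0 + 2.0 * x + 2.0 * math.sin(6.0 * x) + rng.gauss(0.0, 0.5) for x in xs]

p_linear = inclusion_probability(xs, y_linear)
p_nonlinear = inclusion_probability(xs, y_nonlinear)
print(f"P(r included | linear data)    = {p_linear:.3f}")
print(f"P(r included | nonlinear data) = {p_nonlinear:.3f}")
```

When the truth is linear, the log-determinant (Occam) penalty for the extra GP variance typically outweighs the small improvement in fit, so the inclusion probability should be low; when a strong nonlinear term is present, it should be close to 1. The exact values depend on the seed and the assumed settings.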

Paper Structure

This paper contains 36 sections, 8 theorems, 10 equations, 8 figures.

Key Result

Theorem 1

Suppose that PPA holds and that there exist constants $C_1, C_2, N^\star$ such that $\Pi_r(\|r - r_0\|_{\mathcal{H}} \le M_N N^{-1/2}) \ge C_1 e^{-C_2 M_N^2}$, where $M_N / \sqrt{\log N} \to \infty$. Then we have $\mathbb{E}_{\theta_0} \Pi_\alpha\{H(\theta_0, \theta) > K M_N N^{-1/2} \mid \ldots$

Figures (8)

  • Figure 1: Left: an example of a regression tree with input $x = (x_1, x_2)$ supported on $[0,1]^2$. Right: the induced step function on $[0,1]^2$.
  • Figure 2: Results for the simulation experiment in the section on rate adaptivity of general BART. From left to right, $\sigma_0 = 1, 3, 5$; from top to bottom, $\lambda_0 = 0, 0.4$. MSE and $N$ are displayed on a log scale.
  • Figure 3: Posterior inclusion probabilities for the Gaussian process model under various values of $\lambda_0$. From left to right: $\sigma_0 \in \{1,2,4\}$. From top to bottom: $N \in \{200,400,800\}$.
  • Figure 4: Results for the experiments in the section on the semiparametric Bernstein-von Mises theorem for Gaussian processes.
  • Figure 5: For $N = 1000$: the posterior summary $R^2$ and overall summary $R^2$ of the linear model (left), posterior distribution of the coefficient of BMI compared with the result from the linear model (middle), and plot of the projected predictions against the actual predictions from the model (right).
  • ...and 3 more figures
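Figure 1's point that a regression tree induces a step function, together with the abstract's description of BART anchored to linear regression, can be sketched as follows. The tree below, its split and leaf values, and the additive form $x^\top \beta + \sum_t g(x; T_t)$ are hypothetical illustrations, not the tree drawn in Figure 1 or the paper's exact model. With an empty list of trees the mean reduces to the linear submodel, which is the parametric limit that the nonparametric component is shrunk toward.

```python
from dataclasses import dataclass
from typing import Sequence, Union

@dataclass
class Leaf:
    value: float  # constant returned for every x that lands in this cell

@dataclass
class Split:
    var: int       # coordinate j tested by the rule x[j] <= cut
    cut: float
    left: "Node"   # subtree for x[j] <= cut
    right: "Node"  # subtree for x[j] > cut

Node = Union[Leaf, Split]

def eval_tree(node: Node, x: Sequence[float]) -> float:
    """Route x to a leaf; as a function of x this is piecewise constant (a step function)."""
    while isinstance(node, Split):
        node = node.left if x[node.var] <= node.cut else node.right
    return node.value

def expanded_mean(x: Sequence[float], beta: Sequence[float], trees: Sequence[Node]) -> float:
    """Hypothetical linear-anchored sum of trees: x^T beta + sum_t g(x; T_t)."""
    linear = sum(b * xj for b, xj in zip(beta, x))
    return linear + sum(eval_tree(t, x) for t in trees)

# Hypothetical tree on [0, 1]^2: three rectangular cells, one constant per cell.
tree = Split(var=0, cut=0.5,
             left=Leaf(-1.0),
             right=Split(var=1, cut=0.3, left=Leaf(0.5), right=Leaf(2.0)))
```

For example, `eval_tree(tree, (0.7, 0.8))` sends the point right at both splits and returns 2.0, while `expanded_mean(x, beta, [])` is just the linear predictor $x^\top \beta$.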

Theorems & Definitions (13)

  • Definition 1: $\epsilon_N$-thickness
  • Theorem 1: General Rates
  • Corollary 1: Rate for Spike-and-Slab
  • Theorem 2: General Model Selection
  • Remark 1
  • Theorem 3: Bernstein-von Mises
  • Theorem 4: BART Adaptivity
  • Theorem 5: BART Bernstein-von Mises
  • Remark 2: Gaussian Process Lower Bounds
  • Theorem 6: Semiparametric Bernstein-von Mises Theorem
  • ...and 3 more