Bayesian Model Averaging in Causal Instrumental Variable Models

Gregor Steiner, Mark Steel

Abstract

Instrumental variables are a popular tool to infer causal effects under unobserved confounding, but choosing suitable instruments is challenging in practice. We propose gIVBMA, a Bayesian model averaging procedure that addresses this challenge by averaging across different sets of instrumental variables and covariates in a structural equation model. This allows for data-driven selection of valid and relevant instruments and provides additional robustness against invalid instruments. Our approach extends previous work through a scale-invariant prior structure and accommodates non-Gaussian outcomes and treatments, offering greater flexibility than existing methods. The computational strategy uses conditional Bayes factors to update models separately for the outcome and treatments. We prove that this model selection procedure is consistent. In simulation experiments, gIVBMA outperforms current state-of-the-art methods. We demonstrate its usefulness in two empirical applications: the effects of malaria and institutions on income per capita and the returns to schooling. A software implementation of gIVBMA is available in Julia.
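As a rough illustration of the problem the abstract starts from, the sketch below simulates one treatment, one unobserved confounder, and one instrument that is valid and relevant by construction, then compares the ordinary least squares slope with the simple instrumental variable (ratio-of-covariances) estimate. It is not gIVBMA and does no model averaging; the data-generating process, the function names (`simulate`, `ols_slope`, `iv_slope`), and all parameter values are assumptions chosen for the example.

```python
import numpy as np


def simulate(n, beta=0.5, seed=0):
    """Hypothetical data-generating process (an assumption for this sketch)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                 # instrument: affects y only through x
    u = rng.normal(size=n)                 # unobserved confounder of x and y
    x = z + u + rng.normal(size=n)         # treatment depends on z and u
    y = beta * x + u + rng.normal(size=n)  # outcome depends on x and u
    return y, x, z


def ols_slope(y, x):
    """Slope from regressing y on x with an intercept."""
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))


def iv_slope(y, x, z):
    """Single-instrument IV estimate: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return float(zc @ (y - y.mean()) / (zc @ (x - x.mean())))


true_beta = 0.5
y, x, z = simulate(20_000, beta=true_beta)
beta_ols = ols_slope(y, x)    # biased upwards because u drives both x and y
beta_iv = iv_slope(y, x, z)   # close to true_beta when z is valid and relevant
print(f"true effect: {true_beta}, OLS: {beta_ols:.3f}, IV: {beta_iv:.3f}")
```

In this setup the OLS slope converges to the true effect plus cov(x, u) / var(x) = 1/3, while the IV estimate converges to the true effect. With several candidate instruments of unknown validity, the choice of which ones to use is exactly the uncertainty that gIVBMA averages over.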

Paper Structure

This paper contains 45 sections, 1 theorem, 78 equations, 8 figures, 12 tables, 1 algorithm.

Key Result

Theorem 1

The procedure gIVBMA, detailed in Sections sec:sampling and sec:prior, is model selection consistent in the Gaussian case if and only if the following conditions are satisfied: If (some of) the components in $(\boldsymbol{y},\boldsymbol{X})$ are assigned a non-Gaussian sampling distribution as in Section sec:nonGaussian, with $(\boldsymbol{q},\boldsymbol{Q})$ the latent Gaussian counterparts, the

Figures (8)

  • Figure 1: Multiple endogenous variables with correlated instruments: Posterior probabilities of the true treatment model and mean model sizes for $100$ simulated datasets of size $n = 50$ (top) and $n=500$ (bottom). The true treatment model size is 5. IVBMA uses separate treatment models for the two endogenous variables $X_1$ and $X_2$.
  • Figure 2: Returns to schooling: Posterior distributions of the treatment effect of education and the covariance ratio $\sigma_{yx} / \sigma_{xx}$ based on the imputed dataset ($n = 3,003$). The algorithm was run for $5,000$ iterations, the first $500$ of which were discarded as burn-in. The vertical line for IVBMA indicates a point mass at zero of $0.962$.
  • Figure S.1: Exponential priors (scale parameterisation) on $\nu$ and the implied prior on the covariance ratio $\sigma_{yx} / \sigma_{xx}$ and the conditional variance $\sigma_{y \mid x}$ in the case of $l = 1$. The Exponential priors are shifted by $l+1 = 2$.
  • Figure S.2: The implied prior on the number of valid and relevant instruments $N_Z$ for our proposed model (gIVBMA) and the one used in Karl and Lenkoski (2012) (IVBMA; assuming $l=1$) for $p = 5, 10, 15, 20$ (from top-left to bottom-right).
  • Figure S.3: Invalid Instruments: A chain of the outcome model size ($L$), treatment model size ($M$), and the number of valid and relevant instruments $N_Z$ implied by outcome and treatment model at each iteration for both considered gIVBMA variants. These results are based on a single simulated dataset of size $n = 50$ (left) and $n = 500$ (right), respectively, with $s = 3$ out of $10$ instruments being invalid.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Theorem 1
  • proof