Untangling Sample and Population Level Estimands in Bayesian Causal Inference
Arman Oganisian
Abstract
Model-based Bayesian inference for causal estimands has been growing in popularity; however, many misconceptions and implementation errors arise from conflating sample- and population-level estimands. Our goal is to elucidate the crucial differences between sample- and population-level inference across identification, modeling, computation, and interpretation. For example, common sample-level estimands require cross-world Bayesian modeling, whereas many (but not all) population-level estimands do not. Similarly, the former require explicit imputation of counterfactuals from their joint posterior, whereas the latter typically require only a posterior distribution over parameters and, possibly, post-hoc Monte Carlo simulation. We present four examples, with particular emphasis on cross-world assumptions and Bayesian nonparametric methods. Because the differences are conceptually subtle but can be practically substantial, each example is discussed in detail with implementation code in Stan. We also examine common errors in implementing the Bayesian g-formula. The overarching message is to always reason from first principles about which marginal of the joint posterior is of interest in a particular causal analysis, and then to follow the strict logic of Bayes' theorem and probability theory to avoid common implementation errors.
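To make the computational contrast concrete, here is a minimal Python sketch. It is not the paper's Stan code and rests on several simplifying assumptions. The data are synthetic. The outcome model is linear with a treatment-by-confounder interaction. The prior is flat and σ is treated as known, so the coefficient posterior is available in closed form. The covariate distribution is handled with the Bayesian bootstrap. Every function name here, along with the cross-world correlation `rho` between Y(1) and Y(0) given L, is a hypothetical illustration. The population-level ATE needs only parameter draws plus post-hoc Monte Carlo averaging. The sample-level ATE must impute each unit's missing counterfactual, and that imputation depends on `rho`, which the observed data cannot identify.

```python
import numpy as np


def simulate_data(n, rng):
    """Hypothetical synthetic data: confounder L, binary treatment A, outcome Y."""
    L = rng.normal(size=n)
    A = rng.binomial(1, 1.0 / (1.0 + np.exp(-L)))  # treatment depends on L
    Y = 1.0 + 2.0 * A + 1.0 * L + 0.5 * A * L + rng.normal(size=n)
    return L, A, Y


def design(a, L):
    """Outcome-model design matrix with columns: intercept, A, L, A*L."""
    a = np.broadcast_to(a, L.shape).astype(float)
    return np.column_stack([np.ones_like(L), a, L, a * L])


def posterior_beta(L, A, Y, sigma, n_draws, rng):
    """Exact coefficient posterior under a flat prior and KNOWN sigma (an assumption)."""
    X = design(A, L)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    return rng.multivariate_normal(beta_hat, sigma**2 * XtX_inv, size=n_draws)


def population_ate(betas, L, rng):
    """Population-level ATE: parameter draws plus post-hoc Monte Carlo.
    The covariate distribution gets Bayesian-bootstrap (Dirichlet) weights per draw."""
    n = L.shape[0]
    X1, X0 = design(1, L), design(0, L)
    draws = np.empty(len(betas))
    for m, beta in enumerate(betas):
        w = rng.dirichlet(np.ones(n))
        draws[m] = w @ (X1 @ beta - X0 @ beta)
    return draws


def sample_ate(betas, L, A, Y, sigma, rho, rng):
    """Sample-level ATE: impute each unit's missing counterfactual.
    rho is a hypothetical cross-world correlation of (Y(1), Y(0)) given L;
    the observed-data likelihood does not depend on it."""
    X1, X0 = design(1, L), design(0, L)
    draws = np.empty(len(betas))
    for m, beta in enumerate(betas):
        mu1, mu0 = X1 @ beta, X0 @ beta
        mu_obs = np.where(A == 1, mu1, mu0)
        mu_mis = np.where(A == 1, mu0, mu1)
        # Bivariate-normal conditional of the missing outcome given the observed one.
        y_mis = rng.normal(mu_mis + rho * (Y - mu_obs), sigma * np.sqrt(1.0 - rho**2))
        y1 = np.where(A == 1, Y, y_mis)
        y0 = np.where(A == 1, y_mis, Y)
        draws[m] = np.mean(y1 - y0)
    return draws


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sigma = 1.0
    L, A, Y = simulate_data(500, rng)
    betas = posterior_beta(L, A, Y, sigma, 2000, rng)
    pate = population_ate(betas, L, rng)
    print(f"PATE: mean {pate.mean():.2f}, sd {pate.std():.3f}")
    for rho in (0.0, 0.9):
        sate = sample_ate(betas, L, A, Y, sigma, rho, rng)
        print(f"SATE (rho={rho}): mean {sate.mean():.2f}, sd {sate.std():.3f}")
```

The only change between the two printed SATE summaries is the unidentified `rho`. The PATE computation never touches it, which mirrors the distinction drawn above between cross-world sample-level estimands and many population-level ones.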
