Generalized Bayes for Causal Inference

Emil Javurek; Dennis Frauen; Yuxin Wang; Stefan Feuerriegel

TL;DR

This paper proposes a generalized Bayesian framework for causal inference that turns existing loss-based causal estimators into estimators with full uncertainty quantification, and is the first flexible framework for constructing generalized Bayesian posteriors for causal machine learning.

Abstract

Uncertainty quantification is central to many applications of causal machine learning, yet principled Bayesian inference for causal effects remains challenging. Standard Bayesian approaches typically require specifying a probabilistic model for the data-generating process, including high-dimensional nuisance components such as propensity scores and outcome regressions. Standard posteriors are thus vulnerable to strong modeling choices, including complex prior elicitation. In this paper, we propose a generalized Bayesian framework for causal inference. Our framework avoids explicit likelihood modeling; instead, we place priors directly on the causal estimands and update these using an identification-driven loss function, which yields generalized posteriors for causal effects. As a result, our framework turns existing loss-based causal estimators into estimators with full uncertainty quantification. Our framework is flexible and applicable to a broad range of causal estimands (e.g., ATE, CATE). Further, our framework can be applied on top of state-of-the-art causal machine learning pipelines (e.g., Neyman-orthogonal meta-learners). For Neyman-orthogonal losses, we show that the generalized posteriors converge to their oracle counterparts and remain robust to first-stage nuisance estimation error. With calibration, we thus obtain valid frequentist uncertainty even when nuisance estimators converge at slower-than-parametric rates. Empirically, we demonstrate that our proposed framework offers causal effect estimation with calibrated uncertainty across several causal inference settings. To the best of our knowledge, this is the first flexible framework for constructing generalized Bayesian posteriors for causal machine learning.
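The loss-based update described in the abstract can be illustrated for the simplest case, the ATE. The following is a minimal sketch, not the authors' algorithm. It assumes a squared loss on AIPW pseudo-outcomes, a Gaussian prior on the ATE, oracle nuisance functions (in practice these would be estimated with cross-fitting), and a simple variance-matching choice of the learning rate `omega`. The function names, the simulated data, and the calibration heuristic are all hypothetical choices made for illustration. Under these assumptions the generalized posterior, proportional to prior times `exp(-omega * loss)`, is Gaussian in closed form.

```python
import numpy as np


def aipw_pseudo_outcomes(y, a, e, mu0, mu1):
    """AIPW (doubly robust) pseudo-outcomes whose mean identifies the ATE."""
    return mu1 - mu0 + a * (y - mu1) / e - (1 - a) * (y - mu0) / (1 - e)


def gibbs_posterior_ate(phi, prior_mean=0.0, prior_var=10.0, omega=1.0):
    """Closed-form Gibbs posterior for a scalar ATE theta.

    Posterior is proportional to
        N(theta; prior_mean, prior_var) * exp(-omega * sum_i 0.5 * (phi_i - theta)^2),
    which is Gaussian. Returns (posterior mean, posterior variance).
    """
    phi = np.asarray(phi, dtype=float)
    post_prec = 1.0 / prior_var + omega * phi.size
    post_mean = (prior_mean / prior_var + omega * phi.sum()) / post_prec
    return post_mean, 1.0 / post_prec


# Hypothetical simulated data with a true ATE of 2.0.
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
e = np.clip(1.0 / (1.0 + np.exp(-x)), 0.05, 0.95)  # propensity score
a = rng.binomial(1, e)                              # treatment assignment
y = 2.0 * a + x + rng.normal(size=n)                # observed outcome

# Assumption: oracle nuisances are plugged in to keep the sketch short.
mu0, mu1 = x, x + 2.0
phi = aipw_pseudo_outcomes(y, a, e, mu0, mu1)

# Assumption: choose omega so the posterior variance matches the sampling
# variance of the mean of phi (a simple heuristic, not the paper's procedure).
omega = 1.0 / phi.var(ddof=1)
post_mean, post_var = gibbs_posterior_ate(phi, omega=omega)

# 95% credible interval from the Gaussian generalized posterior.
half_width = 1.96 * np.sqrt(post_var)
cri = (post_mean - half_width, post_mean + half_width)
print(f"posterior mean = {post_mean:.3f}, 95% CrI = ({cri[0]:.3f}, {cri[1]:.3f})")
```

With `omega = 1.0`, the posterior variance would be roughly `1 / n` regardless of how noisy the pseudo-outcomes are, which is why some form of calibration of the learning rate is needed before the credible interval can be read as a frequentist interval.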

Paper Structure

This paper contains 29 sections, 4 theorems, 62 equations, 3 figures, 4 tables, and 1 algorithm.

Introduction
Related Work
Problem Setup
Why standard Bayesian approaches for causal inference are difficult
Method (Generalized Bayes)
Gibbs posteriors
Feasible posteriors
Algorithm
Theoretical results
Experiments
Proofs
Finite dimensional theta
Step 1 (Local asymptotic normality/BvM approximations).
Step 2 (Orthogonality implies $\hat{\theta}_{\mathrm{fe}}-\hat{\theta}_{\mathrm{or}}=O_{\mathbb{P}_{\mathrm{obs}}}(r_n^2)$).
Step 3 (Reduce the posterior TV distance to TV between two Gaussians).
...and 14 more sections

Key Result

Theorem 5.1

Assume that: (i) the oracle Gibbs posterior satisfies a Bernstein--von Mises (BvM) approximation; (ii) the loss is Neyman orthogonal at $(\theta^\star,\eta^S_0)$ in the sense of Eq. def:orth-loss; and (iii) $\hat{\eta}^S$ is obtained by sample splitting/cross-fitting and satisfies $\|\hat{\eta}^S-\ldots$ [...] where $\mathrm{TV}$ denotes total variation on the target of inference: either (a) the full paramet...

Figures (3)

Figure 1: Overview of the pipeline for generalized Bayesian inference: In the standard causal ML pipeline, one observes the data $\mathcal{D}_n$, chooses an identification-driven loss, estimates the corresponding nuisances, and thus constructs a fitting objective. Here, we further elicit a prior on the causal effect and together construct a generalized updating rule yielding the generalized posterior distribution for the causal effect of interest. This approach can be calibrated to yield frequentist-valid uncertainty and be made robust to nuisance estimation error via Neyman-orthogonal losses.
Figure 2: CrI length for back-door ATE. Box plots of the lengths of the $95\%$ credible interval $\mathrm{CrI}^{(r)}_{0.95}$ across $R=50$ repetitions for each strategy $S \in \{\mathrm{AIPW},\mathrm{IPW},\mathrm{RA}\}$, the dataset $\mathcal{D}_1$, and a varying sample size $n \in \{100,\ldots,1000\}$. Unfaithful credible intervals are reported with a cross at their median length.
Figure 3: (back-door CATE) An example of a Gaussian process fit (mean and $95\%$ CrI) of the generalized posterior of the CATE, with the $\mathrm{DR}$ loss, in $\mathcal{D}_4$, at $n=1000$. The solution fitted with variational inference (VI) uses an inducing-point GP. For implementation details, see Appendix \ref{app:implementation}.

Theorems & Definitions (10)

Theorem 5.1: Posterior stability under orthogonal losses (informal)
proof
Remark 5.2: Why this fails without orthogonality
Theorem 1.1: Posterior stability under orthogonal losses (finite-dimensional $\theta$)
proof: Proof of Theorem \ref{thm:posterior-stability} (finite-dimensional $\theta$)
Theorem 1.2: Posterior stability under orthogonal losses for infinite-dimensional $\theta$ (finite-dimensional projections)
proof
Remark 1.3: CATE as a special case
Lemma 1.4: Sufficient conditions for the feasible Gibbs BvM
proof
