Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis

Daniel Csillag, Claudio José Struchiner, Guilherme Tegoni Goedert

TL;DR

Given finite samples, the paper addresses generalization in causal regression by deriving a change-of-measure bound based on the Pearson $\chi^2$ divergence that ties the unobservable complete causal loss to an observable, reweighted loss plus a gap term $\Delta_{T=a}$ and a variance term. The results cover outcome regression and causal meta-learners (T-, S-, X-learners) and extend to losses beyond MSE, such as MAE and quantile loss, enabling estimation of robust and quantile treatment effects under weak ignorability and positivity assumptions. A practical, empirical upper bound on $\Delta_{T=a}$ uses a propensity-model Brier score, enabling data-driven model selection and sensitivity analysis in semi-synthetic and real datasets (e.g., Parkinson's dataset). Experiments show the bounds are remarkably tight, often beating prior bounds by orders of magnitude and guiding algorithm design and model selection in causal inference.
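
To make the Brier score mentioned above concrete, here is a minimal sketch of how it is computed for a binary-treatment propensity model: the mean squared difference between the predicted treatment probability and the observed treatment indicator. The function name `brier_score` and the toy arrays are assumptions made for illustration. How the paper converts this score into an upper bound on $\Delta_{T=a}$ is not reproduced here.

```python
import numpy as np


def brier_score(propensity_pred, treatment):
    """Brier score: mean of (predicted P(T=1 | X) - observed T)^2."""
    p = np.asarray(propensity_pred, dtype=float)
    t = np.asarray(treatment, dtype=float)
    if p.shape != t.shape:
        raise ValueError("predictions and treatments must have the same shape")
    return float(np.mean((p - t) ** 2))


# Hypothetical toy data (not from the paper): four units with predicted
# propensities and observed binary treatment assignments.
p_hat = [0.9, 0.2, 0.6, 0.5]
t_obs = [1, 0, 1, 0]
score = brier_score(p_hat, t_obs)
print(f"Brier score of the propensity model: {score:.3f}")
```

A lower score means the propensity model's predicted probabilities are closer to the observed treatment assignments. The score is computable from observed data alone, which is what makes an empirical bound possible.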

Abstract

Many algorithms have been recently proposed for causal machine learning. Yet, there is little to no theory on their quality, especially considering finite samples. In this work, we propose a theory based on generalization bounds that provides such guarantees. By introducing a novel change-of-measure inequality, we are able to tightly bound the model loss in terms of the deviation of the treatment propensities over the population, which we show can be empirically limited. Our theory is fully rigorous and holds even in the face of hidden confounding and violations of positivity. We demonstrate our bounds on semi-synthetic and real data, showcasing their remarkable tightness and practical utility.

Paper Structure

This paper contains 26 sections, 22 theorems, 96 equations, 7 figures.

Key Result

Theorem 1.1

For any (decomposable) loss function and any $\lambda > 0$, where $\Delta$ is a term that quantifies how far we deviate from a randomized control trial.

Figures (7)

  • Figure 1: Tightness of our bounds. Comparison between our bounds and those of prior work, both for the complete loss of the estimation of the potential outcome $Y^1$. Additional images for other tasks (e.g., estimation of treatment effects) are available in the appendix. Our "theoretic" and "empirical" bounds correspond to Theorems 2.3 and 2.4, respectively, and "prior work" refers to Corollary 1 of the cited prior work. Our theoretic bound is quite tight, being very close to the complete loss (which is unobservable in practice). Our empirical bound, while somewhat looser than the theoretic bound, is still substantially tighter than the available prior work.
  • Figure 2: Importance of the tuning parameter $\lambda$ in Lemma 2.2. An illustration of the bound in Lemma 2.2 (shaded in green) over different values of its tuning parameter $\lambda$. Change-of-measure inequalities (e.g., novel-change-of-measure) typically do not have a tuning parameter, which corresponds to taking $\lambda = 1$ in our lemma. As can be seen in the figure, being able to optimally select $\lambda$ substantially tightens our bounds.
  • Figure 3: Application: model selection on real data. The plot compares multiple models for treatment effect estimation: bars correspond to our bounds on the complete loss of the models, while the knobs in the middle correspond to standard bootstrapped confidence intervals for the loss on the observed distributions. Note how some models (e.g., R.F. T-learner) appear strictly better than others (e.g., G.B. X-learner) only if our bounds are not considered. The R.F. and G.B. T/S-learners remain strictly better than the Lasso-based models.
  • Figure 4: Alternate version of Figure 1 for outcome regression of $Y^1$ and including more losses.
  • Figure 5: Alternate version of Figure 1 for T-learners and including more losses.
  • ...and 2 more figures

Theorems & Definitions (35)

  • Theorem 1.1: Informal
  • Theorem 1.2: Informal
  • Definition 2.1: Pearson's $\chi^2$ divergence
  • Lemma 2.2
  • Theorem 2.3: Upper bound on outcome regression loss in expectation
  • Theorem 2.4: Empirical upper bound on outcome regression loss in expectation
  • Corollary 2.5: PAC empirical upper bound on outcome regression loss
  • Proposition 2.7: Upper bound on T-/S-learner loss in expectation
  • Corollary 2.8: PAC empirical upper bound on the loss of a T-/S-learner
  • Proposition 2.9: Upper bound on X-learner loss in expectation
  • ...and 25 more
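
As a rough illustration of the divergence named in Definition 2.1, the sketch below computes Pearson's $\chi^2$ divergence between two discrete distributions using the standard textbook form $\chi^2(P \,\|\, Q) = \sum_i (p_i - q_i)^2 / q_i$. The function name `pearson_chi2`, the argument order, and the toy distributions are assumptions made for illustration. The paper's own definition may be stated for general measures or with a different convention.

```python
import numpy as np


def pearson_chi2(p, q):
    """Pearson chi-squared divergence chi2(P || Q) for discrete distributions.

    Assumes p and q are probability vectors over the same support and that
    every entry of q is strictly positive (a simplifying assumption).
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("p and q must have the same shape")
    if np.any(q <= 0):
        raise ValueError("q must be strictly positive on the shared support")
    return float(np.sum((p - q) ** 2 / q))


# Hypothetical example: a 50/50 treatment assignment (as in a balanced
# randomized trial) versus a skewed 80/20 assignment.
balanced = [0.5, 0.5]
skewed = [0.8, 0.2]

d_same = pearson_chi2(balanced, balanced)
d_skewed_vs_balanced = pearson_chi2(skewed, balanced)
d_balanced_vs_skewed = pearson_chi2(balanced, skewed)
print(d_same, d_skewed_vs_balanced, d_balanced_vs_skewed)
```

The divergence is zero when the two distributions coincide and is not symmetric in its arguments, so the order in which the observed and target distributions are passed matters.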