Generalization Bounds for Causal Regression: Insights, Guarantees and Sensitivity Analysis

Daniel Csillag; Claudio José Struchiner; Guilherme Tegoni Goedert

TL;DR

Given finite samples, the paper addresses generalization in causal regression by deriving a change-of-measure bound based on the Pearson $\chi^2$ divergence that ties the unobservable complete causal loss to an observable, reweighted loss plus a gap term $\Delta_{T=a}$ and a variance term. The results cover outcome regression and causal meta-learners (T-, S-, X-learners) and extend to losses beyond MSE, such as MAE and quantile loss, enabling estimation of robust and quantile treatment effects under weak ignorability and positivity assumptions. A practical, empirical upper bound on $\Delta_{T=a}$ uses a propensity-model Brier score, enabling data-driven model selection and sensitivity analysis in semi-synthetic and real datasets (e.g., Parkinson's dataset). Experiments show the bounds are remarkably tight, often beating prior bounds by orders of magnitude and guiding algorithm design and model selection in causal inference.

Abstract

Many algorithms have been recently proposed for causal machine learning. Yet, there is little to no theory on their quality, especially considering finite samples. In this work, we propose a theory based on generalization bounds that provides such guarantees. By introducing a novel change-of-measure inequality, we are able to tightly bound the model loss in terms of the deviation of the treatment propensities over the population, which we show can be empirically limited. Our theory is fully rigorous and holds even in the face of hidden confounding and violations of positivity. We demonstrate our bounds on semi-synthetic and real data, showcasing their remarkable tightness and practical utility.
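The two observable quantities named in the summary, a reweighted loss and the Brier score of a propensity model, can be computed as in the minimal sketch below. It does not reproduce the paper's bound: the function names `brier_score` and `reweighted_loss`, the inverse-propensity weighting scheme, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def brier_score(t, p_hat):
    """Mean squared difference between binary treatment labels and predicted propensities."""
    t = np.asarray(t, dtype=float)
    p_hat = np.asarray(p_hat, dtype=float)
    return float(np.mean((p_hat - t) ** 2))

def reweighted_loss(y, y_pred, t, p_hat, arm=1):
    """Inverse-propensity-weighted squared loss for one treatment arm (illustrative choice)."""
    y, y_pred, t, p_hat = map(np.asarray, (y, y_pred, t, p_hat))
    in_arm = (t == arm)
    prob_arm = p_hat if arm == 1 else 1.0 - p_hat
    weights = in_arm / prob_arm  # zero weight for units outside the arm
    return float(np.mean(weights * (y - y_pred) ** 2))

# Synthetic data (assumption): one covariate, logistic propensity, unit-variance noise.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
true_propensity = 1.0 / (1.0 + np.exp(-x))
t = rng.binomial(1, true_propensity)
y1 = 2.0 * x + rng.normal(size=n)   # potential outcome under treatment
y = np.where(t == 1, y1, 0.0)       # untreated outcomes get zero weight below
y_pred = 2.0 * x                    # a model that recovers the true mean of Y^1

brier_true = brier_score(t, true_propensity)    # well-specified propensity model
brier_const = brier_score(t, np.full(n, 0.5))   # uninformative propensity model
ipw_loss = reweighted_loss(y, y_pred, t, true_propensity, arm=1)
```

In this toy setup the reweighted loss estimates the squared error of `y_pred` over the whole population even though $Y^1$ is observed only for treated units, and the Brier score is lower for the better propensity model.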

Paper Structure (26 sections, 22 theorems, 96 equations, 7 figures)

Introduction
Related work
Novel Bounds for Causal Regression
Outcome regression
Causal Meta-learners
T-learners and S-learners
X-learners
Beyond the Mean Squared Loss
Experiments and Applications
Experiments on Semi-Synthetic Data
Application on real data
Conclusion
Theoretical Results
Change of Measure
Bounds in Expectation
...and 11 more sections

Key Result

Theorem 1.1

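For any (decomposable) loss function and any $\lambda > 0$, the unobservable complete causal loss is upper-bounded by an observable, reweighted loss plus a gap term governed by $\Delta$ and a variance term, where $\Delta$ quantifies how far we deviate from a randomized controlled trial.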

Figures (7)

Figure 1: Tightness of our bounds. Comparison between our bounds and those of prior work, both for the complete loss of the estimation of the potential outcome $Y^1$. Additional images for other tasks (e.g., estimation of treatment effects) are available in the appendix. Our "theoretic" and "empirical" bounds correspond to our theoretical and empirical upper-bound theorems for outcome regression, and "prior work" refers to Corollary 1 of the prior work. Our theoretic bound is quite tight, being very close to the complete loss (which is unobservable in practice). Our empirical bound, while somewhat looser than the theoretic bound, is still substantially tighter than the available prior work.
Figure 2: Importance of the tuning parameter $\lambda$ in our change-of-measure lemma. An illustration of the bound in the lemma (shaded in green) over different values of its tuning parameter $\lambda$. Existing change-of-measure inequalities typically do not have a tuning parameter, which corresponds to taking $\lambda = 1$ in our lemma. As can be seen in the figure, being able to optimally select $\lambda$ substantially tightens our bounds.
Figure 3: Application: model selection on real data. The plot compares multiple models for treatment effect estimation: bars correspond to our bounds on the complete loss of the models, while the knobs in the middle correspond to standard bootstrapped confidence intervals for the loss on the observed distributions. Note how some models (e.g., R.F. T-learner) appear strictly better than others (e.g., G.B. X-learner) only if our bounds are not considered. The R.F. and G.B. T/S-learners remain strictly better than the Lasso-based models.
Figure 4: Alternate version of Figure 1 for outcome regression of $Y^1$ and including more losses.
Figure 5: Alternate version of Figure 1 for T-learners and including more losses.
...and 2 more figures
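
To see why a tuning parameter can tighten a change-of-measure bound, as the Figure 2 caption describes, the sketch below checks a textbook $\chi^2$ inequality numerically. It is not the paper's lemma: it uses the classical Cauchy-Schwarz bound $\mathbb{E}_P[f] \le \mathbb{E}_Q[f] + \sqrt{\chi^2(P \| Q)\,\mathrm{Var}_Q(f)}$ together with the relaxation $\sqrt{ab} \le (\lambda a + b/\lambda)/2$, and both the parametrization by $\lambda$ and the toy distributions are assumptions chosen for illustration.

```python
import numpy as np

# Toy discrete distributions (assumption): P is the target, Q is what we sample from.
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
f = np.array([4.0, 2.0, 0.0])   # a loss value attached to each outcome

chi2 = float(np.sum((p - q) ** 2 / q))        # Pearson chi-squared divergence of P from Q
mean_p = float(np.sum(p * f))                 # quantity we want to bound
mean_q = float(np.sum(q * f))                 # quantity we can observe
var_q = float(np.sum(q * (f - mean_q) ** 2))  # variance of f under Q

def bound(lam):
    """Relaxed bound: E_Q[f] + (lam * chi2 + var_q / lam) / 2, valid for every lam > 0."""
    return mean_q + 0.5 * (lam * chi2 + var_q / lam)

# The relaxation is tightest at lam* = sqrt(var_q / chi2), where it equals
# E_Q[f] + sqrt(chi2 * var_q).
lam_star = float(np.sqrt(var_q / chi2))

grid = np.linspace(0.1, 10.0, 200)
grid_values = np.array([bound(lam) for lam in grid])
```

Every value of `lam` gives a valid upper bound on `mean_p`, but fixing `lam = 1` is generally looser than using `lam_star`, which is the same qualitative effect the caption reports.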

Theorems & Definitions (35)

Theorem 1.1: Informal
Theorem 1.2: Informal
Definition 2.1: Pearson's $\chi^2$ divergence
Lemma 2.2
Theorem 2.3: Upper bound on outcome regression loss in expectation
Theorem 2.4: Empirical upper bound on outcome regression loss in expectation
Corollary 2.5: PAC empirical upper bound on outcome regression loss
Proposition 2.7: Upper bound on T-/S-learner loss in expectation
Corollary 2.8: PAC empirical upper bound on the loss of a T-/S-learner
Proposition 2.9: Upper bound on X-learner loss in expectation
...and 25 more
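
For reference, Pearson's $\chi^2$ divergence (Definition 2.1 in the list above) is usually defined as follows. This is the standard textbook form, given here under the assumption that $P$ is absolutely continuous with respect to $Q$; the paper's own statement may use different notation or argument order.

```latex
\chi^2(P \,\|\, Q)
  = \mathbb{E}_{Q}\!\left[\left(\frac{dP}{dQ} - 1\right)^{2}\right]
  = \mathbb{E}_{Q}\!\left[\left(\frac{dP}{dQ}\right)^{2}\right] - 1,
  \qquad P \ll Q .
```

The second equality follows from $\mathbb{E}_Q[dP/dQ] = 1$. The divergence is zero exactly when $P = Q$ and grows as the density ratio $dP/dQ$ becomes more variable under $Q$.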
