Augmented balancing weights as linear regression

David Bruns-Smith; Oliver Dukes; Avi Feller; Elizabeth L. Ogburn

TL;DR

This work analyzes augmented balancing weights (AutoDML) where both outcome and weighting models are linear in a shared basis, revealing that the augmented estimator is equivalent to a single linear regression whose coefficients are an affine blend of the base outcome model and the OLS solution. It shows that special cases like double ridge correspond to undersmoothed ridge regression in kernel form, enabling semiparametric efficiency insights, while $\ell_\infty$ augmentation induces double-selection behavior when the outcome model is lasso. The authors derive finite-sample MSE expressions and propose oracle hyperparameters, then discuss practical tuning schemes and demonstrate the theory on simulations and the Lalonde dataset, highlighting the critical role of hyperparameter choice. Overall, the results unify diverse AutoDML estimators under a common linear-algebraic framework and offer concrete guidance for hyperparameter tuning and interpretation of augmented balancing weights in causal inference.
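The affine blend can be sketched in a few lines of algebra. The notation here is an assumption borrowed from the Key Result further down, not something this summary defines: $\Phi_p \in \mathbb{R}^{n \times d}$ are source features centered so that $\bar{\Phi}_p = 0$, $Y_p$ are outcomes, $\bar{\Phi}_q$ is the target feature mean (with $\bar{\Phi}_{q,j} \neq 0$), $\hat{\beta}_{\text{reg}}$ is the base outcome model, $\hat{w} = \hat{\theta}\Phi_p^\top$ are linear weights, and $\hat{\Phi}_q = \tfrac{1}{n}\hat{w}\Phi_p$. The sketch is illustrative and is not quoted from the paper.

```latex
\begin{align*}
\hat{\psi}_{\text{aug}}
  &= \bar{\Phi}_q \hat{\beta}_{\text{reg}}
     + \tfrac{1}{n}\,\hat{w}\,\bigl(Y_p - \Phi_p \hat{\beta}_{\text{reg}}\bigr) \\
  &= \bar{\Phi}_q \hat{\beta}_{\text{reg}}
     + \hat{\Phi}_q \hat{\beta}_{\text{ols}}
     - \hat{\Phi}_q \hat{\beta}_{\text{reg}}
  && \text{since } \tfrac{1}{n}\hat{w} Y_p
     = \tfrac{1}{n}\hat{\theta}\,\Phi_p^\top\Phi_p (\Phi_p^\top\Phi_p)^\dag \Phi_p^\top Y_p
     = \hat{\Phi}_q \hat{\beta}_{\text{ols}} \\
  &= \sum_{j=1}^{d} \bar{\Phi}_{q,j}
     \bigl[(1 - a_j)\,\hat{\beta}_{\text{reg},j} + a_j\,\hat{\beta}_{\text{ols},j}\bigr],
  && a_j = \hat{\Phi}_{q,j} / \bar{\Phi}_{q,j}.
\end{align*}
```

Under these assumptions, $a_j = 1$ (exact balance on feature $j$) recovers the OLS coefficient, and $a_j = 0$ (uniform weights on centered features) recovers the base outcome coefficient.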

Abstract

We provide a novel characterization of augmented balancing weights, also known as automatic debiased machine learning (AutoDML). These popular doubly robust or de-biased machine learning estimators combine outcome modeling with balancing weights, that is, weights that achieve covariate balance directly in lieu of estimating and inverting the propensity score. When the outcome and weighting models are both linear in some (possibly infinite) basis, we show that the augmented estimator is equivalent to a single linear model with coefficients that combine the coefficients from the original outcome model and coefficients from an unpenalized ordinary least squares (OLS) fit on the same data. We see that, under certain choices of regularization parameters, the augmented estimator often collapses to the OLS estimator alone; this occurs for example in a re-analysis of the Lalonde 1986 dataset. We then extend these results to specific choices of outcome and weighting models. We first show that the augmented estimator that uses (kernel) ridge regression for both outcome and weighting models is equivalent to a single, undersmoothed (kernel) ridge regression. This holds numerically in finite samples and lays the groundwork for a novel analysis of undersmoothing and asymptotic rates of convergence. When the weighting model is instead lasso-penalized regression, we give closed-form expressions for special cases and demonstrate a "double selection" property. Our framework opens the black box on this increasingly popular class of estimators, bridges the gap between existing results on the semiparametric efficiency of undersmoothed and doubly robust estimators, and provides new insights into the performance of augmented balancing weights.
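The double ridge equivalence can be checked numerically. The following is a minimal sketch on simulated data, not the authors' code. The closed form used for the $\ell_2$ balancing weights, its scaling, the centering of the features, the target mean `phi_q_bar`, and the per-component penalty `gamma` (derived here under those conventions) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
Phi_p = rng.normal(size=(n, d))
Phi_p -= Phi_p.mean(axis=0)           # center: source feature mean is zero
Y_p = Phi_p @ rng.normal(size=d) + rng.normal(size=n)
phi_q_bar = rng.normal(size=d)        # hypothetical target feature mean

G = Phi_p.T @ Phi_p                   # Gram matrix of the source features
b = Phi_p.T @ Y_p
I = np.eye(d)

def augmented_double_ridge(lam, delta):
    """Ridge outcome model (penalty lam) augmented with l2 balancing weights (penalty delta)."""
    beta_reg = np.linalg.solve(G + lam * I, b)
    # Assumed closed form for the l2 balancing weights; exact balance at delta = 0.
    w = n * (phi_q_bar @ np.linalg.solve(G + delta * I, Phi_p.T))
    return phi_q_bar @ beta_reg + np.mean(w * (Y_p - Phi_p @ beta_reg))

def single_ridge(lam, delta):
    """One generalized ridge fit with a smaller penalty gamma_j per eigen-direction of G."""
    s, V = np.linalg.eigh(G)
    gamma = delta * lam / (s + lam + delta)
    beta = V @ ((V.T @ b) / (s + gamma))
    return phi_q_bar @ beta, gamma

lam, delta = 5.0, 2.0                 # illustrative hyperparameter values
psi_aug = augmented_double_ridge(lam, delta)
psi_single, gamma = single_ridge(lam, delta)
psi_ols = phi_q_bar @ np.linalg.solve(G, b)

print("augmented equals single ridge:", np.isclose(psi_aug, psi_single))
print("single ridge is undersmoothed:", np.all(gamma < min(lam, delta)))
print("delta = 0 collapses to OLS:", np.isclose(augmented_double_ridge(lam, 0.0), psi_ols))
```

In this sketch the effective penalty `gamma` is strictly smaller than both `lam` and `delta`, which is one way to read "undersmoothed", and setting `delta = 0` (exact balance) reproduces the OLS plug-in estimate regardless of `lam`.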

Paper Structure (89 sections, 17 theorems, 152 equations, 16 figures, 3 tables)

Introduction
Related work
Balancing weights and AutoDML.
Numerical equivalences for balancing weights.
Problem setup and background
Setup and motivation
Example: Estimating counterfactual means
General class of functionals via the Riesz representer
Balancing weights: Background and general form
Linear balancing weights
Novel equivalence results for (augmented) balancing weights and outcome regression models
Weighting alone
Augmented balancing weights
Augmented $\ell_2$ Balancing Weights
General linear outcome model
...and 74 more sections

Key Result

Proposition 3.1

Let $\hat{w}^\delta \coloneqq \hat{\theta}^\delta \Phi_p^\top$, $\hat{\theta}^\delta \in \mathbb{R}^d$, be any linear balancing weights, with corresponding weighted features $\hat{\Phi}_q^\delta \coloneqq \tfrac{1}{n} \hat{w}^\delta \Phi_p$. Let $\hat{\beta}_{\text{ols}} = (\Phi_p^\top\Phi_p)^\dag \Phi_p^\top Y_p$ ... where $\widehat{\Delta}^\delta = \hat{\Phi}_q^\delta - \bar{\Phi}_p$ is the mean feature shift imp...
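The statement above is cut off, so the following is only a hedged numerical sketch of one identity that follows algebraically from the stated definitions: for any coefficients $\hat{\theta}^\delta$, the weighted outcome mean $\tfrac{1}{n}\hat{w}^\delta Y_p$ equals $\hat{\Phi}_q^\delta \hat{\beta}_{\text{ols}}$. Treating this as the proposition's conclusion is an assumption, as is the standard OLS definition $\hat{\beta}_{\text{ols}} = (\Phi_p^\top\Phi_p)^\dag \Phi_p^\top Y_p$. The data and `theta` are simulated and hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 4
Phi_p = rng.normal(size=(n, d))       # source features
Y_p = rng.normal(size=n)              # outcomes; the identity is algebraic, so any Y_p works
theta = rng.normal(size=d)            # arbitrary linear-weight coefficients (hypothetical)

w = theta @ Phi_p.T                   # linear balancing weights, shape (n,)
Phi_q_hat = w @ Phi_p / n             # weighted features, shape (d,)
beta_ols = np.linalg.pinv(Phi_p.T @ Phi_p) @ Phi_p.T @ Y_p

weighted_mean = np.mean(w * Y_p)      # weighting-alone estimate
ols_plugin = Phi_q_hat @ beta_ols     # OLS prediction at the weighted features

# Same quantity written via the mean feature shift Delta_hat = Phi_q_hat - mean(Phi_p).
Phi_p_bar = Phi_p.mean(axis=0)
Delta_hat = Phi_q_hat - Phi_p_bar
shift_form = (Phi_p_bar + Delta_hat) @ beta_ols

print(np.isclose(weighted_mean, ols_plugin), np.isclose(ols_plugin, shift_form))
```

The check does not depend on how `theta` was chosen, which is the sense in which any linear weighting estimator behaves like an OLS fit evaluated at the re-weighted features.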

Figures (16)

Figure 1: Regularization paths for "double ridge" augmented $\ell_2$ balancing weights. Panel (a) shows the coefficients $\hat{\beta}_{\text{reg}}^\lambda$ of a ridge regression of $Y_p$ on $\Phi_p$ with hyperparameter $\lambda$. The black dots on the left are the OLS coefficients, with $\lambda = 0$. The red dots at $\lambda = 5$ illustrate the coefficients at a plausible hyperparameter value, $\hat{\beta}_{\text{reg}}^{5}$. Panel (b) shows re-weighted covariates, $\hat{\Phi}_q^\delta$, for the $\ell_2$ balancing weights problem with hyperparameter $\delta$; the black dots show exact balance, which corresponds to OLS. As $\delta$ increases, the weights converge to uniform weights and $\hat{\Phi}_q^\delta$ converges to $\overline{\Phi}_p$, which we have centered at zero. Panel (c) shows the augmented coefficients, $\hat{\beta}_{\ell_2}$, as a function of the weight regularization parameter $\delta$. The black dots on the left are the OLS coefficients. As $\delta \to \infty$, the coefficients converge to $\hat{\beta}_{\text{reg}}^5$. All three regularization paths have essentially identical qualitative behavior.
Figure 2: Regularization paths for "double lasso" augmented $\ell_\infty$ balancing weights. Panel (a) shows the coefficients $\hat{\beta}_{\text{reg}}^\lambda$ of a lasso regression of $Y_p$ on $\Phi_p$ with hyperparameter $\lambda$. The black dots on the left are the OLS coefficients, with $\lambda = 0$. The red dots at $\lambda = 0.2$ illustrate the coefficients at a plausible hyperparameter value, $\hat{\beta}_{\text{reg}}^{0.2}$. Panel (b) shows re-weighted covariates, $\hat{\Phi}_q^\delta$, for the $\ell_\infty$ balancing weights problem with hyperparameter $\delta$; the black dots show exact balance, which corresponds to OLS. As $\delta$ increases, the weights converge to uniform weights and $\hat{\Phi}_q^\delta$ converges to $\overline{\Phi}_p$, which we have centered at zero. Panel (c) shows the augmented coefficients, $\hat{\beta}_{\ell_\infty}$, as a function of the weight regularization parameter $\delta$. The black dots on the left are the OLS coefficients. As $\delta \to \infty$, the coefficients converge to $\hat{\beta}_{\text{reg}}^{0.2}$. All three regularization paths show the typical lasso "soft thresholding" behavior. The regularization path for the augmented estimator also shows "double selection" behavior.
Figure 3: Augmented balancing weights estimates for the LaLonde (1986) data set with the expanded set of 171 features used in Farrell (2015); the top row shows ridge-augmented $\ell_2$ balancing, and the bottom row shows lasso-augmented $\ell_\infty$ balancing. Panels (a) and (d) show the 3-fold cross-validated $R^2$ for the ridge- and lasso-penalized regression of $Y_p$ on $\Phi_p$ among control units across the hyperparameter $\lambda$; the purple dotted lines show the CV-optimal value for each. Panels (b) and (e) show the 3-fold cross-validated imbalance for $\ell_2$ and $\ell_\infty$ balancing weights across the hyperparameter $\delta$; the green dotted lines show the CV-optimal value for each. Panels (c) and (f) show the point estimates for the augmented estimators across the weighting hyperparameter $\delta$; the black triangles correspond to the OLS point estimate; the green and red dotted lines correspond to the cross-validated balance and Riesz loss respectively; the purple line corresponds to the cross-validated ridge hyperparameter (for $\delta = \hat{\lambda}$). The variance-based hyperparameter for ridge is $\hat{\sigma}^2/n^2 = 104.8$ and for lasso is $137.5$. The corresponding point estimates are $1923.6$ and $725.8$ respectively, essentially equal to the plug-in outcome model estimates.
Figure 4: Ridge-augmented $\ell_2$ balancing weights ("double ridge") for the LaLonde (1986) data set with the original 11 covariates. Panel (a) shows the 3-fold cross-validated $R^2$ for the ridge-penalized regression of $Y_p$ on $\Phi_p$ among control units across the hyperparameter $\lambda$; the purple dotted line shows the CV-optimal value, $\hat{\lambda}$. Panel (b) shows the 3-fold cross-validated imbalance for $\ell_2$ balancing weights across the hyperparameter $\delta$; the green dotted line shows the CV-optimal value, which is $\delta = 0$ or exact balance. Panel (c) shows the point estimate for the augmented estimator across the weighting hyperparameter $\delta$; the black triangle corresponds to the OLS point estimate, the green dotted line corresponds to cross-validated balance, the red dotted line corresponds to cross-validated Riesz loss, and the purple dotted line corresponds to the ridge outcome hyperparameter.
Figure D.1: Decomposing the estimate from nonlinear augmented balancing weights for the "short" LaLonde (1986) example.
...and 11 more figures

Theorems & Definitions (44)

Remark 1: Intercept
Remark 2: Equivalence with kernel ridge regression
Proposition 3.1
Proposition 3.2
Remark 3: Sample splitting
Remark 4: Infinite dimensional setting
Remark 5: Nonlinear balancing weights
Remark 6: Non-negative weights
Remark 7: Bilinear form
Proposition 4.1
...and 34 more
