Estimation in high-dimensional linear regression: Post-Double-Autometrics as an alternative to Post-Double-Lasso

Sullivan Hué, Sébastien Laurent, Ulrich Aiounou, Emmanuel Flachaire

TL;DR

This paper addresses omitted-variable bias when estimating causal effects in high-dimensional linear regression. It replaces Post-Double-Lasso with Post-Double-Autometrics (PDA), which uses Autometrics, a significance-driven general-to-specific (GETS) algorithm, for variable selection in both stages. PDA controls the inclusion of irrelevant variables through a target size parameter. In simulations it is robust to covariate correlation, to the number of covariates relative to observations, and to tuning choices, and it yields smaller bias and RMSE than competing methods. In an empirical growth application to Barro-style data, PDA finds no evidence of convergence, in contrast to some Lasso-based results, which shows that the choice of method matters in practice for causal analysis in macroeconomics. Overall, PDA offers a principled, inference-based alternative for causal estimation in high-dimensional observational studies.

Abstract

Post-Double-Lasso is becoming the most popular method for estimating linear regression models with many covariates when the purpose is to obtain an accurate estimate of a parameter of interest, such as an average treatment effect. However, this method can suffer from substantial omitted variable bias in finite samples. We propose a new method called Post-Double-Autometrics, which is based on Autometrics, and show that this method outperforms Post-Double-Lasso. Its use in a standard application of economic growth sheds new light on the hypothesis of convergence from poor to rich economies.
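
The double-selection idea summarized above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: Autometrics itself is not reproduced, and a plain backward elimination on t-statistics stands in for it. The function names (`backward_select`, `post_double_selection`), the critical value 2.58, and the simulated data are all hypothetical choices made for the example. The sketch also assumes more observations than covariates and variables with mean zero.

```python
import numpy as np

def ols(y, X):
    """Return OLS coefficients and conventional t-statistics."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_select(y, X, t_crit=2.58):
    """Crude stand-in for a general-to-specific search (NOT Autometrics):
    drop the least significant column until every remaining |t| >= t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        _, t = ols(y, X[:, keep])
        weakest = int(np.argmin(np.abs(t)))
        if abs(t[weakest]) >= t_crit:
            break
        keep.pop(weakest)
    return set(keep)

def post_double_selection(y, d, X, t_crit=2.58):
    """Select controls for y and for d separately, then regress y on d
    plus the union of both selected sets."""
    selected_y = backward_select(y, X, t_crit)   # stage 1: outcome equation
    selected_d = backward_select(d, X, t_crit)   # stage 2: treatment equation
    controls = sorted(selected_y | selected_d)
    Z = np.column_stack([d, np.ones(len(y)), X[:, controls]])
    beta, _ = ols(y, Z)
    return beta[0], controls                     # beta[0] is the effect of d

# Simulated example (assumed design): one confounder, true effect 1.0.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
d = X[:, 0] + rng.standard_normal(n)
y = 1.0 * d + X[:, 0] + rng.standard_normal(n)

delta_hat, controls = post_double_selection(y, d, X)
print(delta_hat, controls)
```

Taking the union of the two selected sets is what guards against omitted-variable bias: a confounder that is missed in the outcome equation can still enter the final regression if it is picked up in the treatment equation.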

Paper Structure

This paper contains 11 sections, 7 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Empirical distribution of the Post-Double-Lasso (left panel) and Post-Double-Autometrics (right panel) estimates of the treatment effect. In both panels, the solid line shows the distribution of the oracle estimator and the dotted vertical line marks the true value.
  • Figure 2: Bias, RMSE, Gauge and Potency of $\hat{\delta}$ with dependent covariates $\rho \in [-0.9 ; 0.9]$. Design: $\psi^y = 2.5$, $\psi^d = 4$, $n=400$, $p=210$.
  • Figure 3: Bias of $\hat{\delta}$, with dependent covariates $\rho \in [-0.9 ; 0.9]$ and varying non-centrality measure $\psi^d \in [1 ; 8]$. Design: $\psi^y = 2.5$, $n=400$, $p=210$.
  • Figure 4: RMSE of $\hat{\delta}$, with dependent covariates $\rho \in [-0.9 ; 0.9]$ and varying non-centrality measure $\psi^d \in [1 ; 8]$. Design: $\psi^y = 2.5$, $n=400$, $p=210$.
  • Figure 5: Potency of $\hat{\delta}$, with dependent covariates $\rho \in [-0.9 ; 0.9]$ and varying non-centrality measure $\psi^d \in [1 ; 8]$. Design: $\psi^y = 2.5$, $n=400$, $p=210$.
  • ...and 7 more figures
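
The captions above refer to gauge and potency without defining them. In the general-to-specific literature, gauge usually means the share of irrelevant variables that a selection procedure retains, and potency the share of relevant variables it retains. The sketch below computes both for a single selection outcome, assuming that usual definition. The function name and the example index sets are hypothetical, and a simulation study would average these quantities over replications.

```python
def gauge_and_potency(selected, relevant, p):
    """Return (gauge, potency) for one selection outcome.

    selected: indices of covariates retained by the procedure
    relevant: indices of covariates with non-zero true coefficients
    p:        total number of candidate covariates
    """
    selected, relevant = set(selected), set(relevant)
    irrelevant = set(range(p)) - relevant
    gauge = len(selected & irrelevant) / len(irrelevant) if irrelevant else 0.0
    potency = len(selected & relevant) / len(relevant) if relevant else 0.0
    return gauge, potency

# Hypothetical outcome: 10 candidates, 3 truly relevant, 3 retained.
gauge, potency = gauge_and_potency(selected={0, 1, 5}, relevant={0, 1, 2}, p=10)
print(gauge, potency)
```

A gauge close to the chosen target size and a potency close to one indicate that the procedure retains few irrelevant variables and most of the relevant ones.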