Triple/Debiased Lasso for Statistical Inference of Conditional Average Treatment Effects

Masahiro Kato

TL;DR

The paper addresses inference for Conditional Average Treatment Effects (CATEs) when covariates are high-dimensional. It introduces the Triple/Debiased Lasso (TDL) for CATE estimation, which combines a doubly robust score, Lasso regression of that score (an estimate of the outcome difference) on the covariates, cross-fitting, and the debiased Lasso to achieve $\sqrt{n}$-consistency and valid confidence intervals. The methodology also features a weighted least squares step to improve efficiency when the variance depends on the covariates, and a nodewise Lasso construction of the approximate inverse of the sample covariance (Gram) matrix used for debiasing. The authors prove consistency and asymptotic normality of the estimator under standard high-dimensional assumptions and examine its properties through simulations, highlighting its relevance for individualized causal inference in complex datasets.
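
To make the ingredients named above concrete, the following display sketches the textbook forms of a doubly robust (AIPW) pseudo-outcome, the Lasso regression on it, and the debiased Lasso correction. The notation is an assumption for illustration and may differ from the paper's: $D_i \in \{0,1\}$ is the treatment, $Y_i$ the outcome, $X_i$ the covariates, $\hat{\mu}_d$ and $\hat{\pi}$ are cross-fitted estimates of the outcome regressions and the propensity score, and $\hat{\Theta}$ is an approximate inverse of $X^\top X / n$ (for example, from nodewise Lasso).

$$
\begin{aligned}
\hat{Q}_i &= \frac{D_i\,\bigl(Y_i - \hat{\mu}_1(X_i)\bigr)}{\hat{\pi}(X_i)} - \frac{(1 - D_i)\,\bigl(Y_i - \hat{\mu}_0(X_i)\bigr)}{1 - \hat{\pi}(X_i)} + \hat{\mu}_1(X_i) - \hat{\mu}_0(X_i), \\
\hat{\beta} &= \arg\min_{\beta}\; \frac{1}{2n}\sum_{i=1}^{n}\bigl(\hat{Q}_i - X_i^\top \beta\bigr)^2 + \lambda \|\beta\|_1, \\
\hat{b} &= \hat{\beta} + \frac{1}{n}\,\hat{\Theta}\, X^\top \bigl(\hat{Q} - X\hat{\beta}\bigr).
\end{aligned}
$$

Under this reading, the CATE at a covariate value $x$ would be estimated by $x^\top \hat{b}$; the weighted variant mentioned above would replace the least squares loss with a weighted one.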

Abstract

This study investigates estimation and statistical inference for Conditional Average Treatment Effects (CATEs), which have garnered attention as a measure of individualized causal effects. In our data-generating process, we assume linear models for the outcomes associated with binary treatments and define the CATE as the difference between the expected outcomes of these linear models. This study allows the linear models to be high-dimensional, and our interest lies in consistent estimation and statistical inference for the CATE. In high-dimensional linear regression, one typical approach is to assume sparsity. However, we do not assume sparsity of the outcome models directly. Instead, we assume sparsity only in the difference of the linear models. We first use a doubly robust estimator to approximate this difference and then regress the approximated difference on the covariates with Lasso regularization. Although this regression estimator is consistent for the CATE, we further reduce the bias using techniques from double/debiased machine learning (DML) and the debiased Lasso, leading to $\sqrt{n}$-consistency and confidence intervals. We refer to the debiased estimator as the triple/debiased Lasso (TDL), since it applies both DML and debiased Lasso techniques. We confirm the soundness of our proposed method through simulation studies.
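
The pipeline described in the abstract can be sketched in a few dozen lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the propensity score is taken as known (`e = 0.5`, a randomized design), the nuisance outcome models are fitted by ridge regression, and the helper names (`lasso_cd`, `ridge_fit`, `dr_pseudo_outcomes`, `nodewise_theta`, `debiased_lasso`), the tuning value `lam = 0.1`, and the simulated data are all hypothetical.

```python
import numpy as np

def soft(z, t):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()            # running residual y - X b
    z = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]           # drop coordinate j from the fit
            b[j] = soft(X[:, j] @ r / n, lam) / z[j]
            r -= X[:, j] * b[j]           # add the updated coordinate back
    return b

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge coefficients (stand-in nuisance learner, an assumption)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)

def dr_pseudo_outcomes(X, d, y, e=0.5, n_folds=2, seed=0):
    """Cross-fitted doubly robust (AIPW) pseudo-outcomes with known propensity e."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    q = np.empty(n)
    for k in range(n_folds):
        tr, te = folds != k, folds == k
        mu1 = X[te] @ ridge_fit(X[tr & (d == 1)], y[tr & (d == 1)])
        mu0 = X[te] @ ridge_fit(X[tr & (d == 0)], y[tr & (d == 0)])
        q[te] = (d[te] * (y[te] - mu1) / e
                 - (1 - d[te]) * (y[te] - mu0) / (1 - e)
                 + mu1 - mu0)
    return q

def nodewise_theta(X, lam):
    """Nodewise-Lasso approximate inverse of the Gram matrix X^T X / n."""
    n, p = X.shape
    theta = np.zeros((p, p))
    for j in range(p):
        others = np.delete(np.arange(p), j)
        g = lasso_cd(X[:, others], X[:, j], lam)
        resid = X[:, j] - X[:, others] @ g
        tau2 = resid @ resid / n + lam * np.abs(g).sum()
        theta[j, j] = 1.0
        theta[j, others] = -g
        theta[j] /= tau2
    return theta

def debiased_lasso(X, q, lam):
    """Lasso fit on pseudo-outcomes followed by the one-step debiasing correction."""
    n = X.shape[0]
    b = lasso_cd(X, q, lam)
    theta = nodewise_theta(X, lam)
    b_deb = b + theta @ (X.T @ (q - X @ b)) / n
    return b, b_deb, theta

# Simulated example: dense baseline coefficients, sparse treatment-effect difference.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.normal(size=(n, p))
beta0 = rng.normal(size=p)                # shared, non-sparse part of both outcome models
delta = np.zeros(p)
delta[:2] = [2.0, -1.0]                   # sparse difference, i.e. the CATE coefficients
d = rng.integers(0, 2, size=n)            # randomized binary treatment
y = X @ beta0 + d * (X @ delta) + rng.normal(size=n)

q = dr_pseudo_outcomes(X, d, y)
b, b_deb, theta = debiased_lasso(X, q, lam=0.1)
print("Lasso:   ", np.round(b[:4], 2))
print("Debiased:", np.round(b_deb[:4], 2))
```

The sketch stops at the point estimate. Confidence intervals would additionally need a variance estimate for each debiased coefficient, and the weighted variant discussed in the paper would reweight the regression step; both are omitted here.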

Paper Structure (41 sections, 11 theorems, 125 equations, 1 algorithm)

Key Result

Theorem 4.3

Assume that Assumptions [asmp:unconfounded]–[asmp:coherent] hold. Suppose that a linear model in [eq:linear] holds with Assumptions [asm:bounded_output]–[asmp:coherent], [asm:eror], and [asm:a1_vandegeer]. If $\bm{W}$ is sub-Gaussian and $\Sigma$ has a strictly positive smallest eigenvalue $\Lambda^2_{\min}$,

Theorems & Definitions (19)

  • Definition 4.1: Compatibility condition. From (6.4) in Bühlmann and van de Geer (2011)
  • Theorem 4.3
  • Lemma 4.9
  • Lemma 4.10
  • Theorem 4.11: Asymptotic normality of the WTDL estimator
  • proof
  • Lemma B.1: Oracle inequality
  • Lemma B.2
  • proof: Proof of Lemma [lem:nongaussian]
  • Lemma B.3: Basic inequality. Corresponding to Lemma 6.1 in Bühlmann and van de Geer (2011)
  • ...and 9 more
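
For orientation, the compatibility condition cited in Definition 4.1 is usually stated in that reference as follows. The notation here is an assumption for illustration ($S_0$ is the active set with cardinality $s_0$, $\hat{\Sigma} = X^\top X / n$ is the Gram matrix, and $\phi_0 > 0$ is the compatibility constant), and the paper's own definition may adapt it to its setting.

$$
\|\beta_{S_0}\|_1^2 \;\le\; \frac{s_0\,\beta^\top \hat{\Sigma}\,\beta}{\phi_0^2}
\qquad \text{for all } \beta \text{ such that } \|\beta_{S_0^c}\|_1 \le 3\,\|\beta_{S_0}\|_1 .
$$

Informally, the condition requires the Gram matrix to behave like a well-conditioned matrix on vectors whose mass is concentrated on the active set, which is what allows oracle inequalities for the Lasso to hold without a full-rank design.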