Optimal and Structure-Adaptive CATE Estimation with Kernel Ridge Regression

Seok-Jin Kim

TL;DR

The paper develops a unified two-stage kernel ridge regression (KRR) method that attains minimax rates, in both sample size and overlap, governed by the complexity of the contrast function rather than that of the nuisance class.

Abstract

We propose an optimal algorithm for estimating conditional average treatment effects (CATEs) when response functions lie in a reproducing kernel Hilbert space (RKHS). We study settings in which the contrast function is structurally simpler than the nuisance functions: (i) it lies in a lower-complexity RKHS with faster eigenvalue decay, (ii) it satisfies a source condition relative to the nuisance kernel, or (iii) it depends on a known low-dimensional covariate representation. We develop a unified two-stage kernel ridge regression (KRR) method that attains minimax rates governed by the complexity of the contrast function rather than the nuisance class, in terms of both sample size and overlap. We also show that a simple model-selection step over candidate contrast spaces and regularization levels yields an oracle inequality, enabling adaptation to unknown CATE regularity.
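
To make the two-stage idea concrete, the following is a minimal sketch, not the paper's algorithm. The abstract does not spell out the construction, so every specific here is an assumption made for illustration: Gaussian kernels, a known constant propensity, a single sample split, and doubly robust (AIPW-style) pseudo-outcomes in the second stage. The helper names (`rbf_kernel`, `krr_fit`, `two_stage_cate`) are hypothetical. The first stage fits the response functions with a flexible kernel; the second stage regresses pseudo-outcomes on the covariates with a smoother kernel intended to match a simpler contrast function.

```python
import numpy as np


def rbf_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def krr_fit(X, y, bandwidth, lam):
    """Kernel ridge regression; returns a prediction function."""
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    # Solve (K + n * lam * I) alpha = y.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda Xnew: rbf_kernel(Xnew, X, bandwidth) @ alpha


def two_stage_cate(X, T, Y, propensity, bw_nuis, lam_nuis,
                   bw_cate, lam_cate, seed=0):
    """Illustrative two-stage estimator (assumed construction, see lead-in)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(X.shape[0])
    a, b = idx[: len(idx) // 2], idx[len(idx) // 2:]

    # Stage 1: fit treated and control response functions on fold a.
    treated, control = a[T[a] == 1], a[T[a] == 0]
    mu1 = krr_fit(X[treated], Y[treated], bw_nuis, lam_nuis)
    mu0 = krr_fit(X[control], Y[control], bw_nuis, lam_nuis)

    # Stage 2: build doubly robust pseudo-outcomes on fold b (an assumption),
    # then regress them on X with the smoother "contrast" kernel.
    Xb, Tb, Yb = X[b], T[b], Y[b]
    m1, m0 = mu1(Xb), mu0(Xb)
    pseudo = (m1 - m0
              + Tb * (Yb - m1) / propensity
              - (1 - Tb) * (Yb - m0) / (1.0 - propensity))
    return krr_fit(Xb, pseudo, bw_cate, lam_cate)


# Synthetic check: wiggly baseline response, simple (linear) contrast.
rng = np.random.default_rng(1)
n = 400
X = rng.uniform(-1.0, 1.0, size=(n, 1))
T = rng.binomial(1, 0.5, size=n)
mu0_true = np.sin(5.0 * X[:, 0])
tau_true = 1.0 + 0.5 * X[:, 0]
Y = mu0_true + T * tau_true + 0.1 * rng.standard_normal(n)

tau_hat = two_stage_cate(X, T, Y, propensity=0.5,
                         bw_nuis=0.3, lam_nuis=1e-3,
                         bw_cate=1.0, lam_cate=1e-2)

grid = np.linspace(-1.0, 1.0, 50)[:, None]
mse = float(np.mean((tau_hat(grid) - (1.0 + 0.5 * grid[:, 0])) ** 2))
print("grid MSE of the estimated contrast:", mse)
```

The bandwidths and regularization levels above are fixed by hand. The model-selection step mentioned in the abstract would instead choose among candidate contrast spaces and regularization levels from data; that step is not sketched here.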

Paper Structure

This paper contains 87 sections, 14 theorems, 124 equations, 7 tables, and 2 algorithms.

Key Result

Lemma 1

The minimax squared $L^2$-error for estimating $h^\star$ is lower-bounded by $\texttt{LB-L2}(n\kappa; \mathcal{H})$, up to constant factors. Similarly, the minimax squared pointwise evaluation error at $x_0$ is lower-bounded by $\texttt{LB-PE}(n\kappa; \mathcal{H})$, up to constant factors.

Theorems & Definitions (20)

  • Definition 1
  • Lemma 1: Informal: Lower Bounds
  • proof
  • Theorem 1: General error bound
  • Corollary 1: $L^2$-Error Bounds for Model \ref{model; s1}
  • Corollary 2: $L^2$-Error Bounds for Model \ref{model; s2}
  • Corollary 3: $L^2$-Error Bounds for Model \ref{model; s3}
  • Theorem 2: Oracle Inequality
  • Corollary 4: Point Evaluation: Models \ref{model; s1} and \ref{model; s2}
  • Corollary 5: Point Evaluation: Model \ref{model; s3}
  • ...and 10 more