Optimal and Structure-Adaptive CATE Estimation with Kernel Ridge Regression

Seok-Jin Kim

TL;DR

We develop a unified two-stage kernel ridge regression (KRR) method for CATE estimation. It attains minimax rates, in terms of both sample size and overlap, that are governed by the complexity of the contrast function rather than that of the nuisance class.

Abstract

We propose an optimal algorithm for estimating conditional average treatment effects (CATEs) when response functions lie in a reproducing kernel Hilbert space (RKHS). We study settings in which the contrast function is structurally simpler than the nuisance functions: (i) it lies in a lower-complexity RKHS with faster eigenvalue decay, (ii) it satisfies a source condition relative to the nuisance kernel, or (iii) it depends on a known low-dimensional covariate representation. We develop a unified two-stage kernel ridge regression (KRR) method that attains minimax rates governed by the complexity of the contrast function rather than the nuisance class, in terms of both sample size and overlap. We also show that a simple model-selection step over candidate contrast spaces and regularization levels yields an oracle inequality, enabling adaptation to unknown CATE regularity.
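
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the sample split, the plain imputation pseudo-outcome (standing in for the paper's switch-imputation step, which is not specified here), the Gaussian kernel, and every name and tuning value (`rbf_kernel`, `krr_fit`, `two_stage_cate`, the bandwidths and ridge levels) are assumptions chosen for illustration. Stage 1 fits a lightly regularized KRR to each treatment arm; stage 2 regresses imputed treatment effects on the covariates with a smoother kernel, reflecting a contrast function that is simpler than the nuisance functions.

```python
import numpy as np

def rbf_kernel(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth ** 2))

def krr_fit(X, y, lam, bandwidth):
    """Kernel ridge regression: solve (K + n*lam*I) alpha = y, return a predictor."""
    n = X.shape[0]
    K = rbf_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda X_new: rbf_kernel(X_new, X, bandwidth) @ alpha

def two_stage_cate(X, T, Y, lam_nuis, bw_nuis, lam_tau, bw_tau, seed=0):
    """Illustrative two-stage estimator (hypothetical, not the paper's method)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(Y))
    a, b = idx[: len(Y) // 2], idx[len(Y) // 2:]
    # Stage 1 (fold a): per-arm nuisance fits with a small ridge penalty.
    Xa, Ta, Ya = X[a], T[a], Y[a]
    mu1 = krr_fit(Xa[Ta == 1], Ya[Ta == 1], lam_nuis, bw_nuis)
    mu0 = krr_fit(Xa[Ta == 0], Ya[Ta == 0], lam_nuis, bw_nuis)
    # Stage 2 (fold b): impute the unobserved arm, then regress the
    # pseudo-outcome on X with a smoother (wider-bandwidth) kernel.
    Xb, Tb, Yb = X[b], T[b], Y[b]
    pseudo = np.where(Tb == 1, Yb - mu0(Xb), mu1(Xb) - Yb)
    return krr_fit(Xb, pseudo, lam_tau, bw_tau)

# Synthetic check: oscillating baseline, constant treatment effect of 1.
rng = np.random.default_rng(1)
n = 400
X = rng.uniform(0.0, 1.0, size=(n, 1))
T = rng.integers(0, 2, size=n)
Y = np.sin(6.0 * X[:, 0]) + T * 1.0 + 0.1 * rng.standard_normal(n)

tau_hat = two_stage_cate(X, T, Y, lam_nuis=1e-4, bw_nuis=0.2, lam_tau=1e-2, bw_tau=1.0)
grid = np.linspace(0.05, 0.95, 50)[:, None]
err = np.mean(np.abs(tau_hat(grid) - 1.0))  # mean absolute error on the grid
```

In this toy example the baseline response oscillates while the contrast is constant, so the second-stage regression can use a much smoother kernel than the first-stage fits. The choice of a 50/50 split and of fixed tuning values is for brevity only; the abstract describes a model-selection step over contrast spaces and regularization levels, which this sketch does not implement.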

Paper Structure (87 sections, 14 theorems, 124 equations, 7 tables, 2 algorithms)


Introduction
Contribution
Special Case: Rates for Sobolev Classes.
Related Work
Notation
Problem Setup
Treatment Regime and CATE
Three Models of Structural Simplicity
Relaxation via Mixed Smoothness.
Standard Assumptions
Fundamental Limits of CATE Estimation
Methodology: A Unified Approach
Algorithm Structure
1. Nuisance Estimation via Undersmoothed KRR.
2. Generating Pseudo-outcomes via Switch-Imputation.
...and 72 more sections

Key Result

Lemma 1

The minimax squared $L^2$-error for estimating $h^\star$ is lower-bounded by $\texttt{LB-L2}(n\kappa; \mathcal{H})$, up to constant factors. Similarly, the minimax squared pointwise evaluation error at $x_0$ is lower-bounded by $\texttt{LB-PE}(n\kappa; \mathcal{H})$, up to constant factors.

Theorems & Definitions (20)

Definition 1
Lemma 1: Informal: Lower Bounds
proof
Theorem 1: General error bound
Corollary 1: $L^2$-Error Bounds for Model \ref{model; s1}
Corollary 2: $L^2$-Error Bounds for Model \ref{model; s2}
Corollary 3: $L^2$-Error Bounds for Model \ref{model; s3}
Theorem 2: Oracle Inequality
Corollary 4: Point Evaluation: Models \ref{model; s1} and \ref{model; s2}
Corollary 5: Point Evaluation: Model \ref{model; s3}
...and 10 more
