Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension

Rahul Singh

TL;DR

This paper introduces Kernel Ridge Riesz Representers (KRRR), a unifying framework that generalizes kernel ridge regression and kernel balancing weights to evaluate causal effects under mis-specification. By defining the counterfactual effective dimension and proving population $L_2$ rates that match known lower bounds, it provides rate-optimal generalization guarantees even when the regression is not correctly specified in the RKHS. The estimator admits a standalone closed-form solution and supports out-of-sample evaluation, enabling debiased, sample-splitting inference for heterogeneous causal functions, not just average effects. The theory extends Gaussian-approximation-based inference to nonparametric causal functionals and preserves semiparametric guarantees under mis-specification. Simulations demonstrate nominal coverage for heterogeneous effects, and an empirical analysis estimates the effect of 401(k) eligibility on assets by age. Overall, KRRR offers robust, inference-ready causal estimates in settings with high-dimensional covariates and potential model mis-specification, broadening the applicability of kernel-based balancing techniques.

Abstract

Kernel balancing weights provide confidence intervals for average treatment effects, based on the idea of balancing covariates for the treated group and untreated group in feature space, often with ridge regularization. Previous works on the classical kernel ridge balancing weights have certain limitations: (i) not articulating generalization error for the balancing weights, (ii) typically requiring correct specification of features, and (iii) justifying Gaussian approximation for only average effects. I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR) and address these limitations via a new characterization of the counterfactual effective dimension. KRRR is an exact generalization of kernel ridge regression and kernel ridge balancing weights. I prove strong properties similar to kernel ridge regression: population $L_2$ rates controlling generalization error, and a standalone closed form solution that can interpolate. The framework relaxes the stringent assumption that the underlying regression model is correctly specified by the features. It extends Gaussian approximation beyond average effects to heterogeneous effects, justifying confidence sets for causal functions. I use KRRR to quantify uncertainty for heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
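
To make the idea of a Riesz representer concrete, the sketch below checks the defining identity $\mathbb{E}\{m(W,f)\}=\mathbb{E}\{\alpha_0(W)f(W)\}$ by Monte Carlo in the textbook average-treatment-effect case, where $m(W,f)=f(1,X)-f(0,X)$ and $\alpha_0$ is the inverse propensity weight. This is not the KRRR estimator from the paper: the propensity score is treated as known, and the simulation design, the test function `f`, and the helper names (`propensity`, `alpha0`, `riesz_check`) are all hypothetical choices made for illustration.

```python
import random


def propensity(x):
    # Hypothetical known propensity score P(D=1 | X=x), bounded away from 0 and 1.
    return 0.3 + 0.4 * x


def f(d, x):
    # Arbitrary test function of treatment d and covariate x (an assumption).
    return d * (1.0 + x) + x ** 2


def m(d, x, func):
    # Average-treatment-effect functional: contrast of func at d=1 and d=0.
    return func(1, x) - func(0, x)


def alpha0(d, x):
    # Inverse propensity weight, the Riesz representer of the ATE functional.
    p = propensity(x)
    return d / p - (1 - d) / (1.0 - p)


def riesz_check(n=200_000, seed=0):
    # Monte Carlo estimates of E[m(W, f)] and E[alpha0(W) f(W)].
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(n):
        x = rng.random()                             # X ~ Uniform(0, 1)
        d = 1 if rng.random() < propensity(x) else 0  # D | X ~ Bernoulli(propensity)
        lhs += m(d, x, f)
        rhs += alpha0(d, x) * f(d, x)
    return lhs / n, rhs / n


lhs, rhs = riesz_check()
print(f"E[m(W,f)] ~ {lhs:.3f}   E[alpha0(W) f(W)] ~ {rhs:.3f}")
```

In this design $m(W,f)=1+X$, so both averages should be close to $1.5$ up to simulation error. Balancing-weight methods such as those discussed in the abstract estimate $\alpha_0$ directly instead of plugging in a known propensity score as done here.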

Paper Structure

This paper contains 33 sections, 32 theorems, 39 equations, 2 figures, 1 table.

Key Result

Lemma 1

Suppose Assumption assumption:cont holds and $\gamma_0\in \mathcal{G} \subset L_2$. Then there exists a Riesz representer $\alpha_0\in L_2$ such that $\mathbb{E}\{m(W,f)\}=\mathbb{E}\{\alpha_0(W)f(W)\}$ for all $f\in \mathcal{G}$. Moreover, there exists a unique minimal Riesz representer $\alpha_0^{

Figures (2)

  • Figure 1: Simulation design. cate(v) is the curve and values of $\theta_0$ are the three points.
  • Figure 2: Heterogeneous treatment effects by age. The point estimates and confidence sets use KRRR. The smooth curve is the comparison estimator of singh2020kernel.

Theorems & Definitions (70)

  • Example 1: Heterogeneous policy effects
  • Example 2: Heterogeneous treatment effects
  • Lemma 1: Riesz representation; Lemma S3.1 of chernozhukov2018global
  • Lemma 2: Main lemma
  • Theorem 1: Main theoretical result
  • Corollary 1: Main corollary
  • Proposition 1: A practical result
  • Definition 1: Kernel approximation to Riesz representer
  • Lemma 3: Towards a loss for KRRR
  • Definition 2: Loss for KRRR
  • ...and 60 more