Kernel Ridge Riesz Representers: Generalization, Mis-specification, and the Counterfactual Effective Dimension
Rahul Singh
TL;DR
This paper introduces Kernel Ridge Riesz Representers (KRRR), a unifying framework that generalizes kernel ridge regression and kernel balancing weights for estimating causal effects under mis-specification. By defining the counterfactual effective dimension and proving population $L_2$ rates that match known lower bounds, it provides rate-optimal generalization guarantees even when the regression is not correctly specified in the RKHS. The estimator admits a standalone closed-form solution and supports out-of-sample evaluation. This enables debiased, sample-splitting inference for heterogeneous causal functions, not just average effects. The theory extends Gaussian approximation-based inference to nonparametric causal functionals and preserves semiparametric guarantees under mis-specification. The paper demonstrates nominal coverage for heterogeneous effects and includes an empirical analysis of the effect of 401(k) eligibility on assets, by age. Overall, KRRR provides robust, inference-ready causal estimates in settings with high-dimensional covariates and potential mis-specification, broadening the applicability of kernel balancing techniques.
Abstract
Kernel balancing weights provide confidence intervals for average treatment effects by balancing covariates between the treated and untreated groups in feature space, often with ridge regularization. Previous work on classical kernel ridge balancing weights has three limitations: (i) it does not articulate generalization error for the balancing weights, (ii) it typically requires correct specification of the features, and (iii) it justifies Gaussian approximation only for average effects. I interpret kernel balancing weights as kernel ridge Riesz representers (KRRR) and address these limitations via a new characterization of the counterfactual effective dimension. KRRR is an exact generalization of kernel ridge regression and kernel ridge balancing weights. I prove strong properties similar to those of kernel ridge regression: population $L_2$ rates controlling generalization error, and a standalone closed-form solution that can interpolate. The framework relaxes the stringent assumption that the features correctly specify the underlying regression model. It extends Gaussian approximation beyond average effects to heterogeneous effects, justifying confidence sets for causal functions. I use KRRR to quantify uncertainty for heterogeneous treatment effects, by age, of 401(k) eligibility on assets.
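To make the idea of a standalone closed form concrete, below is a minimal Python/NumPy sketch for the special case of the average treatment effect. Here the functional is $m(w,\alpha)=\alpha(1,x)-\alpha(0,x)$ for $w=(d,x)$. The sketch assumes that KRRR minimizes the ridge-penalized Riesz loss $\frac{1}{n}\sum_i [\alpha(W_i)^2 - 2\{\alpha(1,X_i)-\alpha(0,X_i)\}] + \lambda\|\alpha\|_{\mathcal H}^2$ over an RKHS. This is our reading of the term, not a formula quoted from the paper. By the Woodbury identity, the minimizer can be evaluated at any point, in or out of sample, as $\hat\alpha(w)=\lambda^{-1}[\hat\mu(w)-k_w^\top(K+n\lambda I)^{-1}b]$. In that expression, $\hat\mu(w)=\frac{1}{n}\sum_j\{k(w,(1,X_j))-k(w,(0,X_j))\}$, $b_i=\hat\mu(W_i)$, $K_{ij}=k(W_i,W_j)$, and $(k_w)_i=k(W_i,w)$. Several choices are hypothetical and made only for illustration: the Gaussian kernel, its lengthscale, the value of $\lambda$, two-fold cross-fitting, and all function names (rbf_kernel, stack, fit_krr, fit_krrr_ate, debiased_ate). The heterogeneous-effect setting that the paper emphasizes is not shown.

```python
import numpy as np

def rbf_kernel(A, B, ls=1.0):
    # Gaussian kernel matrix: k(a, b) = exp(-||a - b||^2 / (2 ls^2)) for rows a of A, b of B
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * ls**2))

def stack(d, X):
    # Build rows w = (d, x) from a treatment vector and an (n, p) covariate matrix
    return np.column_stack([d, X])

def fit_krr(D, X, Y, lam, ls):
    # Kernel ridge regression of Y on W = (D, X); returns gamma_hat(d, X_new)
    W, n = stack(D, X), len(Y)
    beta = np.linalg.solve(rbf_kernel(W, W, ls) + n * lam * np.eye(n), Y)
    return lambda d, Xn: rbf_kernel(stack(d, Xn), W, ls) @ beta

def fit_krrr_ate(D, X, lam, ls):
    # Closed-form minimizer of the (assumed) ridge-penalized Riesz loss for the ATE
    n = len(D)
    W, W1, W0 = stack(D, X), stack(np.ones(n), X), stack(np.zeros(n), X)
    b = (rbf_kernel(W, W1, ls) - rbf_kernel(W, W0, ls)).mean(axis=1)  # b_i = mu_hat(W_i)
    c = np.linalg.solve(rbf_kernel(W, W, ls) + n * lam * np.eye(n), b)

    def alpha_hat(d, Xn):
        # Evaluate the representer at arbitrary points, so it can be used out of sample
        Wn = stack(d, Xn)
        mu = (rbf_kernel(Wn, W1, ls) - rbf_kernel(Wn, W0, ls)).mean(axis=1)
        return (mu - rbf_kernel(Wn, W, ls) @ c) / lam
    return alpha_hat

def debiased_ate(D, X, Y, lam=1e-2, ls=1.0, folds=2, seed=0):
    # Cross-fitted debiased ATE: gamma(1,x) - gamma(0,x) + alpha(d,x) * (y - gamma(d,x))
    n = len(Y)
    fold = np.random.default_rng(seed).permutation(n) % folds
    psi = np.empty(n)
    for k in range(folds):
        tr, te = fold != k, fold == k
        gamma = fit_krr(D[tr], X[tr], Y[tr], lam, ls)
        alpha = fit_krrr_ate(D[tr], X[tr], lam, ls)
        Dt, Xt, Yt = D[te], X[te], Y[te]
        m = len(Yt)
        psi[te] = (gamma(np.ones(m), Xt) - gamma(np.zeros(m), Xt)
                   + alpha(Dt, Xt) * (Yt - gamma(Dt, Xt)))
    theta = psi.mean()
    se = psi.std(ddof=1) / np.sqrt(n)
    # Gaussian approximation: nominal 95% confidence interval
    return theta, (theta - 1.96 * se, theta + 1.96 * se)
```

For example, with D a 0/1 float array, X an (n, p) array, and Y an outcome array, debiased_ate(D, X, Y) returns a point estimate and a nominal 95% interval. Because $\hat\alpha$ is a function rather than a vector of in-sample weights, it can be fit on one fold and evaluated on the other. The cross-fitting loop relies on this property.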
