Nonparametric Instrumental Variable Regression with Observed Covariates
Zikai Shen, Zonghao Chen, Dimitri Meunier, Ingo Steinwart, Arthur Gretton, Zhu Li
TL;DR
This work introduces NPIV-O, a nonparametric instrumental variable regression framework that incorporates observed covariates $O$ to identify heterogeneous causal effects in the model $Y = f_*(X,O) + \varepsilon$ with $\mathbb{E}[\varepsilon \mid Z,O] = 0$. It develops the KIV-O algorithm, a two-stage RKHS-based method: the first stage learns a conditional mean embedding via a vector-valued kernel, and the second stage performs kernel regression with anisotropic Gaussian kernels whose lengthscales are tuned to the smoothness of $f_*$, which may differ between the $X$ and $O$ directions. The authors establish an upper bound on the $L^2$ learning rate of KIV-O and the first $L^2$ minimax lower bound for NPIV-O. Both rates interpolate between those of NPIV and nonparametric regression (NPR), and the estimator adapts to anisotropic Besov smoothness. The upper and lower bounds do not match; the gap is attributed to the choice of kernel lengthscales and is analyzed through a measure of partial smoothing. The analysis also carries over to proximal causal inference, which shares the same conditional moment restriction. Overall, the paper advances the theory and methodology of NPIV-O, offering a principled way to use observed covariates for heterogeneity-aware causal estimation while clarifying fundamental limits of kernel-based estimators in this setting.
Abstract
We study the problem of nonparametric instrumental variable regression with observed covariates, which we refer to as NPIV-O. Compared with standard nonparametric instrumental variable regression (NPIV), the additional observed covariates facilitate causal identification and enable heterogeneous causal effect estimation. However, the presence of observed covariates introduces two challenges for the theoretical analysis. First, it induces a partial identity structure, which renders previous NPIV analyses (based on measures of ill-posedness, stability conditions, or link conditions) inapplicable. Second, it imposes anisotropic smoothness on the structural function. To address the first challenge, we introduce a novel Fourier measure of partial smoothing; for the second challenge, we extend the existing kernel 2SLS instrumental variable algorithm with observed covariates, termed KIV-O, to incorporate Gaussian kernel lengthscales adaptive to the anisotropic smoothness. We prove upper $L^2$-learning rates for KIV-O and the first $L^2$-minimax lower learning rates for NPIV-O. Both rates interpolate between known optimal rates of NPIV and nonparametric regression (NPR). Interestingly, we identify a gap between our upper and lower bounds, which arises from the choice of kernel lengthscales tuned to minimize a projected risk. Our theoretical analysis also applies to proximal causal inference, an emerging framework for causal effect estimation that shares the same conditional moment restriction as NPIV-O.
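To make the two-stage structure concrete, the following is a minimal NumPy sketch of a kernel two-stage estimator with observed covariates. It is an illustration under stated assumptions, not the authors' implementation: the function names (`gauss_kernel`, `fit_kiv_o`), the regularization parameters `lam` and `xi`, the sample split into a stage-1 set and a stage-2 set, the fixed lengthscales, and the simulated data are all hypothetical choices made here. In particular, the lengthscale selection that the paper analyzes is not reproduced; lengthscales are simply passed in, one per input dimension.

```python
import numpy as np

def gauss_kernel(A, B, lengthscales):
    """Anisotropic Gaussian kernel matrix between rows of A (n x d) and B (m x d).
    `lengthscales` holds one positive value per input dimension."""
    ls = np.asarray(lengthscales, dtype=float)
    A, B = np.asarray(A) / ls, np.asarray(B) / ls
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(sq, 0.0))

def fit_kiv_o(X1, Z1, O1, Y2, Z2, O2, ls_x, ls_z, ls_o, lam, xi):
    """Two-stage kernel IV sketch with observed covariates (assumed form).
    Stage 1 uses (X1, Z1, O1); stage 2 uses (Y2, Z2, O2)."""
    n, m = X1.shape[0], Y2.shape[0]
    ZO1, ZO2 = np.hstack([Z1, O1]), np.hstack([Z2, O2])
    ls_zo = np.concatenate([np.asarray(ls_z, float), np.asarray(ls_o, float)])

    # Stage 1: kernel ridge weights for the conditional mean embedding
    # of the X-features given (Z, O), evaluated at the stage-2 points.
    K_zo = gauss_kernel(ZO1, ZO1, ls_zo)
    B = np.linalg.solve(K_zo + n * lam * np.eye(n),
                        gauss_kernel(ZO1, ZO2, ls_zo))        # n x m

    # Stage 2: ridge regression of Y on (embedded X-feature) tensor (O-feature).
    # The Gram matrix of these tensor features is an elementwise product.
    K_xx = gauss_kernel(X1, X1, ls_x)
    G = (B.T @ K_xx @ B) * gauss_kernel(O2, O2, ls_o)         # m x m
    alpha = np.linalg.solve(G + m * xi * np.eye(m), Y2)

    def predict(X, O):
        Kx = gauss_kernel(X1, X, ls_x)                        # n x t
        Ko = gauss_kernel(O2, O, ls_o)                        # m x t
        return ((B.T @ Kx) * Ko).T @ alpha                    # length t
    return predict

# Hypothetical simulated data: U confounds X and Y but is independent of
# (Z, O), so the error term has zero conditional mean given (Z, O).
rng = np.random.default_rng(0)

def simulate(n):
    Z, O, U = rng.normal(size=(3, n, 1))
    X = Z + 0.5 * O + U
    Y = (np.sin(X) + O**2 + U + 0.1 * rng.normal(size=(n, 1))).ravel()
    return X, Z, O, Y

X1, Z1, O1, _ = simulate(200)       # stage-1 sample
_, Z2, O2, Y2 = simulate(200)       # stage-2 sample
predict = fit_kiv_o(X1, Z1, O1, Y2, Z2, O2,
                    ls_x=[1.0], ls_z=[1.0], ls_o=[1.0], lam=1e-3, xi=1e-3)

X_test = np.linspace(-2.0, 2.0, 5)[:, None]
O_test = np.zeros((5, 1))
pred = predict(X_test, O_test)      # estimates of f_*(x, 0) on a small grid
```

Passing separate lengthscales for the $X$ and $O$ directions (`ls_x`, `ls_o`) is what allows the second-stage kernel to be anisotropic; the values used above are arbitrary placeholders.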
