Nonparametric Instrumental Regression via Kernel Methods is Minimax Optimal
Dimitri Meunier, Zhu Li, Tim Christensen, Arthur Gretton
TL;DR
This work analyzes kernel NPIV for nonparametric instrumental variable estimation. It relaxes the identifiability assumptions of prior work by targeting the minimum-norm solution, and it proves convergence in the strong $L_2$-norm. It introduces a subspace-size measure via a link condition that ties the projected conditional mean embedding (CME) subspace to the kernel regression space, showing how instrument strength affects learning efficiency. By adopting general spectral regularization in Stage 1, the authors overcome the saturation of standard Tikhonov regularization and derive minimax-optimal rates under mild conditions, including misspecification. The results give a principled framework for kernel NPIV that is rate-optimal in both identified and unidentified settings, clarify the role of instrument strength, and point toward data-adaptive Stage 1 methods in practice.
Abstract
We study the kernel instrumental variable algorithm of \citet{singh2019kernel}, a nonparametric two-stage least squares (2SLS) procedure which has demonstrated strong empirical performance. We provide a convergence analysis that covers both the identified and unidentified settings: when the structural function cannot be identified, we show that the kernel NPIV estimator converges to the IV solution with minimum norm. Crucially, our convergence is with respect to the strong $L_2$-norm, rather than a pseudo-norm. Additionally, we characterize the smoothness of the target function without relying on the instrument, instead leveraging a new description of the projected subspace size (which is closely related to the link condition in the inverse learning literature). With the subspace size description and under standard kernel learning assumptions, we derive, for the first time, the minimax optimal learning rate for kernel NPIV in the strong $L_2$-norm. Our result demonstrates that the strength of the instrument is essential to achieve efficient learning. We also improve the original kernel NPIV algorithm by adopting a general spectral regularization in the stage 1 regression. The modified regularization can overcome the saturation effect of Tikhonov regularization.
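To make the two-stage structure concrete, the following is a minimal sketch of a kernel 2SLS estimator in which the stage 1 regularizer is a pluggable spectral filter. It is an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, the sample split, the simulated data, and the hyperparameter values are all hypothetical. Stage 1 applies a filter $g_\lambda$ to the eigenvalues of the normalized instrument Gram matrix, which reduces to the usual ridge solution $(K_{ZZ} + n\lambda I)^{-1}$ when $g_\lambda(\sigma) = 1/(\sigma + \lambda)$. Stage 2 is a ridge regression of the outcomes on the estimated conditional mean embeddings. Spectral cut-off is shown as one example of a non-Tikhonov filter; the paper's exact construction may differ.

```python
import numpy as np


def gaussian_kernel(a, b, lengthscale=1.0):
    """Gaussian Gram matrix between the rows of a (n, d) and b (m, d)."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def tikhonov(sigma, lam):
    """Tikhonov filter g(sigma) = 1 / (sigma + lam)."""
    return 1.0 / (sigma + lam)


def spectral_cutoff(sigma, lam):
    """Spectral cut-off filter: 1 / sigma where sigma >= lam, else 0."""
    safe = np.where(sigma >= lam, sigma, 1.0)  # avoid dividing by tiny values
    return np.where(sigma >= lam, 1.0 / safe, 0.0)


def stage1_weights(K_zz, K_zq, lam, filt=tikhonov):
    """Stage 1: A = (1/n) g_lam(K_zz / n) K_zq.

    Column j of A weights the stage 1 features phi(x_i) to estimate the
    conditional mean embedding E[phi(X) | Z = q_j].
    """
    n = K_zz.shape[0]
    evals, evecs = np.linalg.eigh(K_zz / n)
    evals = np.clip(evals, 0.0, None)  # remove tiny negative round-off
    g = filt(evals, lam)
    return (evecs * g) @ (evecs.T @ K_zq) / n


def fit_kernel_npiv(x1, z1, y2, z2, lam, xi, filt=tikhonov):
    """Return coefficients alpha with h_hat(x) = sum_i alpha_i k(x_i, x)."""
    m = z2.shape[0]
    K_xx = gaussian_kernel(x1, x1)
    A = stage1_weights(gaussian_kernel(z1, z1), gaussian_kernel(z1, z2), lam, filt)
    # Stage 2: ridge regression of y2 on the estimated embeddings, whose
    # Gram matrix is G = A^T K_xx A. Then alpha = A beta.
    G = A.T @ K_xx @ A
    beta = np.linalg.solve(G + m * xi * np.eye(m), y2)
    return A @ beta


def predict(alpha, x1, x_new):
    """Evaluate the fitted structural function at the rows of x_new."""
    return gaussian_kernel(x_new, x1) @ alpha


# Hypothetical simulated data: Z is the instrument, e an unobserved confounder.
rng = np.random.default_rng(0)


def simulate(size):
    z = rng.uniform(-2.0, 2.0, size=(size, 1))
    e = rng.normal(size=(size, 1))
    x = z + 0.5 * e                                  # endogenous regressor
    y = np.sin(x) + e + 0.1 * rng.normal(size=(size, 1))
    return x, z, y[:, 0]


n = m = 100
x1, z1, _ = simulate(n)        # stage 1 sample: (x, z) pairs
_, z2, y2 = simulate(m)        # stage 2 sample: (y, z) pairs
alpha = fit_kernel_npiv(x1, z1, y2, z2, lam=1e-2, xi=1e-3)
h_hat = predict(alpha, x1, np.linspace(-2.0, 2.0, 5)[:, None])
```

Passing `filt=spectral_cutoff` to `fit_kernel_npiv` swaps the stage 1 regularizer without touching stage 2, which is the sense in which the filter is "general" in this sketch. The coefficients returned here satisfy the closed-form stage 2 equations $(WW^\top + m\xi K_{XX})\alpha = W\tilde{y}$ with $W = K_{XX}A$, but are computed through the smaller symmetric system for numerical stability.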
