Towards a Unified Analysis of Neural Networks in Nonparametric Instrumental Variable Regression: Optimization and Generalization
Zonghao Chen, Atsushi Nitanda, Arthur Gretton, Taiji Suzuki
TL;DR
This work tackles nonparametric instrumental variable regression (NPIV) with neural-network features through a global-optimization lens. By lifting the two-stage regression into mean-field Langevin dynamics and reformulating the bilevel problem as a first-order penalty Lagrangian, the authors design F$^2$BMLD, a fully first-order algorithm that achieves global convergence guarantees despite nonconvexity and nested dependence. They establish both convergence to the Lagrangian optimum and a $\Gamma$-convergence result ensuring consistency with the original bilevel objective as regularization vanishes, along with a generalization bound that reveals a trade-off controlled by the Lagrange multiplier $\lambda$. Empirically, F$^2$BMLD attains competitive offline policy evaluation performance with more stable training and smaller batch-size requirements than prior deep feature instrumental variable (DFIV) methods. The framework gives a unified, theory-backed account of optimization and generalization for neural NPIV models and points to practical uses in downstream tasks such as offline RL.
Abstract
We establish the first global convergence result for neural networks in the two-stage least squares (2SLS) approach to nonparametric instrumental variable regression (NPIV). This is achieved by adopting a lifted perspective through mean-field Langevin dynamics (MFLD). Unlike standard MFLD, however, our 2SLS setting entails a \emph{bilevel} optimization problem in the space of probability measures. To address this challenge, we leverage the penalty gradient approach recently developed for bilevel optimization, which formulates bilevel optimization as a Lagrangian problem. This leads to a novel fully first-order algorithm, termed \texttt{F$^2$BMLD}. Apart from the convergence bound, we further provide a generalization bound, revealing an inherent trade-off in the choice of the Lagrange multiplier between optimization and statistical guarantees. Finally, we empirically validate the effectiveness of the proposed method on an offline reinforcement learning benchmark.
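The bilevel structure and its penalty reformulation can be sketched generically as follows. This is an illustration under assumed notation, not the paper's exact objective: $F$ stands for the outer (stage-2) loss, $G$ for the inner (stage-1) loss, $\mu$ and $\nu$ for the outer and inner variables (here, probability measures over network parameters), and $\lambda > 0$ for the Lagrange multiplier. Any regularization terms are omitted.

```latex
% Generic bilevel problem: the outer loss is evaluated at the inner solution.
\min_{\mu} \; F\bigl(\mu, \nu^{*}(\mu)\bigr)
\quad \text{s.t.} \quad
\nu^{*}(\mu) \in \operatorname*{arg\,min}_{\nu} \; G(\mu, \nu).

% Penalty (Lagrangian) reformulation: the inner optimality constraint is
% replaced by a penalty on the inner suboptimality gap, which is nonnegative.
\min_{\mu, \nu} \; \mathcal{L}_{\lambda}(\mu, \nu)
:= F(\mu, \nu)
 + \lambda \Bigl( G(\mu, \nu) - \min_{\nu'} G(\mu, \nu') \Bigr).

% Why first-order information suffices: when the inner minimizer is unique,
% the envelope (Danskin) theorem gives
\nabla_{\mu} \min_{\nu'} G(\mu, \nu')
 = \nabla_{\mu} G(\mu, \nu') \Big|_{\nu' = \nu^{*}(\mu)},
% so differentiating \mathcal{L}_{\lambda} requires no Hessian of G.
```

A larger $\lambda$ enforces the inner optimality constraint more tightly, which is why the choice of multiplier matters for both the optimization and the statistical guarantees.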
