Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients
Yuri Fonseca, Caio Peixoto, Yuri Saporito
TL;DR
The paper addresses nonparametric instrumental variable regression (NPIV), an ill-posed inverse problem. It links the structural function $h^{*}$ to the conditional mean $r(Z) = \mathbb{E}[Y \mid Z]$ through the compact conditional expectation operator $\mathcal{P}[h](Z) = \mathbb{E}[h(X) \mid Z]$, so that $r = \mathcal{P}[h^{*}]$. It introduces SAGD-IV, a functional stochastic gradient descent method that minimizes the populational risk $\mathcal{R}(h) = \mathbb{E}[\ell(r(Z), \mathcal{P}[h](Z))]$. The method uses a density-ratio-based representation of the gradient in $L^{2}(X)$: $\nabla \mathcal{R}(h)(x) = \mathbb{E}[\Phi(x,Z)\,\partial_{2}\ell(r(Z), \mathcal{P}[h](Z))]$, where $\Phi(x,z) = p(x,z)/(p(x)p(z))$ and $\partial_{2}\ell$ is the derivative of $\ell$ with respect to its second argument. The method relies on estimators for $\Phi$, $r$, and $\mathcal{P}$ (e.g., density-ratio estimators for $\Phi$, regressors for $r$, and kernel mean embeddings for $\mathcal{P}$). The paper provides a finite-sample bound on the excess risk that shrinks as the number of iterations grows and as these estimators become more accurate. Empirically, the SAGD-IV variants (Kernel and Deep) achieve state-of-the-art or competitive mean-squared error with superior stability on continuous outcomes, and extend naturally to binary outcomes through non-quadratic losses. Because it accommodates kernel or neural estimators and non-quadratic losses, the framework broadens the applicability of NPIV to diverse structural equations and data regimes.
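The gradient formula can be turned into a concrete update. The sketch below is a minimal illustration, not the authors' implementation. It assumes a toy setting where $X$ and $Z$ take finitely many values with a known joint pmf, so the exact $\Phi$, $r$, and $\mathcal{P}$ stand in for the estimators SAGD-IV would use in practice. Each step draws one $Z_m$ and moves along $-\Phi(\cdot, Z_m)\,\partial_{2}\ell(r(Z_m), \mathcal{P}[h](Z_m))$, which is an unbiased estimate of $-\nabla \mathcal{R}(h)$. Several choices here are hypothetical and made only for illustration: the quadratic loss, the step size $\alpha_m = \alpha_0/\sqrt{m}$, the iterate averaging, and all names (`joint`, `Phi`, `P`, `sagd`, `h_star`).

```python
import numpy as np

# Toy finite setting (illustrative assumption, not the paper's experiments):
# X and Z each take 5 values, with a known joint pmf p(x, z).
rng = np.random.default_rng(0)
n_x, n_z = 5, 5
joint = rng.random((n_x, n_z)) + 2.0 * np.eye(n_x, n_z)  # extra diagonal mass: Z informative about X
joint /= joint.sum()
p_x = joint.sum(axis=1)              # marginal pmf of X
p_z = joint.sum(axis=0)              # marginal pmf of Z
p_x_given_z = joint / p_z            # column z holds p(x | z)
Phi = joint / np.outer(p_x, p_z)     # density ratio Phi(x, z) = p(x, z) / (p(x) p(z))


def P(h):
    """Conditional expectation operator: P[h](z) = E[h(X) | Z = z]."""
    return p_x_given_z.T @ h


h_star = np.array([1.0, -2.0, 0.5, 3.0, -1.0])  # hypothetical structural function
r = P(h_star)                                     # r(z) = E[Y | Z = z] = P[h*](z)


def loss_grad(y, y_hat):
    """Derivative in the second argument of the quadratic loss 0.5 * (y - y_hat)**2."""
    return y_hat - y


def risk(h):
    """Populational risk R(h) = E_Z[0.5 * (r(Z) - P[h](Z))**2]."""
    return float(p_z @ (0.5 * (r - P(h)) ** 2))


def full_gradient(h):
    """Exact L2(X) gradient: E_Z[Phi(x, Z) * loss_grad(r(Z), P[h](Z))] for every x."""
    return Phi @ (p_z * loss_grad(r, P(h)))


def sagd(n_iter=5000, alpha0=0.5, seed=1):
    """Functional SGD: one sampled Z per step, with Phi(., Z) as the search direction."""
    local_rng = np.random.default_rng(seed)
    h = np.zeros(n_x)
    h_avg = np.zeros(n_x)
    for m in range(1, n_iter + 1):
        z = local_rng.choice(n_z, p=p_z)              # draw Z_m from p(z)
        g = Phi[:, z] * loss_grad(r[z], P(h)[z])      # unbiased estimate of full_gradient(h)
        h = h - alpha0 / np.sqrt(m) * g               # decaying step size (assumption)
        h_avg += (h - h_avg) / m                      # running average of the iterates
    return h_avg
```

Running `h_hat = sagd()` returns the averaged iterate, and comparing `risk(h_hat)` with `risk(np.zeros(n_x))` shows the decrease in risk. `full_gradient` makes the density-ratio identity explicit: $\sum_z p(z)\,\Phi(x,z)\,g(z) = \mathbb{E}[g(Z) \mid X = x]$. This is why sampling $Z$ from its marginal gives an unbiased gradient estimate.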
Abstract
Instrumental variables (IVs) provide a powerful strategy for identifying causal effects in the presence of unobservable confounders. Within the nonparametric setting (NPIV), recent methods have been based on nonlinear generalizations of Two-Stage Least Squares and on minimax formulations derived from moment conditions or duality. In a novel direction, we show how to formulate a functional stochastic gradient descent algorithm to tackle NPIV regression by directly minimizing the populational risk. We provide theoretical support in the form of bounds on the excess risk, and conduct numerical experiments showcasing our method's superior stability and competitive performance relative to current state-of-the-art alternatives. This algorithm enables flexible estimator choices, such as neural networks or kernel-based methods, as well as non-quadratic loss functions, which may be suitable for structural equations beyond the setting of continuous outcomes and additive noise. Finally, we demonstrate this flexibility by showing how our framework naturally addresses the important case of binary outcomes, which has received far less attention in recent developments in the NPIV literature.
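In the gradient $\nabla \mathcal{R}(h)(x) = \mathbb{E}[\Phi(x,Z)\,\partial_{2}\ell(r(Z), \mathcal{P}[h](Z))]$, the loss enters only through the scalar factor $\partial_{2}\ell$. Swapping in a non-quadratic loss therefore leaves the rest of the update unchanged. The sketch below illustrates this with binary cross-entropy, assuming targets and predictions are probabilities in $(0,1)$. This loss, the clipping constant `EPS`, and the helper names are illustrative assumptions, not necessarily the loss or binary-outcome model used in the paper.

```python
import numpy as np

EPS = 1e-6  # keeps predicted probabilities away from 0 and 1 (illustrative choice)


def quadratic_d2(y, y_hat):
    """Derivative in the second argument of 0.5 * (y - y_hat)**2."""
    return y_hat - y


def bce(y, y_hat):
    """Binary cross-entropy between a target probability y and a prediction y_hat."""
    y_hat = np.clip(y_hat, EPS, 1.0 - EPS)
    return -(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))


def bce_d2(y, y_hat):
    """Derivative of bce(y, y_hat) with respect to y_hat: (y_hat - y) / (y_hat (1 - y_hat))."""
    y_hat = np.clip(y_hat, EPS, 1.0 - EPS)
    return (y_hat - y) / (y_hat * (1.0 - y_hat))


def stochastic_direction(phi_col, r_z, ph_z, d2_loss):
    """Phi(., Z) * d2_loss(r(Z), P[h](Z)); only d2_loss changes when the loss changes."""
    return phi_col * d2_loss(r_z, ph_z)
```

Both `quadratic_d2` and `bce_d2` vanish when the prediction matches the target. The stochastic direction is therefore zero wherever $\mathcal{P}[h](Z) = r(Z)$, whichever loss is plugged in.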
