Nonparametric Instrumental Variable Regression through Stochastic Approximate Gradients

Yuri Fonseca, Caio Peixoto, Yuri Saporito

TL;DR

The paper addresses nonparametric instrumental variable regression (NPIV), an ill-posed problem linking the structural function $h^{*}$ to the conditional mean $r(Z)$ via a compact operator $\mathcal{P}$. It introduces SAGD-IV, a functional stochastic gradient descent method that minimizes the populational risk $\mathcal{R}(h) = \mathbb{E}[\ell(r(Z), \mathcal{P}[h](Z))]$ by computing a gradient in $L^{2}(X)$ using a density-ratio based representation: $\nabla \mathcal{R}(h)(x) = \mathbb{E}[\Phi(x,Z)\partial_{2}\ell(r(Z), \mathcal{P}[h](Z))]$. The method relies on estimators for $\Phi$, $r$, and $\mathcal{P}$ (e.g., density-ratio estimators, kernel mean embeddings, and regressors), and provides a finite-sample excess risk bound, showing that the error decays with iterations and estimator accuracy. Empirically, SAGD-IV variants (Kernel and Deep) achieve state-of-the-art or competitive mean-squared error with superior stability for continuous outcomes and extend naturally to binary outcomes using non-quadratic losses. The framework offers substantial flexibility by enabling kernel or neural estimators and non-quadratic losses, broadening NPIV applicability to diverse structural equations and data regimes.
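The gradient formula above can be illustrated with a minimal sketch, not the authors' implementation. It assumes finite supports for $X$ and $Z$, so that $\Phi$, $r$, and $\mathcal{P}$ are known exactly rather than estimated, and it uses the quadratic loss $\ell(y, y') = \tfrac{1}{2}(y - y')^2$, for which $\partial_{2}\ell(y, y') = y' - y$. The function names, the toy joint distribution, the $1/\sqrt{m}$ step size, and the iterate averaging are all assumptions made for illustration.

```python
import random


def sagd_iv_discrete(p_xz, h_star, n_iters=5000, step=0.5, seed=0):
    """Toy functional SGD for NPIV on finite supports (hypothetical helper).

    p_xz[x][z] is the joint probability of (X = x, Z = z). Phi, r and P are
    computed exactly here; in practice they would be replaced by estimators.
    """
    rng = random.Random(seed)
    nx, nz = len(p_xz), len(p_xz[0])
    p_x = [sum(p_xz[x]) for x in range(nx)]
    p_z = [sum(p_xz[x][z] for x in range(nx)) for z in range(nz)]
    # P[z][x] = p(x | z), so that P[h](z) = E[h(X) | Z = z].
    P = [[p_xz[x][z] / p_z[z] for x in range(nx)] for z in range(nz)]
    # Phi[x][z] = p(x, z) / (p(x) p(z)), the density ratio.
    Phi = [[p_xz[x][z] / (p_x[x] * p_z[z]) for z in range(nz)] for x in range(nx)]
    # r(z) = P[h*](z), the conditional mean implied by the true h*.
    r = [sum(P[z][x] * h_star[x] for x in range(nx)) for z in range(nz)]

    h = [0.0] * nx
    h_avg = [0.0] * nx
    for m in range(1, n_iters + 1):
        z = rng.choices(range(nz), weights=p_z)[0]  # draw Z from its marginal
        # partial_2 of the quadratic loss at (r(Z), P[h](Z)).
        residual = sum(P[z][x] * h[x] for x in range(nx)) - r[z]
        lr = step / m ** 0.5  # decaying step size (an assumption)
        for x in range(nx):
            h[x] -= lr * Phi[x][z] * residual      # stochastic gradient step
            h_avg[x] += (h[x] - h_avg[x]) / m      # running average of iterates
    return h_avg, P, r, p_z


def risk(h, P, r, p_z):
    """Quadratic risk 0.5 * E[(P[h](Z) - r(Z))^2] on the finite support."""
    return 0.5 * sum(
        p_z[z] * (sum(P[z][x] * h[x] for x in range(len(h))) - r[z]) ** 2
        for z in range(len(p_z))
    )


# Toy joint distribution and structural function (illustrative values only).
p_xz = [[0.20, 0.05, 0.05],
        [0.05, 0.30, 0.05],
        [0.05, 0.05, 0.20]]
h_star = [1.0, -1.0, 2.0]
h_hat, P, r, p_z = sagd_iv_discrete(p_xz, h_star)
print(risk([0.0] * 3, P, r, p_z), risk(h_hat, P, r, p_z))
```

The two printed numbers are the risk at the zero initialization and at the averaged iterate. In a realistic setting the exact `Phi`, `r`, and `P` would be swapped for the estimators mentioned above, which is where the estimator-accuracy terms of the excess risk bound come from.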

Abstract

Instrumental variables (IVs) provide a powerful strategy for identifying causal effects in the presence of unobservable confounders. Within the nonparametric setting (NPIV), recent methods have been based on nonlinear generalizations of Two-Stage Least Squares and on minimax formulations derived from moment conditions or duality. In a novel direction, we show how to formulate a functional stochastic gradient descent algorithm to tackle NPIV regression by directly minimizing the populational risk. We provide theoretical support in the form of bounds on the excess risk, and conduct numerical experiments showcasing our method's superior stability and competitive performance relative to current state-of-the-art alternatives. This algorithm enables flexible estimator choices, such as neural networks or kernel-based methods, as well as non-quadratic loss functions, which may be suitable for structural equations beyond the setting of continuous outcomes and additive noise. Finally, we demonstrate the flexibility of our framework by showing how it naturally addresses the important case of binary outcomes, which has received far less attention from recent developments in the NPIV literature.

Paper Structure

This paper contains 34 sections, 7 theorems, 78 equations, 3 figures, 2 algorithms.

Key Result

Proposition 3.3

The risk $\mathcal{R}$ is Fréchet differentiable and its gradient satisfies $\nabla \mathcal{R}(h) = \mathcal{P}^{*}[\partial_{2}\ell(r(\cdot), \mathcal{P}[h](\cdot))]$, where $\mathcal{P}^{*} : L^2(Z) \to L^2(X)$ is the adjoint of the operator $\mathcal{P}$.
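
A short derivation sketch connects the adjoint to the density-ratio form of the gradient quoted in the TL;DR. It assumes, as is standard in NPIV but not stated in this summary, that $\mathcal{P}[h](z) = \mathbb{E}[h(X) \mid Z = z]$ and that $(X, Z)$ has a joint density $p(x,z)$ with marginals $p(x)$ and $p(z)$; $g$ denotes an arbitrary element of $L^2(Z)$.

$$
\mathcal{P}^{*}[g](x) = \mathbb{E}[g(Z) \mid X = x] = \int g(z) \frac{p(x,z)}{p(x)} \, dz = \int \Phi(x,z) \, g(z) \, p(z) \, dz = \mathbb{E}[\Phi(x,Z) g(Z)], \qquad \Phi(x,z) = \frac{p(x,z)}{p(x) p(z)}.
$$

Taking $g(z) = \partial_{2}\ell(r(z), \mathcal{P}[h](z))$ gives $\nabla \mathcal{R}(h)(x) = \mathbb{E}[\Phi(x,Z)\partial_{2}\ell(r(Z), \mathcal{P}[h](Z))]$, which matches the expression in the TL;DR.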

Figures (3)

  • Figure 1: (Left) Log MSE for each model under different response functions. (Right) Plots of each method's estimator in a randomly selected realization of the data. In the left column, we have $Y$ observations in green and the true structural function in black.
  • Figure 2: (Left) Log MSE distribution in the binary response DGP. (Right) Plots of each method's estimator in a randomly selected realization of the data for the binary response DGP. In the left column, we have samples from the binary response variable $Y$ in green and the true structural function in black.
  • Figure 3: Log-MSE results for the continuous-response experiments, repeated with half the sample size.

Theorems & Definitions (17)

  • Proposition 3.3
  • Corollary 3.4
  • Remark 4.1
  • Theorem 4.3
  • Remark 4.4: $Y$ versus $r(Z)$
  • Remark 4.5: Consistency for $h^\ast$
  • Proposition A.1
  • proof
  • proof: Proof of Proposition 3.3 (gradient expression)
  • proof: Proof of Corollary 3.4 (stochastic gradient expression)
  • ...and 7 more