Optimality and Adaptivity of Deep Neural Features for Instrumental Variable Regression

Juno Kim; Dimitri Meunier; Arthur Gretton; Taiji Suzuki; Zhu Li

TL;DR

This work analyzes nonparametric instrumental variable (IV) regression and proves a minimax-rate guarantee for the Deep Feature Instrumental Variable Regression (DFIV) method of Xu et al. (2021), showing that data-adaptive neural features attain the optimal convergence rate when the structural function lies in a Besov class $B_{p,q}^s(\mathcal{X})$. DFIV is a two-stage procedure that learns feature maps $\psi_{\theta_x}$ and $\phi_{\theta_z}$ in tandem. The analysis uses a smooth DNN class that guarantees Besov-norm control, and proves upper and lower bounds for both the projected and the full (non-projected) risks under link and smoothness conditions. A key result is a separation between DFIV and fixed-feature IV methods when $p<2$, which establishes adaptivity to spatial inhomogeneity. The paper also shows that Stage 1 needs only $m=\Omega(n)$ samples to attain minimax rates, unlike kernel methods, which require $m/n\to\infty$. The proofs combine a dynamic, data-dependent covering approach with Besov-space approximation theory, providing a principled basis for using neural features in causal IV regression, for designing data-efficient and adaptive IV estimators, and for choosing practical sample-splitting strategies.

Abstract

We provide a convergence analysis of deep feature instrumental variable (DFIV) regression (Xu et al., 2021), a nonparametric approach to IV regression using data-adaptive features learned by deep neural networks in two stages. We prove that the DFIV algorithm achieves the minimax optimal learning rate when the target structural function lies in a Besov space. This is shown under standard nonparametric IV assumptions, and an additional smoothness assumption on the regularity of the conditional distribution of the covariate given the instrument, which controls the difficulty of Stage 1. We further demonstrate that DFIV, as a data-adaptive algorithm, is superior to fixed-feature (kernel or sieve) IV methods in two ways. First, when the target function possesses low spatial homogeneity (i.e., it has both smooth and spiky/discontinuous regions), DFIV still achieves the optimal rate, while fixed-feature methods are shown to be strictly suboptimal. Second, comparing with kernel-based two-stage regression estimators, DFIV is provably more data efficient in the Stage 1 samples.
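The two-stage structure described above can be illustrated with a minimal sketch. This is not the DFIV algorithm: it swaps the learned neural features for fixed polynomial features and uses closed-form ridge regression in both stages, so it corresponds to the fixed-feature baseline rather than the data-adaptive method. All names (`poly_features`, `ridge_solve`, `two_stage_fit`, `simulate`) and the simulated data model are hypothetical choices made for illustration.

```python
import numpy as np

def poly_features(v, degree):
    """Fixed features [1, v, ..., v^degree]; a stand-in for learned features."""
    return np.vander(v, degree + 1, increasing=True)

def ridge_solve(A, B, lam):
    """Closed-form minimizer of ||A W - B||^2 + lam * ||W||^2."""
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def two_stage_fit(x1, z1, y2, z2, degree=3, lam=1e-6):
    """Two-stage least squares with separate Stage 1 and Stage 2 samples."""
    # Stage 1 (m samples): regress features of X on features of Z,
    # so that E[psi(X) | Z] is approximated by phi(Z) @ V.
    psi1 = poly_features(x1, degree)
    phi1 = poly_features(z1, degree)
    V = ridge_solve(phi1, psi1, lam)
    # Stage 2 (n samples): regress Y on the predicted features.
    psi_hat2 = poly_features(z2, degree) @ V
    return ridge_solve(psi_hat2, y2, lam)

def simulate(rng, size):
    """Assumed toy model: structural function f(x) = 2x with a hidden confounder u."""
    z = rng.normal(size=size)          # instrument
    u = rng.normal(size=size)          # unobserved confounder
    x = z + u                          # covariate depends on z and u
    y = 2.0 * x + 2.0 * u + 0.1 * rng.normal(size=size)
    return x, z, y

rng = np.random.default_rng(0)
x1, z1, _ = simulate(rng, 5000)        # Stage 1 sample (outcome not needed)
x2, z2, y2 = simulate(rng, 5000)       # Stage 2 sample (covariate not needed)
w_iv = two_stage_fit(x1, z1, y2, z2, degree=1)
w_ols = ridge_solve(poly_features(x2, 1), y2, 1e-6)
print("two-stage slope:", w_iv[1], "naive regression slope:", w_ols[1])
```

In this toy model the naive regression of the outcome on the covariate is biased by the confounder, while the two-stage estimate targets the structural slope. The sketch uses equal Stage 1 and Stage 2 sample sizes purely for convenience.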

Paper Structure (48 sections, 22 theorems, 193 equations, 1 figure, 2 tables)

Introduction
Our contributions.
Comparison with Existing Works
Error analysis of NPIV estimators.
Estimation ability of DNNs.
Background and DFIV Algorithm
Instrumental Variable Regression
Besov spaces.
Two-Stage Least Squares Regression
Deep Feature Instrumental Variable Regression
Smooth DNN class.
Theoretical Analysis of DFIV
Link and Smoothness Conditions
Controlling Stage 2 smoothness.
Projected Upper and Lower Bounds
...and 33 more sections

Key Result

Theorem 3.1

Under Assumptions ass:noise, ass:noise1, ass:str, ass:link and Assumption ass:smooth with domain restriction (eqn:restricted) or regularization (eqn:soboreg) with $\lambda$ asymptotic to the rate below, or Assumption ass:alternative without regularization, by choosing $\mathcal{F}$ ...

Figures (1)

Figure 1: Causal graph of IV.

Theorems & Definitions (38)

Definition 2.1
Definition 2.2: B-spline basis
Remark 2.3
Theorem 3.1: projected upper bound for DFIV
Lemma 3.2
Proposition 3.3: projected minimax lower bound
Corollary 3.4: projected optimality of DFIV
Theorem 3.5: full upper bound for DFIV
Remark 3.6
Proposition 3.7: full minimax lower bound
...and 28 more
