Regularized DeepIV with Model Selection

Zihao Li, Hui Lan, Vasilis Syrgkanis, Mengdi Wang, Masatoshi Uehara

TL;DR

The paper introduces Regularized DeepIV (RDIV), a minimax-oracle-free, two-stage method for nonparametric IV regression that can converge to the least-norm IV solution even without uniqueness. RDIV first learns the conditional density via MLE and then solves a Tikhonov-regularized regression using the learned operator, enabling model selection through out-of-sample validation. It provides finite-sample guarantees under a $\beta$-source condition and extends to misspecified settings, showing robust performance and competitive rates comparable to minimax-based approaches without requiring a minimax oracle. An iterative extension further leverages well-posedness to achieve state-of-the-art rates, while numerical experiments on proximal causal inference-style DGPs demonstrate practical gains from regularization and model-selection procedures.

Abstract

In this paper, we study nonparametric estimation of instrumental variable (IV) regressions. While recent advancements in machine learning have introduced flexible methods for IV estimation, they often encounter one or more of the following limitations: (1) restricting the IV regression to be uniquely identified; (2) requiring a minimax computation oracle, which is highly unstable in practice; (3) lacking a model selection procedure. We present the first method and analysis that can avoid all three limitations, while still enabling general function approximation. Specifically, we propose a minimax-oracle-free method called Regularized DeepIV (RDIV) regression that can converge to the least-norm IV solution. Our method consists of two stages: first, we learn the conditional distribution of covariates; then, utilizing the learned distribution, we learn the estimator by minimizing a Tikhonov-regularized loss function. We further show that our method allows model selection procedures that can achieve the oracle rates in the misspecified regime. When extended to an iterative estimator, our method matches the current state-of-the-art convergence rate. Our method is a Tikhonov-regularized variant of the popular DeepIV method with a non-parametric MLE first-stage estimator, and our results provide the first rigorous guarantees for this empirically used method, showcasing the importance of regularization, which was absent from the original work.
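
The two stages described in the abstract can be illustrated with a minimal sketch on a toy problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the treatment X and instrument Z are discrete, so the first-stage MLE of the conditional distribution reduces to empirical conditional frequencies; the hypothesis class is the set of all functions on the support of X; and the Tikhonov penalty is the empirical $L_2$ norm of $h$. The function names (`simulate`, `fit_conditional_pmf`, `rdiv_tikhonov`) are hypothetical.

```python
import numpy as np


def simulate(n, seed=0):
    """Toy confounded data (hypothetical DGP, for illustration only)."""
    rng = np.random.default_rng(seed)
    n_x, n_z = 4, 6
    u = rng.normal(size=n)              # unobserved confounder
    z = rng.integers(0, n_z, size=n)    # instrument, independent of u
    # Treatment depends on both the instrument and the confounder.
    x = np.clip(np.round(z / 2 + u).astype(int), 0, n_x - 1)
    h0 = np.array([0.0, 1.0, 0.5, 2.0])  # structural function on {0, 1, 2, 3}
    y = h0[x] + u + 0.1 * rng.normal(size=n)
    return x, z, y, h0, n_x, n_z


def fit_conditional_pmf(x, z, n_x, n_z):
    """Stage 1: MLE of p(x | z) in a saturated discrete model.

    For a saturated multinomial model the MLE is the table of
    empirical conditional frequencies.
    """
    counts = np.zeros((n_z, n_x))
    np.add.at(counts, (z, x), 1.0)
    totals = counts.sum(axis=1, keepdims=True)
    # Rows for instrument values never observed fall back to uniform.
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / n_x)


def rdiv_tikhonov(x, z, y, p_hat, alpha):
    """Stage 2: minimize mean((y - (T_hat h)(z))^2) + alpha * mean(h(x)^2).

    Here (T_hat h)(z) = sum_x h(x) * p_hat(x | z), and h is a vector with
    one entry per value of X, so the minimizer solves a linear system.
    """
    n = len(y)
    n_x = p_hat.shape[1]
    a = p_hat[z]                            # (n, n_x): row i is p_hat(. | z_i)
    d = np.bincount(x, minlength=n_x) / n   # empirical pmf of X (penalty weights)
    lhs = a.T @ a / n + alpha * np.diag(d)
    rhs = a.T @ y / n
    # lstsq returns the exact solution when lhs is invertible and a
    # minimum-norm solution otherwise.
    return np.linalg.lstsq(lhs, rhs, rcond=None)[0]


x, z, y, h0, n_x, n_z = simulate(2000)
p_hat = fit_conditional_pmf(x, z, n_x, n_z)
h_hat = rdiv_tikhonov(x, z, y, p_hat, alpha=0.05)
print("estimate:", np.round(h_hat, 2), "truth:", h0)
```

Neither stage involves an inner maximization: the first is a likelihood fit and the second is a penalized least-squares problem, which is the sense in which the approach needs no minimax oracle. In the general setting the abstract describes, both stages would use flexible function classes in place of the lookup tables used here.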

Paper Structure

This paper contains 40 sections, 24 theorems, 122 equations, 1 figure, 6 tables, 3 algorithms.

Key Result

Theorem 5.4

Suppose the source-condition, realizability, and lower-bound assumptions hold. Let $\|Y\|_\infty\leq C_Y$, $\|h\|_\infty \leq C_\mathcal{H}$ for all $h\in\mathcal{H}$, and $\|g\|_\infty \leq C_\mathcal{G}$ for all $g\in\mathcal{G}$. There exist absolute constants $c_1, c_2$ such that with probability ... In particular, by setting $\alpha = \delta_n^{\frac{2}{2+\min\{\beta, 2\}}}$ we have ... Here $\delta_n$ ...
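
The regularization level stated in the theorem is straightforward to evaluate numerically. The helper below is a small sketch; the function name `rdiv_alpha` is hypothetical, and the caller is assumed to supply the statistical rate $\delta_n$ and the source-condition exponent $\beta$ directly.

```python
def rdiv_alpha(delta_n, beta):
    """Return alpha = delta_n ** (2 / (2 + min(beta, 2))).

    delta_n: positive statistical rate (supplied by the caller).
    beta: positive source-condition exponent.
    """
    if delta_n <= 0 or beta <= 0:
        raise ValueError("delta_n and beta must be positive")
    return delta_n ** (2.0 / (2.0 + min(beta, 2.0)))


# For beta >= 2 the exponent is 1/2, so alpha is the square root of delta_n.
print(rdiv_alpha(0.01, 2.0))
print(rdiv_alpha(0.01, 5.0))
```

Because of the $\min\{\beta, 2\}$ term, any $\beta \geq 2$ yields the same $\alpha$.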

Figures (1)

  • Figure 1: A typical causal diagram for negative controls. The dashed edges may be absent, and the dashed circle around $U$ indicates that $U$ is unobserved.

Theorems & Definitions (42)

  • Remark 4.1: Comparison with Deep IV
  • Remark 4.2: Computation for $\hat{\mathcal{T}}$
  • Theorem 5.4: $L_2$ convergence rate for RDIV with MLE
  • Lemma 5.5: Regularization Bias
  • Lemma 5.6: Empirical Deviation & First-stage Bias
  • Lemma 5.7: MLE error
  • Remark 5.8: Removing the Boundedness Assumption
  • Theorem 6.1: $L_2$ convergence rate for RDIV with MLE under misspecification
  • Theorem 7.1: Model Selection Rates
  • Theorem 8.1: $L_2$ convergence rate for iterative MLE estimator
  • ...and 32 more