Stochastic Optimization Algorithms for Instrumental Variable Regression with Streaming Data

Xuxing Chen; Abhishek Roy; Yifan Hu; Krishnakumar Balasubramanian

TL;DR

This work addresses instrumental variable regression (IVaR) with streaming data by recasting it as a conditional stochastic optimization problem and developing fully online algorithms that avoid matrix inversions and mini-batches. It introduces two algorithms based on oracle access. TOSG-IVaR uses a two-sample gradient estimator and achieves last-iterate convergence at rate $\mathcal{O}(\log T / T)$ for linear models. OTSG-IVaR handles the one-sample streaming setting with rate $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$ under mild assumptions. With the two-sample oracle, TOSG-IVaR avoids explicit modeling of the $Z$–$X$ relationship. Neither method relies on nested sampling or minimax dual formulations, which yields memory-efficient online IV regression. Empirical results corroborate the theory: performance is robust across dimensions and improves on standard online 2SLS baselines, with clear advantages in per-iteration memory and stability. These contributions enable scalable IV regression in streaming settings and open avenues for online inference and for extensions to nonlinear or nonparametric IVaR.
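The following is a minimal sketch of the two-sample estimator for the linear case. It assumes the least-squares objective $F(\theta)=\tfrac12\mathbb{E}_Z[(\mathbb{E}[Y\mid Z]-\theta^\top\mathbb{E}[X\mid Z])^2]$. For a single $Z$, the oracle returns $(X, Y)$ together with an independent copy $X'$ of $X$. Because of that independence, $(\theta^\top X - Y)X'$ has conditional mean $(\theta^\top\mathbb{E}[X\mid Z]-\mathbb{E}[Y\mid Z])\,\mathbb{E}[X\mid Z]$. This is an unbiased estimate of $\nabla F(\theta)$ that needs no model of $\mathbb{E}[X\mid Z]$. Several pieces are hypothetical and not taken from the paper's TOSG-IVaR implementation: the simulated data-generating process, the step size $c/(t+t_0)$, and the function names.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear IV model: d_z = 3 instruments, d_x = 2 endogenous covariates.
gamma = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # E[X | Z] = gamma^T Z
theta_star = np.array([1.0, -2.0])


def draw_x(z):
    """Draw X given Z; the hidden confounder u is returned so it can also enter Y."""
    u = rng.normal()
    x = gamma.T @ z + 0.5 * u + 0.5 * rng.normal(size=2)
    return x, u


def two_sample_oracle():
    """Return (X, Y, X') where X' is an independent copy of X given the same Z."""
    z = rng.normal(size=3)
    x, u = draw_x(z)
    y = theta_star @ x + u + 0.5 * rng.normal()  # confounded: Y's noise shares u with X
    x_prime, _ = draw_x(z)  # fresh confounder, so X' is independent of (X, Y) given Z
    return x, y, x_prime


def tosg_ivar_sketch(oracle, theta0, num_iters, c=2.0, t0=20):
    """Two-sample SGD sketch: theta <- theta - alpha_t * (theta^T X - Y) * X'."""
    theta = np.array(theta0, dtype=float)
    for t in range(num_iters):
        x, y, x_prime = oracle()
        alpha = c / (t + t0)  # assumed O(1/t) step size
        theta -= alpha * (theta @ x - y) * x_prime
    return theta


theta_hat = tosg_ivar_sketch(two_sample_oracle, np.zeros(2), num_iters=20000)
print(theta_hat, np.linalg.norm(theta_hat - theta_star))
```

Replacing `x_prime` with `x` in the update turns this into plain online least squares. That variant stays biased in this simulation, because the confounder `u` enters both $X$ and $Y$.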

Abstract

We develop and analyze algorithms for instrumental variable regression by viewing the problem as a conditional stochastic optimization problem. In the context of least-squares instrumental variable regression, our algorithms require neither matrix inversions nor mini-batches, and they provide a fully online approach for performing instrumental variable regression with streaming data. When the true model is linear, we derive rates of convergence in expectation of order $\mathcal{O}(\log T/T)$ and $\mathcal{O}(1/T^{1-\iota})$ for any $\iota>0$, under the availability of two-sample and one-sample oracles, respectively, where $T$ is the number of iterations. Importantly, under the availability of the two-sample oracle, our procedure avoids explicitly modeling and estimating the relationship between the confounder and the instrumental variables. This demonstrates the benefit of the proposed approach over recent works that reformulate the problem as a minimax optimization problem. Numerical experiments are provided to corroborate the theoretical results.
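With only a one-sample oracle, no independent copy $X'$ is available. A natural alternative is a two-stage online update, sketched below under the assumption of a linear first stage $\mathbb{E}[X\mid Z]=\gamma^\top Z$. Each step takes one SGD step for $\gamma$ on $(Z, X)$. It also takes one step for $\theta$ that uses the plug-in estimate $\gamma_t^\top Z$ in place of $X'$. Each step costs $O(d_z d_x)$ and involves no matrix inversion. This is a hypothetical illustration, not the paper's OTSG-IVaR, and the paper's exact updates may differ. The data-generating process, the shared step size $c/(t+t_0)$ and the function names are assumptions. The stated $\mathcal{O}(1/T^{1-\iota})$ rate is not claimed for this particular schedule.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear IV model with d_z = 3 instruments and d_x = 2 covariates.
gamma_star = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])  # E[X | Z] = gamma_star^T Z
theta_star = np.array([1.0, -2.0])


def one_sample_oracle():
    """Return a single (Z, X, Y) draw; u is a hidden confounder of X and Y."""
    z = rng.normal(size=3)
    u = rng.normal()
    x = gamma_star.T @ z + 0.5 * u + 0.5 * rng.normal(size=2)
    y = theta_star @ x + u + 0.5 * rng.normal()
    return z, x, y


def otsg_ivar_sketch(oracle, d_z, d_x, num_iters, c=2.0, t0=20):
    """Two-stage online sketch: SGD for gamma (Z -> X) and for theta given gamma."""
    gamma = np.zeros((d_z, d_x))
    theta = np.zeros(d_x)
    for t in range(num_iters):
        z, x, y = oracle()
        step = c / (t + t0)  # assumed O(1/t) step size for both stages
        w = gamma.T @ z  # current plug-in estimate of E[X | Z]
        # Stage 2: theta step with the gamma that has not yet seen this sample.
        theta -= step * (theta @ w - y) * w
        # Stage 1: least-squares SGD step for the regression of X on Z.
        gamma -= step * np.outer(z, w - x)
    return gamma, theta


gamma_hat, theta_hat = otsg_ivar_sketch(one_sample_oracle, d_z=3, d_x=2, num_iters=50000)
print(np.linalg.norm(theta_hat - theta_star))
```

Using $\gamma_t$ before it is updated keeps the $\theta$ step conditionally unbiased given the past, for that fixed $\gamma_t$. The update never touches the current $X$, so the confounder shared by $X$ and $Y$ does not bias it.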

Paper Structure (19 sections, 10 theorems, 81 equations, 4 figures, 1 table, 2 algorithms)

Introduction
Literature Review
Two-sample One-stage Stochastic Gradient Method for IVaR
One-sample Two-stage Stochastic Gradient Method for IVaR
Numerical Experiments
Conclusion
Online updates of della2023online
Per-iteration Complexities
Experimental Details
Compute Resources
Experimental Details for Figure 1
Proofs for the Two-sample One-stage Stochastic Gradient Method
Proof of the Two-sample Convergence Theorem (Linear Case)
Proof of the Recursive Inequality Lemma
Proof of the Bounded-Variance Lemma (Linear Case)
...and 4 more sections

Key Result

Lemma 1

Suppose there exist $\theta_*\in \mathbb{R}^{d_x}$, $\gamma_*\in\mathbb{R}^{d_z\times d_x}$, a non-linear map $\phi: \mathbb{R}^{d_x}\rightarrow \mathbb{R}^{d_x}$, and a positive semi-definite matrix $\Sigma\in \mathbb{R}^{d_z\times d_z}$ such that …, where $\epsilon_1, \epsilon_2$ are independent of $Z$, and …. Then the assumptions labeled aspt:scvx and aspt:var_general hold with $\vartheta_1=\vartheta_2=2$ and ….

Figures (4)

Figure 1: The CSO-based update for $\theta$ can initially diverge before eventually converging. As a result it performs worse in practice than the one-sample algorithm OTSG-IVaR. See the appendix for the experimental setup.
Figure 2: $\mathbb{E}[\|\theta_t - \theta_*\|^2]$ of the two-sample algorithm TOSG-IVaR under the different settings detailed in the Numerical Experiments section.
Figure 3: Comparison of $\mathbb{E}[\|\theta_t - \theta_*\|^2]$ ($\log$-$\log$ scale) for OTSG-IVaR, the CSO-based update for $\theta$, and della2023online.
Figure 4: Comparison of test MSE ($\log$-$\log$ scale) for OTSG-IVaR, the CSO-based update for $\theta$, and della2023online.

Theorems & Definitions (18)

Lemma 1
Theorem 1
Proposition 1
Theorem 2
Remark 1
Proof
Lemma 2
Proof
Proof
Proof of the Main Convergence Theorem for $\theta$
...and 8 more
