Differentially Private Two-Stage Gradient Descent for Instrumental Variable Regression
Haodong Liang, Yanhao Jin, Krishnakumar Balasubramanian, Lifeng Lai
TL;DR
The paper tackles private linear instrumental variable regression (IVaR) by proposing DP-2S-GD, a two-stage gradient-descent method with per-sample gradient clipping and Gaussian noise that provides $\rho$-zCDP guarantees. It establishes non-asymptotic convergence bounds that decompose the error into optimization, privacy, and sampling components, and it prescribes how the iteration count $T$ should scale with the sample size $n$ and the privacy budgets $(\rho_1,\rho_2)$. The theoretical results are complemented by synthetic and real-data experiments (Angrist and Card datasets) showing accurate private estimates that align with the classical 2SLS benchmark as privacy is relaxed. The work advances privacy-preserving causal inference by providing end-to-end DP guarantees for IVaR with explicit trade-offs and practical guidance for implementation in privacy-sensitive settings.
Abstract
We study instrumental variable regression (IVaR) under differential privacy constraints. Classical IVaR methods, such as two-stage least squares regression, rely on solving moment equations that directly use sensitive covariates and instruments. This creates significant risks of privacy leakage and makes it challenging to design algorithms that are both statistically efficient and differentially private. We propose a noisy two-stage gradient descent algorithm that ensures $\rho$-zero-concentrated differential privacy by injecting carefully calibrated noise into the gradient updates. Our analysis establishes finite-sample convergence rates for the proposed method, showing that the algorithm achieves consistency while preserving privacy. In particular, we derive precise bounds quantifying the trade-off among optimization, privacy, and sampling error. To the best of our knowledge, this is the first work to provide both privacy guarantees and provable convergence rates for instrumental variable regression in linear models. We further validate our theoretical findings with experiments on both synthetic and real datasets, demonstrating that our method offers practical accuracy-privacy trade-offs.
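
To make the mechanism concrete, the following is a minimal NumPy sketch of a noisy two-stage gradient descent with per-sample clipping and Gaussian noise. It is an illustration under stated assumptions, not the authors' exact DP-2S-GD procedure. The function names (`clip_rows`, `zcdp_sigma`, `dp_two_stage_gd`), the step sizes `lr1` and `lr2`, the clipping thresholds `c1` and `c2`, the simultaneous update order, and the replace-one sensitivity of $2c/n$ are all assumptions introduced here. The noise scale relies on two standard zCDP facts: Gaussian noise with standard deviation $\sigma$ on a query of $\ell_2$-sensitivity $\Delta$ gives $\Delta^2/(2\sigma^2)$-zCDP, and zCDP parameters add under composition.

```python
import numpy as np

def clip_rows(G, c):
    """Rescale each per-sample gradient (flattened) to have L2 norm at most c."""
    flat = G.reshape(G.shape[0], -1)
    norms = np.linalg.norm(flat, axis=1)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return G * scale.reshape((-1,) + (1,) * (G.ndim - 1))

def zcdp_sigma(clip, n, T, rho):
    """Noise std so that T noisy clipped-mean releases jointly satisfy rho-zCDP.

    Assumes replace-one adjacency, so the clipped mean has sensitivity 2*clip/n.
    """
    sensitivity = 2.0 * clip / n
    return sensitivity * np.sqrt(T / (2.0 * rho))

def dp_two_stage_gd(Z, X, Y, T, lr1, lr2, c1, c2, rho1, rho2, rng):
    """Hypothetical sketch: Z (n, q) instruments, X (n, p) covariates, Y (n,) outcome."""
    n, q = Z.shape
    p = X.shape[1]
    Theta = np.zeros((q, p))   # first-stage coefficients: X ~ Z @ Theta
    beta = np.zeros(p)         # second-stage coefficients: Y ~ (Z @ Theta) @ beta
    s1 = zcdp_sigma(c1, n, T, rho1)
    s2 = zcdp_sigma(c2, n, T, rho2)
    for _ in range(T):
        # Stage 1: per-sample gradients of 0.5 * ||z_i^T Theta - x_i||^2, shape (n, q, p).
        R1 = Z @ Theta - X
        G1 = Z[:, :, None] * R1[:, None, :]
        g1 = clip_rows(G1, c1).mean(axis=0) + s1 * rng.standard_normal((q, p))
        # Stage 2: per-sample gradients of 0.5 * (x_hat_i^T beta - y_i)^2, shape (n, p).
        X_hat = Z @ Theta
        r2 = X_hat @ beta - Y
        G2 = X_hat * r2[:, None]
        g2 = clip_rows(G2, c2).mean(axis=0) + s2 * rng.standard_normal(p)
        # Both updates use the iterates from the start of this iteration.
        Theta = Theta - lr1 * g1
        beta = beta - lr2 * g2
    return Theta, beta
```

Under these assumptions the first-stage releases consume budget $\rho_1$ and the second-stage releases consume $\rho_2$, so the whole run would satisfy $(\rho_1+\rho_2)$-zCDP by composition. The noise standard deviation grows like $\sqrt{T}$ for a fixed budget, which is one way to see why the number of iterations has to be balanced against the sample size and the privacy budgets.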
