Geometry-Aware Instrumental Variable Regression

Heiner Kremer; Bernhard Schölkopf

TL;DR

The paper addresses instrumental variable (IV) regression under endogeneity by building optimal transport (OT) geometry into conditional moment restrictions (CMR) through the Sinkhorn Method of Moments (SMM). It develops a dual formulation and a leading-order expansion that support optimization by stochastic gradient descent, proves consistency under standard CMR identifiability assumptions, and provides a kernel-based variant (Kernel-SMM) and a neural-network variant (Neural-SMM) for flexible instrument modeling. Empirically, SMM matches state-of-the-art estimators in standard settings and is more robust to corrupted or adversarial data; the neural variant is explored as a potential route to scalability. The result is a plug-and-play IV estimator that uses the geometry of the data manifold to improve robustness without sacrificing performance in benign settings.
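To make the "Sinkhorn" ingredient concrete, the following is a minimal sketch of the generic Sinkhorn-Knopp iteration for entropy-regularized optimal transport between two small point clouds. It is not the paper's SMM estimator: the function name `sinkhorn_plan`, the squared Euclidean cost, the uniform marginals, and the values of `eps` and `n_iter` are all assumptions chosen for illustration.

```python
import numpy as np


def sinkhorn_plan(x, y, eps=0.5, n_iter=500):
    """Entropy-regularized OT plan between uniform measures on x and y.

    x, y: arrays of shape (n, d) and (m, d).
    eps: regularization strength (illustrative default, an assumption).
    """
    a = np.full(len(x), 1.0 / len(x))  # uniform source weights
    b = np.full(len(y), 1.0 / len(y))  # uniform target weights
    # Squared Euclidean cost between every pair of points.
    cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    kernel = np.exp(-cost / eps)       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)           # rescale to match row marginals
        v = b / (kernel.T @ u)         # rescale to match column marginals
    return u[:, None] * kernel * v[None, :]
```

The returned matrix is a coupling: its entries say how much mass moves from each source point to each target point, which is the sense in which OT-based estimators can move data points rather than only reweight them.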

Abstract

Instrumental variable (IV) regression can be approached through its formulation in terms of conditional moment restrictions (CMR). Building on variants of the generalized method of moments, most CMR estimators are implicitly based on approximating the population data distribution via reweightings of the empirical sample. While reweightings can provide sufficient flexibility for large sample sizes in the independent identically distributed (IID) setting, they might fail to capture the relevant information in the presence of corrupted data or data prone to adversarial attacks. To address these shortcomings, we propose the Sinkhorn Method of Moments, an optimal transport-based IV estimator that takes into account the geometry of the data manifold through data-derivative information. We provide a simple plug-and-play implementation of our method that performs on par with related estimators in standard settings but improves robustness against data corruption and adversarial attacks.
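As a rough illustration of why IV regression is posed through moment restrictions, the sketch below simulates a linear model with an unobserved confounder and compares ordinary least squares with a simple moment-based IV estimate. Everything here is an assumption made for illustration (the data-generating process, the true slope of 2, the helper names `simulate`, `ols_slope`, and `iv_slope`), and it uses a single unconditional moment with the instrument itself rather than the paper's SMM estimator.

```python
import numpy as np


def simulate(n=5000, seed=0):
    """Toy endogenous data: u confounds treatment t and outcome y."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # instrument
    u = rng.normal(size=n)                       # unobserved confounder
    t = z + u + 0.1 * rng.normal(size=n)         # treatment
    y = 2.0 * t + u + 0.1 * rng.normal(size=n)   # outcome, true slope 2
    return z, t, y


def ols_slope(t, y):
    """Least-squares slope without intercept; biased when t correlates with u."""
    return float(t @ y / (t @ t))


def iv_slope(z, t, y):
    """Solve the sample moment condition mean(z * (y - theta * t)) = 0."""
    return float(z @ y / (z @ t))


z, t, y = simulate()
print("OLS slope:", ols_slope(t, y))
print("IV slope: ", iv_slope(z, t, y))
```

In this toy setup the least-squares slope is pulled away from 2 because the treatment is correlated with the confounder, while the moment condition built from the instrument is not affected by that correlation.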

Paper Structure (30 sections, 14 theorems, 70 equations, 6 figures, 2 tables, 1 algorithm)

Introduction
Our contributions
Empirical Likelihood Estimation for CMR
Sinkhorn Method of Moments
Optimal Transport
Consistency
Kernel-SMM
Reproducing kernel Hilbert space
Neural-SMM
Experimental Results
IV Regression with Corrupted Data
Adversarially Robust IV Regression
Related Work
Conclusion
Experimental Details
...and 15 more sections

Key Result

Theorem 3.1

Consider the Sinkhorn profile (eq:3:primal) with reference measure $\mu \otimes \nu \in \mathcal{P}(\Xi \times \Xi)$. Then (eq:3:primal) has the strongly dual form $R_\epsilon(f) = \sup_{h \in \mathcal{H}} D(f, h)$, where

Figures (6)

Figure 1: Paradigms to approximate $P_0$ from data (red dots) in the GEL framework. $\varphi$-divergence-based estimators (left) approximate $P_0$ by reweighting the sample (weight $\hat{=}$ size), e.g., ai2003efficient, bennett2020variational. MMD-based estimators (middle) allow for sampling additional data points (blue dots), pmlr-v202-kremer23a. In contrast, optimal transport-based estimators (right) allow the data points themselves to be moved (present work).
Figure 2: Sinkhorn profile. For every $f \in \mathcal{F}$, the Sinkhorn profile $R(f)$ (eq:3:primal) is the minimal distance between the empirical distribution $\hat{P}_{n}$ and the set of distributions satisfying the CMR (eq:3:cmr).
Figure 3: Robustness against corrupted data. We generate 1000 points from the process (eq:3:exp1) and, in a proportion of the data, replace the treatment variable $T$ with a random value sampled uniformly over the domain. Lines and error bars correspond to the mean and standard error computed over $20$ training datasets.
Figure 4: Adversarial robustness of IV estimators. We use a training set of size $n=1000$ and evaluate the learned models over FGSM attacks with increasing strength $\epsilon$. Lines and error bars show the mean and standard error over $20$ random training datasets. The table contains the MSE in the perturbation-free case.
Figure 5: Kernel-SMM dependency on hyperparameters. We evaluate the SMM estimator on the first experiment without random covariates for different hyperparameter configurations. Values correspond to the mean of the prediction error $E[ \| f(T;\hat{\theta}) - f(T;\theta_0) \|_2^2]$ averaged over models trained on $20$ random training sets.
...and 1 more figure

Theorems & Definitions (26)

Theorem 3.1: Duality
Theorem 3.2
Definition 3.3: SMM
Theorem 3.11: Consistency
Proposition 3.12
Theorem 3.13: Kernel-SMM
proof
proof
Lemma C.1: Corollary 9.31, kosorok2008introduction
Lemma C.2: Lemma 18, bennett2020variational
...and 16 more
