Outcome-Aware Spectral Feature Learning for Instrumental Variable Regression

Dimitri Meunier, Jakub Wornbard, Vladimir R. Kostic, Antoine Moulin, Alek Fröhlich, Karim Lounici, Massimiliano Pontil, Arthur Gretton

TL;DR

The paper tackles causal effect estimation in nonparametric instrumental variable (NPIV) regression under hidden confounding. It observes that conventional spectral feature learning is outcome-agnostic, so the learned features can be misaligned with the true causal function. It introduces Augmented Spectral Feature Learning, which incorporates outcome information into the operator via a rank-1 perturbation $\mathcal{T}_{\delta}$ and trains with a contrastive loss, yielding task-specific spectral features. The authors provide non-asymptotic, high-probability guarantees for the resulting 2SLS estimator and demonstrate robustness to spectral misalignment on synthetic data, on challenging dSprites benchmarks, and in an off-policy evaluation case in reinforcement learning. The results show that a small positive augmentation parameter $\delta$ often improves performance and extends spectral NPIV methods beyond well-aligned settings. The paper also discusses higher-rank extensions and practical strategies for selecting $\delta$.

Abstract

We address the problem of causal effect estimation in the presence of hidden confounders using nonparametric instrumental variable (IV) regression. An established approach is to use estimators based on learned spectral features, that is, features spanning the top singular subspaces of the operator linking treatments to instruments. While powerful, such features are agnostic to the outcome variable. Consequently, the method can fail when the true causal function is poorly represented by these dominant singular functions. To mitigate this, we introduce Augmented Spectral Feature Learning, a framework that makes the feature learning process outcome-aware. Our method learns features by minimizing a novel contrastive loss derived from an augmented operator that incorporates information from the outcome. By learning these task-specific features, our approach remains effective even under spectral misalignment. We provide a theoretical analysis of this framework and validate our approach on challenging benchmarks.
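
The failure mode described in the abstract can be illustrated with a small finite-dimensional toy. The sketch below is not the paper's method or data. It assumes a hypothetical matrix `T` standing in for the operator, with geometrically decaying singular values, and two hypothetical target vectors: `h_aligned`, built from the top right singular vectors, and `h_misaligned`, built from low ones. The helper `captured_energy` is likewise an illustrative name. It measures how much of each target survives projection onto the top-`d` singular subspace.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 3

# Hypothetical stand-in for the operator: random orthonormal singular
# vectors with geometrically decaying (hence distinct) singular values.
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = 0.5 ** np.arange(n)
T = U @ np.diag(sigma) @ V.T


def captured_energy(h, V_top):
    """Fraction of ||h||^2 lying in the span of the orthonormal columns of V_top."""
    coeffs = V_top.T @ h
    return float(coeffs @ coeffs / (h @ h))


# Outcome-agnostic spectral features: top-d right singular vectors of T.
_, _, Vt = np.linalg.svd(T)
V_top = Vt[:d].T

# Assumed targets: one inside the dominant subspace, one far outside it.
h_aligned = V[:, 0] + 0.5 * V[:, 1]
h_misaligned = V[:, 10] + 0.5 * V[:, 12]

aligned_fraction = captured_energy(h_aligned, V_top)
misaligned_fraction = captured_energy(h_misaligned, V_top)
print(f"aligned: {aligned_fraction:.3f}, misaligned: {misaligned_fraction:.3f}")
```

In this toy, the top-`d` features represent the aligned target almost perfectly and the misaligned one almost not at all, regardless of how well the subspace itself is estimated. That is the gap an outcome-aware feature learner is meant to close.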

Paper Structure

This paper contains 41 sections, 19 theorems, 132 equations, 16 figures, 9 tables.

Key Result

Proposition 1

Given $\delta \in \mathbb{R}$, for all parameters $\theta$ and $\omega$ it holds that $\mathcal{L}_{\delta}^{ (d)}(\theta,\omega) \geq - \left\| \mathcal{T}_{\delta} \right\|_{ \mathrm{HS}}^2$. The lower bound is achieved if and only if the learned operator $\Psi^{ (d)}_{\theta} [\Phi^{{ (d)} \star}
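
A matrix analogue of a lower bound of this type can be checked numerically. The sketch below is an illustration under stated assumptions, not the paper's loss: it uses a generic random matrix `T` in place of the augmented operator, and a hypothetical low-rank loss $\|\Psi\Phi^\top\|_F^2 - 2\langle T, \Psi\Phi^\top\rangle_F$, which equals $\|T - \Psi\Phi^\top\|_F^2 - \|T\|_F^2$ and is therefore never below $-\|T\|_F^2$. The function name `loss` and the factor names `Psi` and `Phi` are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, d = 8, 6, 2

# Generic stand-in for the augmented operator (assumption, not the paper's T_delta).
T = rng.standard_normal((m, n))


def loss(Psi, Phi, T):
    """Assumed low-rank loss: ||Psi Phi^T||_F^2 - 2 <T, Psi Phi^T>_F."""
    A = Psi @ Phi.T
    return float(np.sum(A * A) - 2.0 * np.sum(T * A))


# Completing the square gives loss = ||T - A||_F^2 - ||T||_F^2 >= -||T||_F^2.
lower_bound = -float(np.sum(T * T))

# Arbitrary rank-d factors never go below the bound.
random_losses = [
    loss(rng.standard_normal((m, d)), rng.standard_normal((n, d)), T)
    for _ in range(100)
]

# By the Eckart-Young-Mirsky theorem, the rank-d truncated SVD minimizes
# ||T - A||_F over rank-d matrices, so it also minimizes this loss.
U, s, Vt = np.linalg.svd(T, full_matrices=False)
Psi_star = U[:, :d] * s[:d]
Phi_star = Vt[:d].T
best_rank_d = loss(Psi_star, Phi_star, T)

print(lower_bound, best_rank_d, min(random_losses))
```

In this finite-dimensional setting the rank-`d` minimum equals minus the sum of the top `d` squared singular values, so it sits at or above $-\|T\|_F^2$ and every random factor pair sits at or above it.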

Figures (16)

  • Figure 1: In case of severe misalignment of $h_0$ and $\mathcal{T}$ (left), an ideal solution would aim to find another operator $\tilde{\mathcal{T}}$ whose top singular functions $\tilde{v}_i$ capture the signal in $h_0$ (right).
  • Figure 2: Distributions of relative IV regression MSEs ($\|\widehat{h}_\theta-h_0\|^2$) for the synthetic example with $\delta\in\{0,0.5,1.0,3.0,5.0\}$ and $c_\sigma\in\{0.2,0.8\}$.
  • Figure 4: Left: Evolution of ${\mathcal{L}}^{ (d)}_0$ and ${\mathcal{R}}^{ (d)}_\delta$ for models learning $h_\text{new}$. Models with non-zero $\delta$ and a small ${\mathcal{L}}^{ (d)}_0$ (close to the value attained at $\delta=0$) demonstrate the best results. Each bar's mean value is noted above it. Right: Estimation of $\|\Pi_{\varphi_{\star}^{ (d)}}h_\text{new}\|^2$ for a range of $\delta$ values.
  • Figure 5: Distributions of cumulative alignment estimates and true values $\|\Pi_{\hat{v}^{(i)}}h_0\|^2$ for increasing $i$, evaluated on separately fitted models (with identical parameters) for $h_0=h_\text{old},h_\text{new}$ at $\delta=0$.
  • Figure 6: Comparison of how the distributions of $h_\text{old}$ and $h_\text{new}$ vary with each component of the instrument $Z$ (scale, orientation or $x$ position). The values of each component of $Z$ in dSprites are quantized. The $x$-axis positions in the figure correspond to those values. For each value of a component of $Z$, we display the distribution of the values of $h_0=h_\text{old},h_\text{new}$ evaluated on images where the component takes that value. The marked values are the means of each bin.
  • ...and 11 more figures

Theorems & Definitions (31)

  • Proposition 1
  • Theorem 1
  • Proposition 2
  • Proposition 3: Weyl's inequality
  • Theorem 2: Wedin sin-$\Theta$ Theorem
  • proof
  • Theorem 3: Eckart-Young-Mirsky Theorem
  • Proposition 4
  • proof
  • Proposition 5
  • ...and 21 more