Orthogonal Representation Learning for Estimating Causal Quantities

Valentyn Melnychuk, Dennis Frauen, Jonas Schweisthal, Stefan Feuerriegel

TL;DR

The paper addresses estimating heterogeneous causal quantities from high-dimensional observational data by reconciling end-to-end representation learning with Neyman-orthogonal meta-learners. It proposes OR-learners, a three-stage framework that learns representations, estimates nuisance functions, and then trains a Neyman-orthogonal target model using the learned representations, achieving quasi-oracle efficiency under a low-dimensional manifold assumption. Theoretical results show that representations can strictly improve estimation error compared to standard Neyman-orthogonal learners, while balancing constraints generally cannot replace orthogonality unless a strong inductive bias holds; invertibility can mitigate some issues. Empirically, OR-learners outperform baselines on synthetic, IHDP, ACIC 2016, and HC-MNIST data, offering practical guidelines for combining representation learning with Neyman-orthogonal learners and highlighting when balancing is beneficial. Overall, the work provides a principled, scalable approach to achieving both practical performance and theoretical guarantees in causal quantity estimation.

Abstract

End-to-end representation learning has become a powerful tool for estimating causal quantities from high-dimensional observational data, but its efficiency has remained unclear. Here, we face a central tension: end-to-end representation learning methods often work well in practice but lack asymptotic optimality in the form of quasi-oracle efficiency. In contrast, two-stage Neyman-orthogonal learners provide such a theoretical optimality property but do not explicitly benefit from the strengths of representation learning. In this work, we step back and ask two research questions: (1) When do representations strengthen existing Neyman-orthogonal learners? and (2) Can a balancing constraint, a technique commonly proposed in the representation learning literature, provide improvements to Neyman-orthogonality? We address these two questions through our theoretical and empirical analysis, where we introduce a unifying framework that connects representation learning with Neyman-orthogonal learners (namely, OR-learners). In particular, we show that, under the low-dimensional manifold hypothesis, the OR-learners can strictly improve the estimation error of the standard Neyman-orthogonal learners. At the same time, we find that the balancing constraint requires an additional inductive bias and cannot generally compensate for the lack of Neyman-orthogonality of the end-to-end approaches. Building on these insights, we offer guidelines for how users can effectively combine representation learning with the classical Neyman-orthogonal learners to achieve both practical performance and theoretical guarantees.
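The three-stage idea described above (learn a representation, estimate nuisance functions, then fit a Neyman-orthogonal target model on the representation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: a whitened PCA projection stands in for the representation network, ridge and logistic regression stand in for the nuisance and target networks, the doubly robust (DR-learner) pseudo-outcome is used as one example of a Neyman-orthogonal loss, and cross-fitting is omitted. The names `or_learner_sketch`, `ridge_fit`, and `logistic_fit` are hypothetical, as is the synthetic data.

```python
import numpy as np

def _with_intercept(Z):
    return np.hstack([Z, np.ones((Z.shape[0], 1))])

def ridge_fit(Z, y, lam=1e-2):
    """Closed-form ridge regression with an intercept; returns a predictor."""
    Z1 = _with_intercept(Z)
    w = np.linalg.solve(Z1.T @ Z1 + lam * np.eye(Z1.shape[1]), Z1.T @ y)
    return lambda Zn: _with_intercept(Zn) @ w

def logistic_fit(Z, a, steps=500, lr=0.5):
    """Logistic regression fitted by plain gradient descent."""
    Z1 = _with_intercept(Z)
    w = np.zeros(Z1.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Z1 @ w))
        w -= lr * Z1.T @ (p - a) / len(a)
    return lambda Zn: 1.0 / (1.0 + np.exp(-(_with_intercept(Zn) @ w)))

def or_learner_sketch(X, A, Y, d_phi=2, clip=0.05):
    n = X.shape[0]
    # Stage 1 (assumption: whitened PCA replaces the representation network).
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scale = S[:d_phi] / np.sqrt(n)
    phi = lambda Xn: (Xn - mean) @ Vt[:d_phi].T / scale
    Z = phi(X)

    # Stage 2: nuisance functions estimated on top of the representation.
    mu0 = ridge_fit(Z[A == 0], Y[A == 0])   # outcome model, untreated
    mu1 = ridge_fit(Z[A == 1], Y[A == 1])   # outcome model, treated
    e = np.clip(logistic_fit(Z, A)(Z), clip, 1.0 - clip)  # propensity score
    m0, m1 = mu0(Z), mu1(Z)

    # Stage 3: regress the doubly robust pseudo-outcome on V = phi(X).
    pseudo = m1 - m0 + A * (Y - m1) / e - (1.0 - A) * (Y - m0) / (1.0 - e)
    g = ridge_fit(Z, pseudo)
    return lambda Xn: g(phi(Xn))

# Hypothetical synthetic data: 20-dimensional covariates near a 2-D subspace.
rng = np.random.default_rng(0)
n, d_x = 2000, 20
U = rng.normal(size=(n, 2))
X = U @ rng.normal(size=(2, d_x)) + 0.01 * rng.normal(size=(n, d_x))
A = rng.binomial(1, 1.0 / (1.0 + np.exp(-U[:, 0]))).astype(float)
tau = 1.0 + U[:, 0]                      # true conditional treatment effect
Y = U[:, 1] + A * tau + 0.1 * rng.normal(size=n)

cate = or_learner_sketch(X, A, Y)
tau_hat = cate(X)
rmse = float(np.sqrt(np.mean((tau_hat - tau) ** 2)))
print(f"RMSE of the estimated CATE: {rmse:.3f}")
```

In this toy setup the target model sees only the two-dimensional representation rather than all 20 covariates, which mirrors the low-dimensional manifold argument in spirit; the linear models are chosen only to keep the sketch short and are well specified for this particular data-generating process.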

Paper Structure

This paper contains 27 sections, 5 theorems, 43 equations, 9 figures, 12 tables, 1 algorithm.

Key Result

Lemma 1

Assume a non-parametric target model $g(v), v \in \mathcal{V} \subseteq \mathbb{R}^{d_v}$. (A similar convergence rate can be shown for neural networks; see Sec. 5.2 in schulte2025adjustment.) Then, the error between $g^* = \mathop{\mathrm{arg\,min}}\limits_{g \in \mathcal{G}}\mathcal{L}_{\mathcal{G}}(g, \ldots)$ [...] where $C^v$ and $s^v$ are the Hölder smoothness constant and exponent of $g^*$, respectively, [...]
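
Lemma 1 refers to the Hölder smoothness constant and exponent of $g^*$. For orientation, a standard textbook form of Hölder smoothness is sketched here; the paper's own Definition 4 is not reproduced on this page and may differ in detail. The decomposition $s^v = k + \beta$ and the multi-index $\kappa$ are notation assumed for this sketch. Writing $s^v = k + \beta$ with $k \in \mathbb{N}_0$ and $\beta \in (0, 1]$, a $k$-times continuously differentiable function $g$ is Hölder smooth with constant $C^v$ and exponent $s^v$ if

$$
\left| D^{\kappa} g(v) - D^{\kappa} g(v') \right| \le C^v \, \lVert v - v' \rVert_2^{\beta} \quad \text{for all } v, v' \in \mathcal{V} \text{ and all multi-indices } \kappa \text{ with } |\kappa| = k .
$$

A larger exponent $s^v$ corresponds to a smoother target function, which in non-parametric regression generally means an easier estimation problem.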

Figures (9)

  • Figure 1: Hidden layers of the representation network induce spaces where the regression task is simpler.
  • Figure 2: Insights for RQ . For both figures, we highlight in yellow boxes how the OR-learners (in red) can be beneficial in comparison with the end-to-end representation network (in blue). Specifically, we compare the generalization performances in terms of MSE / precision in estimating heterogeneous effect (PEHE) (lower is better), depending on the strength of balancing, $\alpha$. In both cases, we show the behavior in a finite-sample vs. asymptotic regime ($n \to \infty$). The plots highlight the effectiveness of the OR-learners in the asymptotic regime, especially when too much balancing is applied.
  • Figure 3: Results for synthetic data in Setting . Reported: ratio between the performance of TARFlow (CFRFlow with $\alpha = 0$) and invertible representation networks with varying $\alpha$; mean $\pm$ SE over 15 runs. Lower is better. Here: $n_{\text{train}} = 500$, $d_{\hat{\phi}} = 2$.
  • Figure 4: Flow chart of consistency and Neyman-orthogonality for representation learning methods. The OR-learners fill the gaps shown by red dotted lines.
  • Figure 5: An overview of the OR-learners. The OR-learners proceed in three stages: fitting a representation network, estimation of the nuisance functions, and fitting a target network. For the stage , we also show different options for the target network input $V$. Depending on the choice of the input $V$, the second-stage model $g(V)$ obtains different interpretations: it either learns a new model from scratch or performs a calibration of the representation network outputs.
  • ...and 4 more figures

Theorems & Definitions (14)

  • Lemma 1: Quasi-oracle efficiency of a non-parametric model
  • Proposition 1
  • Proposition 2: Smoothness of the hidden layers
  • Proposition 3: Smoothing via expanding mapping
  • Proposition 4: Balancing via contracting mapping
  • Definition 1: Neyman-orthogonality (foster2023orthogonal; morzywolek2023general)
  • Definition 2: Double robustness
  • Definition 3: Quasi-oracle efficiency
  • Definition 4: Hölder smoothness
  • proof
  • ...and 4 more