Deep Proxy Causal Learning and its Application to Confounded Bandit Policy Evaluation

Liyuan Xu, Heishiro Kanagawa, Arthur Gretton

TL;DR

This work tackles causal effect estimation under unobserved confounding by leveraging proxy variables and the proxy causal learning (PCL) framework, which recovers the structural function $f_{struct}(a)=\mathbb{E}_U[\mathbb{E}[Y|A=a,U]]$ via a bridge function $h^*$ and a two-stage regression. It introduces Deep Feature Proxy Variable (DFPV), which uses neural networks to learn adaptive feature maps for high-dimensional, nonlinear proxies within the two-stage PCL setup, and provides consistency guarantees under a Rademacher-complexity framework. The method is extended to off-policy evaluation in confounded bandits, enabling estimation of policy values $v(\pi)$ through $v(\pi)=\mathbb{E}_C[\mathbb{E}_{W|C}[h^*(\pi(C),W)]]$, with theoretical error bounds. Empirically, DFPV outperforms state-of-the-art PCL methods on synthetic benchmarks with image data and shows competitive performance for offline policy evaluation, illustrating the practical impact of learning deep, adaptive proxy representations for causal inference in high-dimensional settings.
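The policy value expression $v(\pi)=\mathbb{E}_C[\mathbb{E}_{W|C}[h^*(\pi(C),W)]]$ suggests a simple plug-in estimate once a bridge function is available: average $h(\pi(c_i), w_i)$ over jointly observed context and proxy pairs. The sketch below illustrates only that averaging step. The function name `estimate_policy_value`, the toy bridge function, and the toy policy are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def estimate_policy_value(bridge, policy, contexts, proxies):
    """Plug-in estimate of v(pi) = E_C[E_{W|C}[h(pi(C), W)]].

    Because (c_i, w_i) are observed jointly, averaging h(pi(c_i), w_i)
    over the sample approximates the nested expectation.
    """
    actions = policy(contexts)
    return float(np.mean(bridge(actions, proxies)))

# Toy data: the proxy W is a noisy copy of the context C (illustrative only).
rng = np.random.default_rng(0)
contexts = rng.normal(size=1000)
proxies = contexts + 0.1 * rng.normal(size=1000)

# Hypothetical bridge function and policy, chosen only to exercise the code.
def toy_bridge(a, w):
    return 2.0 * a + w

def constant_policy(c):
    return np.ones_like(c)  # always plays action 1

value = estimate_policy_value(toy_bridge, constant_policy, contexts, proxies)
```

In practice the bridge function would itself be estimated from data, so the quality of this estimate depends on the quality of that fit.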

Abstract

Proxy causal learning (PCL) is a method for estimating the causal effect of treatments on outcomes in the presence of unobserved confounding, using proxies (structured side information) for the confounder. This is achieved via two-stage regression: in the first stage, we model relations among the treatment and proxies; in the second stage, we use this model to learn the effect of treatment on the outcome, given the context provided by the proxies. PCL guarantees recovery of the true causal effect, subject to identifiability conditions. We propose a novel method for PCL, the deep feature proxy variable method (DFPV), to address the case where the proxies, treatments, and outcomes are high-dimensional and have complex nonlinear relationships, as represented by deep neural network features. We show that DFPV outperforms recent state-of-the-art PCL methods on challenging synthetic benchmarks, including settings involving high-dimensional image data. Furthermore, we show that PCL can be applied to off-policy evaluation for the confounded bandit problem, in which DFPV also exhibits competitive performance.
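To make the two-stage regression concrete, here is a minimal sketch that uses fixed, hand-picked linear features and ridge regression in place of the learned neural network features that DFPV uses. It is not the authors' implementation. The data-generating process, the feature maps `feat_az` and `feat_1d`, the regularisation value, and all variable names are assumptions chosen so that the true structural function is $f_{struct}(a)=2a$.

```python
import numpy as np

def ridge(X, Y, lam):
    """Closed-form ridge regression: argmin_B ||X B - Y||^2 + lam ||B||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)

def feat_az(a, z):
    """Hypothetical fixed features of (treatment, treatment proxy): [1, a, z]."""
    return np.column_stack([np.ones_like(a), a, z])

def feat_1d(x):
    """Hypothetical fixed features of a scalar variable: [1, x]."""
    return np.column_stack([np.ones_like(x), x])

def row_kron(P, Q):
    """Row-wise Kronecker product of two feature matrices."""
    return np.einsum("ni,nj->nij", P, Q).reshape(P.shape[0], -1)

def sample(n, rng):
    """Toy confounded data (an assumption): U is hidden, Z and W are proxies."""
    u = rng.normal(size=n)
    z = u + 0.5 * rng.normal(size=n)            # treatment-side proxy
    w = u + 0.5 * rng.normal(size=n)            # outcome-side proxy
    a = u + rng.normal(size=n)                  # treatment, confounded by U
    y = 2.0 * a + u + 0.1 * rng.normal(size=n)  # outcome, true effect 2a
    return a, z, w, y

rng = np.random.default_rng(0)
a1, z1, w1, _ = sample(2000, rng)    # stage 1 data
a2, z2, w2, y2 = sample(2000, rng)   # stage 2 data

# Stage 1: regress the outcome-proxy features on features of (A, Z).
V = ridge(feat_az(a1, z1), feat_1d(w1), lam=1e-3)
w_feat_hat = feat_az(a2, z2) @ V     # predicted E[psi(W) | a, z]

# Stage 2: regress Y on treatment features combined with the stage 1 predictions.
X2 = row_kron(feat_1d(a2), w_feat_hat)
u_hat = ridge(X2, y2, lam=1e-3)

def f_struct(a):
    """Estimated structural function: bridge function averaged over observed W."""
    mean_w = feat_1d(w2).mean(axis=0)
    psi_a = feat_1d(np.atleast_1d(float(a)))[0]
    return float(u_hat @ np.kron(psi_a, mean_w))

estimate = f_struct(1.0)
```

On this toy problem a direct regression of $Y$ on $A$ would be biased upward by the hidden confounder, whereas the two-stage estimate should land close to the true value $f_{struct}(1)=2$. With image-valued or otherwise high-dimensional proxies, fixed features like these are unlikely to suffice, which is the gap that learned feature maps are meant to fill.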

Paper Structure

This paper contains 33 sections, 18 theorems, 125 equations, 8 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Let Assumptions assum:stuctural, assum:completeness-confounder and Assumptions assu:cond-exp-compactness, assu:cond-exp-L2, assu:Y-square-integrable in Appendix sec:identifiability hold. Then there exists at least one solution $h$ to the functional equation $\mathbb{E}[Y|A=a,Z=z]=\int h(a,w)\,\rho_W(w|A=a,Z=z)\,\mathrm{d}w$, which holds for any $(a,z) \in \mathcal{A} \times \mathcal{Z}$. Here, we denote $\rho_W(w|A=a, Z=z)$ as the density function of the conditional p

Figures (8)

  • Figure 1: Causal Graph.
  • Figure 2: Causal Graph in CEVAE
  • Figure 3: Result of structural function experiment in demand design setting (Left) and dSprite setting (Right).
  • Figure 4: Result of OPE experiment when the policy depends on the costs (Left) and on the current price (Right).
  • Figure 5: Causal graph with observable confounder
  • ...and 3 more figures

Theorems & Definitions (32)

  • Proposition 1 (Miao et al., 2018)
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5 (Kress, 1999)
  • Remark 1
  • Lemma 1
  • proof
  • Remark 2
  • Lemma 2
  • ...and 22 more