Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings

Houssam Zenati, Bariscan Bozkurt, Arthur Gretton

TL;DR

A novel framework is proposed that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible, nonparametric distributional off-policy evaluation. The work also develops a doubly robust kernel test statistic for hypothesis testing; the statistic is asymptotically normal, which enables computationally efficient testing and straightforward construction of confidence intervals.

Abstract

Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a novel framework, Counterfactual Policy Mean Embedding (CPME), that represents the entire counterfactual outcome distribution in a reproducing kernel Hilbert space (RKHS), enabling flexible and nonparametric distributional off-policy evaluation. We introduce both a plug-in estimator and a doubly robust estimator; the latter enjoys improved convergence rates by correcting for bias in both the outcome embedding and propensity models. Building on this, we develop a doubly robust kernel test statistic for hypothesis testing, which achieves asymptotic normality and thus enables computationally efficient testing and straightforward construction of confidence intervals. Our framework also supports sampling from the counterfactual distribution. Numerical simulations illustrate the practical benefits of CPME over existing methods.
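To make the idea of embedding a counterfactual outcome distribution in an RKHS concrete, here is a minimal sketch of an importance-weighted kernel mean embedding evaluated on a grid of outcome values. It is not the plug-in or doubly robust estimator of the paper: it assumes binary actions, known logging propensities, a Gaussian kernel, and a simulated data-generating process, and every name in it (`gaussian_kernel`, `ipw_embedding`, `p_log`, `p_tgt`) is hypothetical.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel matrix between two 1-D arrays of outcomes."""
    diff = a[:, None] - b[None, :]
    return np.exp(-diff**2 / (2.0 * bandwidth**2))


def ipw_embedding(y_obs, weights, y_query, bandwidth=1.0):
    """Evaluate (1/n) * sum_i w_i * k(y_i, y) at each query point y."""
    K = gaussian_kernel(y_obs, y_query, bandwidth)  # shape (n, m)
    return weights @ K / len(y_obs)


# Simulated logged data (an assumed toy setting, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)                              # context
p_log = 0.2 + 0.6 / (1.0 + np.exp(-x))              # logging P(a=1 | x), in [0.2, 0.8]
a = rng.binomial(1, p_log)                          # logged action
y = 2.0 * a + x + 0.5 * rng.normal(size=n)          # observed outcome

# Target policy: choose action 1 with probability 0.8 regardless of context.
p_tgt = 0.8
pi_tgt = np.where(a == 1, p_tgt, 1.0 - p_tgt)
pi_log = np.where(a == 1, p_log, 1.0 - p_log)
w = pi_tgt / pi_log                                 # importance weights

# Embedding of the counterfactual outcome distribution, evaluated on a grid.
grid = np.linspace(-4.0, 6.0, 50)
mu_hat = ipw_embedding(y, w, grid)
```

The vector `mu_hat` plays the role of a smoothed, reweighted outcome density under the target policy. A doubly robust variant would additionally fit an outcome embedding model and use it to correct the weighted term, which this sketch does not attempt.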

Paper Structure

This paper contains 88 sections, 19 theorems, 180 equations, 10 figures, 10 tables, 4 algorithms.

Key Result

Proposition 2

(Identified Counterfactual Policy Mean Embedding) Assume that the selection-on-observables assumption holds. Then the counterfactual policy mean embedding can be written as:

Figures (10)

  • Figure 1: Illustration of 100 simulations of DR-KPT under the null: (A) Histogram with standard normal pdf for $n=400$, (B) Normal Q–Q plot for $n=400$, (C) False positive rate across sample sizes. The results confirm the Gaussian behavior and good calibration of the test under the null.
  • Figure 2: True positive rates of $100$ simulations of the tests in Scenarios II, III, and IV. DR-KPT shows notable true positive rates in every scenario, unlike competitors.
  • Figure 3: Logistic logging policy, nonlinear outcome function.
  • Figure 4: Illustration of $100$ simulations of the DR-KPT without sample splitting under the null: (A) Histogram of DR-KPT alongside the pdf of a standard normal for $n = 400$, (B) Normal Q-Q plot of DR-KPT for $n = 400$, (C) False positive rate of DR-KPT against different sample sizes.
  • Figure 6: Mean squared error results for the off-policy evaluation experiment described in the appendix (simulated OPE setting), reported across variations in: (a) the number of observations $n$, (b) the number of recommendations $K$, (c) the number of users $N$, (d) the context dimension $d$, and (e) the policy shift multiplier $\alpha$.
  • ...and 5 more figures

Theorems & Definitions (38)

  • Proposition 2
  • Proposition 4
  • Theorem 5
  • Lemma 4.1
  • Theorem 6
  • Theorem 7
  • Proposition 9
  • Example 9.1
  • Remark 10
  • Theorem 12
  • ...and 28 more