Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates

Thatchanon Anancharoenkij, Donlapark Ponnoprat

Abstract

A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Within this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression method in each stage. From this meta-estimator, we derive three practical CCME estimators: (1) a Ridge Regression estimator, (2) a Deep Feature estimator that parameterizes the feature map with a neural network, and (3) a Neural-Kernel estimator that performs RKHS-valued regression with coefficients parameterized by a neural network. We provide finite-sample convergence rates for all three estimators, establishing that they are doubly robust. Our experiments demonstrate that the estimators accurately recover distributional features of conditional counterfactual distributions, including multimodal structure.
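The two-stage structure described above can be illustrated with a minimal NumPy sketch in which both stages are kernel ridge regressions, loosely in the spirit of the Ridge Regression estimator. Everything here is an assumption for illustration, not the paper's algorithm: the DR-learner-style pseudo-outcome, the single sample split, the Gaussian kernels and their hyperparameters, the target arm $a = 1$, and the propensity scores `prop`, which are taken as known (as in a randomized design) rather than estimated. The helper names `rbf`, `krr_weights`, and `dr_ccme_weights` are hypothetical. Because every RKHS-valued quantity is a weighted sum of outcome feature maps, the sketch only tracks weight matrices over a set of outcome "atoms".

```python
import numpy as np


def rbf(a, b, gamma=1.0):
    """Gaussian-kernel Gram matrix between the rows of a and b (1-D inputs become columns)."""
    a = np.asarray(a, float).reshape(len(a), -1)
    b = np.asarray(b, float).reshape(len(b), -1)
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def krr_weights(K_train, K_query, lam):
    """Kernel ridge regression in weight form: row q combines the training
    targets into the prediction at query point q."""
    n = K_train.shape[0]
    return np.linalg.solve(K_train + n * lam * np.eye(n), K_query.T).T


def dr_ccme_weights(X, A, Y, V, prop, V_query, gamma_x=10.0, gamma_v=10.0,
                    lam1=1e-3, lam2=1e-3, seed=0):
    """Hypothetical two-stage doubly robust sketch for the treated arm (a = 1).

    Returns (atoms, W): the estimated embedding at V_query[q] is
    sum_j W[q, j] * phi(atoms[j]) for a feature map phi on outcomes.
    """
    n = len(A)
    idx = np.random.default_rng(seed).permutation(n)
    f1, f2 = idx[: n // 2], idx[n // 2:]           # single sample split
    T = f1[A[f1] == 1]                             # treated units in fold 1

    # Stage 1 (fold 1): outcome embedding m(x) = E[phi(Y) | A=1, X=x],
    # stored as weights B over the atoms phi(Y[T]), evaluated at fold-2 covariates.
    B = krr_weights(rbf(X[T], X[T], gamma_x), rbf(X[f2], X[T], gamma_x), lam1)

    # DR pseudo-outcomes on fold 2 (assumed form):
    #   xi_j = m(X_j) + (A_j / prop_j) * (phi(Y_j) - m(X_j))
    r = A[f2] / prop[f2]
    Xi = np.hstack([(1.0 - r)[:, None] * B, np.diag(r)])

    # Stage 2 (fold 2): regress the RKHS-valued pseudo-outcomes on V.
    C = krr_weights(rbf(V[f2], V[f2], gamma_v), rbf(V_query, V[f2], gamma_v), lam2)
    atoms = np.concatenate([Y[T], Y[f2]])
    return atoms, C @ Xi
```

Evaluating the estimated embedding on a grid of outcome values amounts to `W @ rbf(atoms, y_grid)`, and for scalar outcomes the linear readout `W @ atoms` gives the corresponding estimate of $E[Y^1 \mid V = v]$. Working with weights rather than explicit feature vectors keeps the sketch independent of the outcome kernel, which only enters at evaluation time.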

Paper Structure

This paper contains 70 sections, 21 theorems, 202 equations, 3 figures, 2 tables, and 4 algorithms.

Key Result

Proposition 1

Under the unconfoundedness assumption (ass:unconfound) and the bounded-kernel assumption (ass:boundk), the CCME satisfies the following identification result:

Figures (3)

  • Figure 1: MSE (log scale) vs. sample size for three methods (columns) under three misspecification scenarios (rows). Shaded regions indicate standard errors over 10 seeds.
  • Figure 2: Estimated counterfactual densities for profiles $v_1$ (top) and $v_2$ (bottom) across 30 runs with $n = 20000$. Blue: DR; Red: One-Step. Solid lines are medians; dark and light bands show pointwise 50% and 90% intervals.
  • Figure 3: Empirical mode for each class under three density estimates. Row 1: Oracle (clean data). Row 2: DR (proposed). Row 3: One-Step (prior work). DR recovers canonical digits matching the Oracle, while One-Step shows a subtle bias toward higher-intensity variants.
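Figure 1 reports the MSE of the estimated embeddings. If that error is measured in RKHS norm (an assumption; the caption does not say), it has a closed form for any embedding written as a weighted sum of feature maps, $\hat\mu = \sum_i w_i \phi(y_i)$, via the kernel trick: $\lVert \hat\mu_1 - \hat\mu_2 \rVert^2 = w_1^\top K_{11} w_1 - 2 w_1^\top K_{12} w_2 + w_2^\top K_{22} w_2$. The sketch below assumes a Gaussian kernel; the names `rbf` and `rkhs_sq_dist` are hypothetical.

```python
import numpy as np


def rbf(a, b, gamma=1.0):
    """Gaussian-kernel Gram matrix between the rows of a and b (1-D inputs become columns)."""
    a = np.asarray(a, float).reshape(len(a), -1)
    b = np.asarray(b, float).reshape(len(b), -1)
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def rkhs_sq_dist(w1, atoms1, w2, atoms2, gamma=1.0):
    """Squared RKHS distance between sum_i w1[i] phi(atoms1[i])
    and sum_j w2[j] phi(atoms2[j]), computed with the kernel trick."""
    return (w1 @ rbf(atoms1, atoms1, gamma) @ w1
            - 2.0 * w1 @ rbf(atoms1, atoms2, gamma) @ w2
            + w2 @ rbf(atoms2, atoms2, gamma) @ w2)
```

With uniform weights $1/n$ on two samples, this quantity reduces to the (V-statistic) squared maximum mean discrepancy between them.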

Theorems & Definitions (46)

  • Proposition 1
  • Remark 1
  • Theorem 1: Meta-Estimator Rate
  • Remark 2
  • Theorem 2: Ridge Regression Rate
  • Theorem 3: Deep Feature Estimator Rate
  • Theorem 4: Neural-Kernel Estimator Rate
  • Remark 3
  • Definition 1: Lebesgue Space $L^{p}(\Omega)$
  • Definition 2: Weak Derivatives
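For reference, the standard textbook forms of Definitions 1 and 2 are given below, assuming Lebesgue measure on $\Omega \subseteq \mathbb{R}^d$; the paper's own statements may differ in detail (for example, in the underlying measure or the range of $p$).

```latex
% Lebesgue space, 1 <= p < \infty, and the p = \infty case
L^{p}(\Omega) = \Big\{ f : \Omega \to \mathbb{R} \text{ measurable} \;:\;
  \|f\|_{L^{p}(\Omega)} = \Big( \int_{\Omega} |f(x)|^{p} \, dx \Big)^{1/p} < \infty \Big\},
\qquad
\|f\|_{L^{\infty}(\Omega)} = \operatorname*{ess\,sup}_{x \in \Omega} |f(x)| .

% Weak derivative: for f, g \in L^{1}_{\mathrm{loc}}(\Omega) and a multi-index \alpha,
% g = D^{\alpha} f in the weak sense if, for every \varphi \in C_c^{\infty}(\Omega),
\int_{\Omega} f(x)\, D^{\alpha}\varphi(x)\, dx
  = (-1)^{|\alpha|} \int_{\Omega} g(x)\, \varphi(x)\, dx .
```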