Conditional Counterfactual Mean Embeddings: Doubly Robust Estimation and Learning Rates

Thatchanon Anancharoenkij; Donlapark Ponnoprat

Abstract

A complete understanding of heterogeneous treatment effects involves characterizing the full conditional distribution of potential outcomes. To this end, we propose Conditional Counterfactual Mean Embeddings (CCME), a framework that embeds conditional distributions of counterfactual outcomes into a reproducing kernel Hilbert space (RKHS). Under this framework, we develop a two-stage meta-estimator for CCME that accommodates any RKHS-valued regression in each stage. Based on this meta-estimator, we develop three practical CCME estimators: (1) a Ridge Regression estimator, (2) a Deep Feature estimator that parameterizes the feature map by a neural network, and (3) a Neural-Kernel estimator that performs RKHS-valued regression with coefficients parameterized by a neural network. We provide finite-sample convergence rates for all estimators, establishing that they possess the double robustness property. Our experiments demonstrate that our estimators accurately recover distributional features, including multimodal structure, of conditional counterfactual distributions.
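To make "RKHS-valued regression" concrete, here is a minimal sketch of the standard kernel ridge regression estimate of a conditional mean embedding, the kind of building block such a regression stage could use. It is not the paper's CCME estimator: it has no treatment variable, no propensity model, and no doubly robust correction. The function names, the Gaussian kernels, the bandwidths, the regularization value, and the synthetic bimodal data are all assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a (n, d) and rows of b (m, d)."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))


def cme_weights(x_train, x_query, reg, bandwidth):
    """Kernel ridge weights beta(x) with mu_hat(x) = sum_i beta_i(x) k_Y(y_i, .)."""
    n = x_train.shape[0]
    k_xx = gaussian_kernel(x_train, x_train, bandwidth)
    k_xq = gaussian_kernel(x_train, x_query, bandwidth)
    # Solve (K_X + n * reg * I) beta = K_X(train, query); result has shape (n, m).
    return np.linalg.solve(k_xx + n * reg * np.eye(n), k_xq)


# Synthetic data (illustrative assumption): Y is bimodal around -1 and +1,
# independently of X.
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-1.0, 1.0, size=(n, 1))
signs = rng.choice([-1.0, 1.0], size=(n, 1))
y = signs + 0.1 * rng.standard_normal((n, 1))

# Weights of the estimated embedding at the query covariate x = 0.
beta = cme_weights(x, np.array([[0.0]]), reg=1e-3, bandwidth=0.5)

# Evaluate the estimated embedding at y = -1, 0, 1.
y_grid = np.array([[-1.0], [0.0], [1.0]])
k_y = gaussian_kernel(y, y_grid, bandwidth=0.3)  # shape (n, 3)
embedding_values = (beta.T @ k_y).ravel()
print(embedding_values)
```

Evaluating the estimated embedding on a grid of outcome values acts as a smoothed view of the conditional distribution. In this toy setup the values at the two modes should be clearly larger than the value at zero, which is the kind of multimodal structure a mean-only estimate would miss.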

Paper Structure (70 sections, 21 theorems, 202 equations, 3 figures, 2 tables, 4 algorithms)

Introduction
Related Work
Background and Notations
Problem Setup
Reproducing Kernel Hilbert Spaces
The Target Estimand
Doubly Robust Identification of CCME
The Meta-Estimator
Three Practical Estimators
Ridge Regression Estimator
Deep Feature Estimator
Neural-Kernel Estimator
Theoretical Analysis
Assumptions
Convergence Rates
...and 55 more sections

Key Result

Proposition 1

Under the unconfoundedness assumption (ass:unconfound) and the bounded-kernel assumption (ass:boundk), the CCME satisfies the following identification result:

Figures (3)

Figure 1: MSE (log scale) vs. sample size for three methods (columns) under three misspecification scenarios (rows). Shaded regions indicate standard errors over 10 seeds.
Figure 2: Estimated counterfactual densities for profiles $v_1$ (top) and $v_2$ (bottom) over 30 runs at $n=20000$. Blue: DR; Red: One-Step. Solid lines are medians; dark and light regions show pointwise 50th and 90th percentile intervals.
Figure 3: Empirical mode for each class under three density estimates. Row 1: Oracle (clean data). Row 2: DR (proposed). Row 3: One-Step (prior work). DR recovers canonical digits matching the Oracle, while One-Step shows a subtle bias toward higher-intensity variants.

Theorems & Definitions (46)

Proposition 1
Remark 1
Theorem 1: Meta-Estimator Rate
Remark 2
Theorem 2: Ridge Regression Rate
Theorem 3: Deep Feature Estimator Rate
Theorem 4: Neural-Kernel Estimator Rate
Remark 3
Definition 1: Lebesgue Space $L^{p}(\Omega)$
Definition 2: Weak Derivatives
...and 36 more
