Automated Efficient Estimation using Monte Carlo Efficient Influence Functions

Raj Agrawal; Sam Witty; Andy Zane; Eli Bingham

TL;DR

The paper proves that MC-EIF is consistent and that estimators using MC-EIF achieve optimal $\sqrt{N}$ convergence rates, and shows empirically that these estimators are at parity with estimators using analytic EIFs.

Abstract

Many practical problems involve estimating low-dimensional statistical quantities with high-dimensional models and datasets. Several approaches address these estimation tasks based on the theory of influence functions, such as debiased/double ML or targeted minimum loss estimation. This paper introduces \textit{Monte Carlo Efficient Influence Functions} (MC-EIF), a fully automated technique for approximating efficient influence functions that integrates seamlessly with existing differentiable probabilistic programming systems. MC-EIF automates efficient statistical estimation for a broad class of models and target functionals that would previously require rigorous custom analysis. We prove that MC-EIF is consistent, and that estimators using MC-EIF achieve optimal $\sqrt{N}$ convergence rates. We show empirically that estimators using MC-EIF are at parity with estimators using analytic EIFs. Finally, we demonstrate a novel capstone example using MC-EIF for optimal portfolio selection.

Paper Structure (33 sections, 5 theorems, 32 equations, 8 figures, 1 table, 3 algorithms)

Introduction
Related Work
Problem Statement
Overview of Semiparametric Statistics
Influence Functions
The Problem: Solving Integral Equations is Hard
Monte Carlo Efficient Influence Function
The EIF in Parametric Models
Numerically Approximating the EIF
Theoretical Guarantees for MC-EIF
An EIF Cookbook
MC-EIF for Automated Efficient Inference
Von Mises One Step Estimator
Debiased/Double ML
Targeted Minimum Loss Estimation
...and 18 more sections

Key Result

Theorem 3.4

(Theorem 3.5 in \cite{semi-theory-book}) Suppose Assumptions \ref{assum:cont_diff_prob}, \ref{assum:cont_diff_func}, and \ref{assum:fisher_inv} hold. Then the efficient influence function $\varphi_{\phi}(\tilde{x})$ at $\phi$ evaluated at the point $\tilde{x} \in \mathbb{R}^D$ equals
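
For reference, in a smooth parametric model the efficient influence function is commonly written as $[\nabla_{\phi} \psi(\phi)]^{\top} I(\phi)^{-1} \nabla_{\phi} \log p_{\phi}(\tilde{x})$, where $\psi$ denotes the target functional and $I(\phi)$ the Fisher information. The symbols $\psi$ and $I$ are assumed here for illustration and are not taken from the text above. The following is a minimal sketch of a Monte Carlo approximation in that spirit, not the authors' implementation: it uses a hypothetical Gaussian toy model with hand-written scores and an explicit $2 \times 2$ solve, whereas an automated system would rely on automatic differentiation. All function names are hypothetical.

```python
import random


def score(x, mu, sigma):
    """Gradient of log N(x; mu, sigma^2) with respect to (mu, sigma)."""
    d = x - mu
    return (d / sigma**2, -1.0 / sigma + d**2 / sigma**3)


def grad_functional(mu, sigma):
    """Gradient of the toy functional psi(mu, sigma) = E[X^2] = mu^2 + sigma^2."""
    return (2.0 * mu, 2.0 * sigma)


def mc_fisher(mu, sigma, num_samples, rng):
    """Monte Carlo Fisher information: average outer product of scores
    evaluated at samples drawn from the model itself."""
    a = b = c = 0.0
    for _ in range(num_samples):
        s1, s2 = score(rng.gauss(mu, sigma), mu, sigma)
        a += s1 * s1
        b += s1 * s2
        c += s2 * s2
    n = float(num_samples)
    return a / n, b / n, c / n  # entries (1,1), (1,2) = (2,1), (2,2)


def mc_eif(x, mu, sigma, num_samples=100_000, seed=0):
    """Approximate grad(psi)^T I^{-1} score(x) with a Monte Carlo Fisher estimate."""
    rng = random.Random(seed)
    a, b, c = mc_fisher(mu, sigma, num_samples, rng)
    s1, s2 = score(x, mu, sigma)
    det = a * c - b * b
    # Solve the symmetric 2x2 system I v = score(x) in closed form.
    v1 = (c * s1 - b * s2) / det
    v2 = (a * s2 - b * s1) / det
    g1, g2 = grad_functional(mu, sigma)
    return g1 * v1 + g2 * v2


if __name__ == "__main__":
    x, mu, sigma = 4.0, 1.0, 2.0
    approx = mc_eif(x, mu, sigma)
    # For this toy functional the exact EIF is x^2 - E[X^2].
    exact = x**2 - (mu**2 + sigma**2)
    print("Monte Carlo EIF:", approx)
    print("Exact EIF:      ", exact)
```

In this toy case the exact value is available in closed form, so the Monte Carlo error from the finite-sample Fisher estimate can be inspected directly by varying `num_samples`.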

Figures (8)

Figure 1: Comparison between MC-EIF and empirical Gateaux approximation. MC-EIF (a and b) is less sensitive to hyperparameters ($\epsilon$ and $\lambda$) than the empirical Gateaux baseline (c).
Figure 2: Empirical evidence for convergence theory. Increasing $p$ for the average treatment effect experiments produces MC-EIF approximation errors that closely match Theorem \ref{thm:monte_eif_convg}.
Figure 3: Comparison between ATE estimators using MC-EIF and analytic EIF. MC-EIF produces ATE estimates that are very close to the diagonal, representing an oracle estimator of the EIF.
Figure 4: We taxonomize the workflow of robust estimation into three stages: the derivation of an (approximate and/or efficient) influence function, the numerical derivation and analysis required for its computation, and the code required to compute it. For the analytic workflow, the derivation of the IF results in Equation \ref{eq:analytic_if_mpe}. This largely involves terms already required by the original plug-in (Equation \ref{eq:mpe}), but still must be implemented on a case-by-case basis in code. For the "Empirical Gateaux" workflow, the first stage requires only the general purpose Equation \ref{eq:emp_gat_if_approx}, but demands case-specific numerical considerations and derivations like the one shown in Equation \ref{eq:emp_gat_mpe_numeric}. In stark contrast, given a differentiable approximation to the functional of interest, MC-EIF "automates" each stage through use of an end-to-end, general purpose solution.
Figure 5: Comparison of the plug-in estimator and efficient estimators using MC-EIF and analytic EIF for estimating the ATE. The true ATE is 0, so estimates closer to zero are better. The distribution is over 100 simulated datasets. Dashed lines represent estimates using the analytic EIF, and solid lines represent estimates using MC-EIF (when applicable).
...and 3 more figures

Theorems & Definitions (12)

Definition 2.1
Definition 2.2
Theorem 3.4
Theorem 3.8
Proposition 4.1
Proposition 4.3
proof
proof
Lemma 1.1
proof
...and 2 more