Wasserstein-Cramér-Rao Theory of Unbiased Estimation

Nicolás García Trillos, Adam Quinn Jaffe, Bodhisattva Sen

TL;DR

This work introduces a Wasserstein-based theory of unbiased estimation, replacing variance-based Cramér-Rao bounds with a Wasserstein-Cramér-Rao bound that governs estimator sensitivity under infinitesimal transport perturbations. By defining differentiability in the Wasserstein sense and a corresponding Wasserstein information matrix, the authors establish fundamental lower bounds and identify models (transport families) in which exact sensitivity-efficiency is possible, analogous to exponential families in classical theory. They further develop the Wasserstein projection estimator (WPE) and prove it is asymptotically sensitivity-efficient, providing a practical, general-purpose method to approach the Wasserstein-CRB in high dimensions, with detailed analyses in location, scale, Pareto, and Gaussian settings. Together, these results offer a geometric, OT-based framework for stability in estimation, with potential implications for robust statistics, privacy, and distributionally robust optimization.

Abstract

The quantity of interest in the classical Cramér-Rao theory of unbiased estimation (e.g., the Cramér-Rao lower bound, its exact attainment for exponential families, and asymptotic efficiency of maximum likelihood estimation) is the variance, which represents the instability of an estimator when its value is compared to the value for an independently-sampled data set from the same distribution. In this paper we are interested in a quantity which represents the instability of an estimator when its value is compared to the value for an infinitesimal additive perturbation of the original data set; we refer to this as the "sensitivity" of an estimator. The resulting theory of sensitivity is based on the Wasserstein geometry in the same way that the classical theory of variance is based on the Fisher-Rao (equivalently, Hellinger) geometry, and this insight allows us to determine a collection of results which are analogous to the classical case: a Wasserstein-Cramér-Rao lower bound for the sensitivity of any unbiased estimator, a characterization of models in which there exist unbiased estimators achieving the lower bound exactly, and some concrete results that show that the Wasserstein projection estimator achieves the lower bound asymptotically. We use these results to treat many statistical examples, sometimes revealing new optimality properties for existing estimators and other times revealing entirely new estimators.
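As a worked illustration of the contrast between variance and sensitivity, consider the sample mean in a Gaussian location family. The formula for sensitivity used below is an assumption made for this sketch: the expected squared norm of the estimator's gradient, which is one natural way to formalize the response to an infinitesimal additive perturbation of the data. The symbols $T_n$, $\sigma^2$, and $\mathrm{Sen}_\theta$ are notation introduced here and may differ from the paper's.

$$
\begin{aligned}
T_n(x_1,\dots,x_n) &= \frac{1}{n}\sum_{i=1}^n x_i, \qquad X_1,\dots,X_n \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\theta,\sigma^2),\\
\mathrm{Var}_\theta(T_n) &= \frac{\sigma^2}{n},\\
\nabla T_n(x_1,\dots,x_n) &= \Big(\tfrac{1}{n},\dots,\tfrac{1}{n}\Big), \qquad
\mathrm{Sen}_\theta(T_n) := \mathbb{E}_\theta\big[\|\nabla T_n(X_1,\dots,X_n)\|^2\big] = n\cdot\frac{1}{n^2} = \frac{1}{n}.
\end{aligned}
$$

Under this reading, the variance of the sample mean depends on the noise level $\sigma^2$, whereas its sensitivity depends only on how the estimator responds to small shifts of the data points.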

Paper Structure

This paper contains 25 sections, 18 theorems, 198 equations, 4 figures, 2 tables.

Key Result

Proposition 2.2

Suppose that $(\{\mu_t\}_{0\le t\le 1}, \{v_t\}_{0\le t\le 1})$ is a solution to the continuity equation such that we have $\mu_t\in\mathcal{P}_{2,\mathrm{ac}}(\mathbb{R}^m)$ for all $0\le t\le 1$ as well as […]. Additionally, suppose that $f:\mathbb{R}^m\to\mathbb{R}$ is a locally Lipschitz function satisfying […]. Then, the function $t \mapsto \int_{\mathbb{R}^m} f(x) \,\mathrm{d} \mu_t(x)$ is absol…

Figures (4)

  • Figure 1: Computing the variance and sensitivity for three estimators in the uniform scale family of Example \ref{ex:Unif-scale}: the best linear estimator (BLE), the maximum likelihood estimator (MLE), and the Wasserstein projection estimator (WPE). The MLE has variance of order $n^{-2}$ and sensitivity of constant order. The BLE has both variance and sensitivity of order $n^{-1}$. The WPE has both variance and sensitivity of order $n^{-1}$, and it achieves a smaller constant prefactor than the BLE for both quantities. See Example \ref{ex:unif-scale-fam} for further details.
  • Figure 2: A probability measure and its infinitesimal flow in the Fisher-Rao and Wasserstein geometries. For a probability measure $\mu_t$, the Fisher-Rao flow is determined by a mean-zero scalar field $\xi_t:\mathbb{R}^m\to\mathbb{R}$ (top left) and the Wasserstein flow is determined by a suitable vector field $v_t:\mathbb{R}^m\to\mathbb{R}^m$ (bottom left). In either case, the scalar and vector fields are the tangent vectors for the path $\{\mu_t\}_{t}$ in the space of probability measures (right).
  • Figure 3: A statistical model that is differentiable in the Wasserstein sense (DWS). The transport linearization $\Phi_{\theta}$ at $\theta\in\Theta$ is a linear operator that transforms increments $h$ at $\theta \in \Theta$ (left) into tangent vectors $\Phi_{\theta}h$ at $P_\theta$ for the corresponding path in the model $\mathcal{P}=\{P_{\theta}:\theta\in\Theta\}$ (right). The Wasserstein information matrix $J(\theta)$ gives rise to an inner product on the space of increments at $\theta$ (which is isomorphic to $\mathbb{R}^p$), where the inner product between $h_1,h_2$ at $\theta\in\Theta$ (left) is set to be the $L^2_{P_\theta}(\mathbb{R}^d;\mathbb{R}^d)$ inner product between $\Phi_{\theta}h_1$ and $\Phi_{\theta}h_2$ (right).
  • Figure 4: The Wasserstein projection estimator (WPE) in a statistical model. For the sake of simplicity, we write $\hat{\theta}_n$ in place of $T_n^{\mathrm{WPE}}$. The first-order optimality conditions for the WPE state that the "residual" from $P_{\hat{\theta}_n}$ to $\bar{P}_n$ is orthogonal to $\Phi_{\hat{\theta}_n}h$ for all $h\in \mathbb{R}^p$.
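
The orders of sensitivity reported in the Figure 1 caption can be explored numerically. The following is a minimal sketch, not the authors' code, and it rests on several assumptions: sensitivity is taken to be the expected squared norm of the estimator's gradient, the BLE is taken to be twice the sample mean, and the MLE is taken in its bias-corrected form $\frac{n+1}{n}\max_i X_i$. All function names are hypothetical, and the WPE is not implemented here.

```python
import numpy as np

def grad_sq_norm(estimator, x, eps=1e-7):
    """Central finite-difference estimate of ||grad T(x)||^2.

    Each coordinate of the data is shifted by a small additive
    perturbation, mimicking an infinitesimal perturbation of the sample.
    """
    g = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (estimator(x + e) - estimator(x - e)) / (2 * eps)
    return float(g @ g)

def ble(x):
    # Assumed form of the best linear estimator for Uniform(0, theta).
    return 2.0 * x.mean()

def mle_unbiased(x):
    # Assumed bias-corrected maximum likelihood estimator.
    n = x.size
    return (n + 1) / n * x.max()

def mc_sensitivity(estimator, n, theta=1.0, reps=200, seed=0):
    """Monte Carlo average of ||grad T||^2 over Uniform(0, theta) samples."""
    rng = np.random.default_rng(seed)
    vals = [
        grad_sq_norm(estimator, rng.uniform(0.0, theta, size=n))
        for _ in range(reps)
    ]
    return float(np.mean(vals))

if __name__ == "__main__":
    for n in (10, 50, 250):
        print(n, mc_sensitivity(ble, n), mc_sensitivity(mle_unbiased, n))
```

Under these assumptions the gradient of `ble` has every entry equal to $2/n$, so its sensitivity is $4/n$, while the gradient of `mle_unbiased` is concentrated on the single largest observation, so its sensitivity stays near $((n+1)/n)^2$ and does not decay with $n$. This matches the qualitative behavior described in the caption.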

Theorems & Definitions (85)

  • Example 1.1: Gaussian location family
  • Example 1.2: uniform scale family
  • Example 1.3: Laplace location family
  • Remark 2.1: Hellinger vs. Fisher-Rao geometries
  • Proposition 2.2
  • Remark 2.3: differentiability Lebesgue almost everywhere
  • Remark 2.4: absolute continuity assumption
  • Definition 3.1
  • Remark 3.2: absolute continuity of $P$, and definition of sensitivity
  • ...and 75 more