On Heterogeneity in Wasserstein Space

Kisung You

Abstract

Data represented by probability measures arise as empirical distributions, posterior distributions, and feature-based representations of complex objects. We study heterogeneity in a population of probability measures through the expected value of a chosen transform of the pairwise Wasserstein distance. The resulting estimator is unbiased and, under simple moment conditions on the population law, is strongly consistent, asymptotically normal, and equipped with a consistent standard error. This also yields a simple comparison of two populations and remains stable under plug-in approximation when the measures are estimated. The associated empirical eccentricities identify the observations that contribute most strongly to heterogeneity within a sample.
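The quantities named in the abstract can be sketched numerically. The code below is a minimal illustration, not the paper's implementation. It assumes univariate Gaussian measures, for which $W_2$ has the closed form $\{(m_1-m_2)^2+(s_1-s_2)^2\}^{1/2}$. It assumes the estimator is the average of the transformed distance over all distinct pairs, and that the eccentricity of an observation is its average transformed distance to the others. The standard error uses the usual first-order plug-in for a degree-two U-statistic, which may differ from the form in Proposition 2.3. The function name `heterogeneity` and its arguments are hypothetical.

```python
import numpy as np


def heterogeneity(means, sds, transform=np.square):
    """Pairwise-distance heterogeneity for univariate Gaussians N(m_i, s_i^2).

    Returns (estimate, standard_error, eccentricities). The estimator form and
    the standard-error formula are assumptions made for illustration.
    """
    m = np.asarray(means, dtype=float)
    s = np.asarray(sds, dtype=float)
    n = m.size
    if n < 2:
        raise ValueError("need at least two measures")

    # Closed-form W_2 between univariate Gaussians, computed for all pairs.
    dist = np.sqrt((m[:, None] - m[None, :]) ** 2 + (s[:, None] - s[None, :]) ** 2)

    # Apply the chosen transform, then exclude self-pairs from every average.
    h = transform(dist)
    np.fill_diagonal(h, 0.0)

    # Eccentricity of observation i: mean transformed distance to the others.
    ecc = h.sum(axis=1) / (n - 1)

    # Average over ordered pairs i != j, equal to the average over pairs i < j.
    estimate = h.sum() / (n * (n - 1))

    # First-order plug-in standard error (assumed form): sqrt(4 * var(ecc) / n).
    se = np.sqrt(4.0 * ecc.var(ddof=1) / n)
    return estimate, se, ecc


# Three measures whose pairwise W_2 distances are 3, 4, and 5.
est, se, ecc = heterogeneity([0.0, 3.0, 0.0], [1.0, 1.0, 5.0])
print("estimate:", est)
print("standard error:", se)
print("most eccentric observation:", int(np.argmax(ecc)))
```

Passing a different callable as `transform` (for example `lambda d: d` for the untransformed distance) changes the notion of heterogeneity without changing the rest of the computation.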

Paper Structure (7 sections, 5 theorems, 105 equations, 3 figures, 1 table)

Key Result

Lemma 2.1

Suppose that, for some $p\geq 1$ and constants $a,b\geq 0$, […], and fix $\mu_0\in\mathcal{P}_2(\mathbb{R}^d)$. Then there exists $C\geq 0$ such that, for all $\mu,\nu\in\mathcal{P}_2(\mathbb{R}^d)$, […]. Consequently, if $E\{W_2(\mu,\mu_0)^p\}<\infty$, then $E|h(\mu,\nu)|<\infty$; if $E\{W_2(\mu,\mu_0)^{2p}\}<\infty$, then $E\{h(\mu,\nu)^2\}<\infty$. These moment conditions do not depend on the particular …
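The two displayed formulas of the lemma are not shown here. One reading consistent with its stated consequences is sketched below as an assumption, not as the paper's statement. It supposes that $h(\mu,\nu)=\psi\{W_2(\mu,\nu)\}$ for a transform $\psi$ with polynomial growth of order $p$; the symbol $\psi$ and the explicit constant are hypothetical.

```latex
% Assumed growth condition on the transform (the symbol \psi is hypothetical):
\[
  |\psi(t)| \le a + b\,t^{p}, \qquad t \ge 0 .
\]
% Triangle inequality for W_2, followed by convexity of t \mapsto t^{p} for p \ge 1:
\[
  W_2(\mu,\nu)^{p}
  \le \{W_2(\mu,\mu_0) + W_2(\nu,\mu_0)\}^{p}
  \le 2^{p-1}\{W_2(\mu,\mu_0)^{p} + W_2(\nu,\mu_0)^{p}\} .
\]
% Combining the two displays gives a domination bound of the kind the lemma asserts:
\[
  |h(\mu,\nu)| \le C\{1 + W_2(\mu,\mu_0)^{p} + W_2(\nu,\mu_0)^{p}\},
  \qquad C = \max(a,\, 2^{p-1} b) .
\]
% Squaring and using (x+y+z)^2 \le 3(x^2+y^2+z^2) handles the second moment:
\[
  h(\mu,\nu)^{2} \le 3C^{2}\{1 + W_2(\mu,\mu_0)^{2p} + W_2(\nu,\mu_0)^{2p}\} .
\]
```

Under this reading, taking expectations of the last two displays gives the two integrability statements of the lemma.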

Figures (3)

  • Figure 1: Synthetic Gaussian experiment. Panel (a) shows the locations of the generated measures; colours indicate the three groups A, B, and C. Panel (b) shows selected Gaussian measures from group C; grey ellipses denote typical observations and red ellipses denote observations with large empirical eccentricity. Panel (c) shows within-group heterogeneity estimates (points) with 95% Wald intervals (vertical bars); crosses indicate analytic reference values. Panel (d) shows empirical eccentricities $\widehat{g}_i$ for observations in group C; red points correspond to the atypical component and dark grey points to the main component.
  • Figure 2: Transform choice and plug-in stability in the synthetic experiment.
  • Figure 3: Most eccentric images for digits 5 (top row), 2 (middle row), and 7 (bottom row).

Theorems & Definitions (5)

  • Lemma 2.1: Moment domination
  • Theorem 2.2
  • Proposition 2.3: Variance estimation
  • Proposition 2.4: Two-sample comparison
  • Proposition 2.5: Plug-in stability