Lower Complexity Adaptation for Empirical Entropic Optimal Transport

Michel Groppe, Shayan Hundrieser

TL;DR

This work analyzes empirical plug-in estimation of Entropic OT costs and proves a lower-complexity adaptation principle: the estimation error depends only on the lower intrinsic dimension among the two population measures and achieves parametric rates in sample size $n$ with constants that scale with the penalty parameter $\varepsilon$ and the dimension. The authors develop a dual formulation over a single function class $\mathcal{F}_{c,\varepsilon}$ and prove a general bound linking covering numbers to the mean absolute deviation, enabling sharp rates across multiple cost structures (semi-discrete, Lipschitz, semi-concave, Hölder, and squared Euclidean) and even extending to sub-Gaussian measures. They also provide a primal perspective showing a projective decomposition when costs split, and demonstrate the practical relevance by applying the LCA bounds to entropic Gromov-Wasserstein, along with comprehensive simulations validating the theory and illustrating computational aspects via the Sinkhorn algorithm. The results collectively indicate that empirical EOT enjoys similar dimension-adaptive properties to unregularized OT, with concrete implications for large-scale statistical inference and downstream tasks relying on entropic OT quantities.

Abstract

Entropic optimal transport (EOT) presents an effective and computationally viable alternative to unregularized optimal transport (OT), offering diverse applications for large-scale data analysis. In this work, we derive novel statistical bounds for empirical plug-in estimators of the EOT cost and show that their statistical performance in the entropy regularization parameter $\varepsilon$ and the sample size $n$ only depends on the simpler of the two probability measures. For instance, under sufficiently smooth costs this yields the parametric rate $n^{-1/2}$ with factor $\varepsilon^{-d/2}$, where $d$ is the minimum dimension of the two population measures. This confirms that empirical EOT also adheres to the lower complexity adaptation principle, a hallmark feature only recently identified for unregularized OT. As a consequence of our theory, we show that the empirical entropic Gromov-Wasserstein distance and its unregularized version for measures on Euclidean spaces also obey this principle. Additionally, we comment on computational aspects and complement our findings with Monte Carlo simulations. Our techniques employ empirical process theory and rely on a dual formulation of EOT over a single function class. Crucial to our analysis is the observation that the entropic cost-transformation of a function class does not increase its uniform metric entropy by much.
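To make the empirical plug-in estimator concrete, here is a minimal sketch that evaluates the EOT cost between two empirical measures with a log-domain Sinkhorn iteration. It assumes the regularizer is $\varepsilon$ times the relative entropy with respect to the product of the marginals and uses the squared Euclidean cost; the function names (`entropic_ot_cost`, `plug_in_estimate`) and the toy data are illustrative assumptions, not the authors' code.

```python
import numpy as np


def _logsumexp(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def entropic_ot_cost(a, b, C, eps, n_iter=1000, tol=1e-10):
    """EOT cost min_pi <C, pi> + eps * KL(pi | a x b) via log-domain Sinkhorn.

    a, b : strictly positive weight vectors that each sum to one.
    C    : cost matrix of shape (len(a), len(b)).
    Returns the dual objective at the last iterate, which lower-bounds
    the EOT cost and approaches it as the iteration converges.
    """
    a, b, C = (np.asarray(v, dtype=float) for v in (a, b, C))
    log_a, log_b = np.log(a), np.log(b)
    phi, psi = np.zeros(a.shape[0]), np.zeros(b.shape[0])
    for _ in range(n_iter):
        # Alternate the two optimality conditions of the dual problem.
        phi_new = -eps * _logsumexp((psi[None, :] - C) / eps + log_b[None, :], axis=1)
        psi = -eps * _logsumexp((phi_new[:, None] - C) / eps + log_a[:, None], axis=0)
        converged = np.max(np.abs(phi_new - phi)) < tol
        phi = phi_new
        if converged:
            break
    # After the psi-update the exponential term integrates to one,
    # so the dual objective reduces to the sum of the two integrals.
    return float(a @ phi + b @ psi)


def plug_in_estimate(x, y, eps):
    """Plug-in estimate: EOT cost between the empirical measures of x and y."""
    n, m = len(x), len(y)
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared Euclidean cost
    return entropic_ot_cost(np.full(n, 1.0 / n), np.full(m, 1.0 / m), C, eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy setting: one sample lies on a line segment in R^3,
    # the other fills the unit cube, so the intrinsic dimensions differ.
    x = np.zeros((200, 3))
    x[:, 0] = rng.random(200)
    y = rng.random((200, 3))
    print(plug_in_estimate(x, y, eps=0.5))
```

Working in the log domain keeps the iteration stable for small $\varepsilon$, where the kernel $\exp(-c/\varepsilon)$ would otherwise underflow.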

Paper Structure (27 sections, 32 theorems, 198 equations, 4 figures)

Key Result

Theorem 2.1

Let Assumption ass:eot_cost hold. Then, for all probability measures $\mu \in \mathcal{P}(\mathcal{X})$ and $\nu \in \mathcal{P}(\mathcal{Y})$, it holds that [...]. In particular, optimizers exist, and a pair $(\phi, \psi) \in L^{\exp}_{\varepsilon}(\mu) \times L^{\exp}_{\varepsilon}(\nu)$ is a maximizer of the above if and only if [...], and they can be chosen such that $\|\phi\|_\infty, \|\psi\|_\infty \leq 3/2$. Furthe
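For orientation, the entropic dual problem and its optimality conditions are commonly written in the EOT literature as follows. This is a sketch of the standard form, assuming the regularizer is $\varepsilon$ times the relative entropy with respect to $\mu \otimes \nu$ and writing $\mathrm{T}_{c,\varepsilon}$ for the EOT cost; the paper's exact statement and normalization may differ.

```latex
\begin{align*}
\mathrm{T}_{c,\varepsilon}(\mu,\nu)
  &= \sup_{\phi,\,\psi}\;
     \int \phi \, d\mu + \int \psi \, d\nu
     - \varepsilon \int \exp\!\Big(\frac{\phi(x) + \psi(y) - c(x,y)}{\varepsilon}\Big)
       \, d(\mu \otimes \nu)(x,y) + \varepsilon, \\
\phi(x)
  &= -\varepsilon \log \int \exp\!\Big(\frac{\psi(y) - c(x,y)}{\varepsilon}\Big) \, d\nu(y)
     \quad \text{for } \mu\text{-a.e. } x, \\
\psi(y)
  &= -\varepsilon \log \int \exp\!\Big(\frac{\phi(x) - c(x,y)}{\varepsilon}\Big) \, d\mu(x)
     \quad \text{for } \nu\text{-a.e. } y.
\end{align*}
```

When both conditions hold, the exponential term integrates to one, so the dual value reduces to $\int \phi \, d\mu + \int \psi \, d\nu$.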

Figures (4)

  • Figure 1: Simulations of the mean absolute deviation $\Delta_n$ (solid) and the asymptotic standard deviation of the fluctuations $\sqrt{n}[\mathrm{T}_{c,\varepsilon}(\hat{\mu}_n, \hat{\nu}_n) - \mathrm{T}_{c,\varepsilon}(\mu, \nu)]$, scaled by $\sqrt{2/\pi}$ (dashed), in the cube setting with cost $\|\cdot\|_2^2$.
  • Figure 2: Simulations of the mean absolute deviation $\Delta_n$ in the cube setting with cost $\|\cdot\|_1$.
  • Figure 3: Simulations of the mean absolute deviation $\Delta_n$ in the semi-discrete setting.
  • Figure 4: Simulations of the mean absolute deviation $\Delta_n$ for the Sinkhorn divergence.
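A Monte Carlo estimate of the mean absolute deviation $\Delta_n$ can be sketched as below. To keep the population value computable, this toy version assumes both population measures are discrete with known weights; it is not the cube or semi-discrete setting of the figures, and the names `eot_cost` and `mean_abs_deviation` are hypothetical.

```python
import numpy as np


def _lse(z, axis):
    """Numerically stable log(sum(exp(z))) along one axis."""
    m = np.max(z, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(z - m), axis=axis))


def eot_cost(a, b, C, eps, n_iter=500):
    """EOT cost with penalty eps * KL(pi | a x b), fixed number of Sinkhorn sweeps."""
    log_a, log_b = np.log(a), np.log(b)
    psi = np.zeros(len(b))
    for _ in range(n_iter):
        phi = -eps * _lse((psi[None, :] - C) / eps + log_b[None, :], axis=1)
        psi = -eps * _lse((phi[:, None] - C) / eps + log_a[:, None], axis=0)
    return float(a @ phi + b @ psi)


def mean_abs_deviation(p, q, C, eps, n, reps, rng):
    """Monte Carlo estimate of E|T(mu_n, nu_n) - T(mu, nu)| for discrete mu, nu."""
    population = eot_cost(p, q, C, eps)
    deviations = []
    for _ in range(reps):
        # Empirical weights of n i.i.d. draws from each population measure.
        a = rng.multinomial(n, p) / n
        b = rng.multinomial(n, q) / n
        # Drop atoms that were never sampled so all weights stay positive.
        ka, kb = a > 0, b > 0
        estimate = eot_cost(a[ka], b[kb], C[np.ix_(ka, kb)], eps)
        deviations.append(abs(estimate - population))
    return float(np.mean(deviations))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = np.array([0.1, 0.2, 0.3, 0.4])   # hypothetical population weights
    q = np.array([0.25, 0.25, 0.5])
    C = rng.random((4, 3))               # hypothetical bounded cost matrix
    for n in (25, 100, 400, 1600):
        print(n, mean_abs_deviation(p, q, C, eps=0.5, n=n, reps=40, rng=rng))
```

Plotting the returned values against $n$ on a log-log scale gives a rough check of the decay rate in this toy setting.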

Theorems & Definitions (70)

  • Theorem 2.1: Marino2020
  • Remark 2.2: Canonical extension
  • Proposition 2.3: Duality
  • Lemma 2.4: Stability bound
  • Lemma 2.5
  • Theorem 2.6: General LCA
  • proof : Proof of Theorem 2.6
  • Corollary 2.7: Comparison of rates
  • Remark 2.8: Unbounded costs
  • Remark 2.9: Comparison with complexity scales of Stromme2023
  • ...and 60 more