Statistical inference of convex order by Wasserstein projection

Jakwang Kim, Young-Heon Kim, Yuanlong Ruan, Andrew Warren

TL;DR

The paper develops a statistically principled approach to testing convex order between high-dimensional distributions by leveraging Wasserstein projections onto convex-order cones. It establishes Lipschitz stability for backward and forward projections, derives convergence rates for empirical projection distances under log-Sobolev and bounded-support conditions, and provides finite-sample p-value bounds and consistency results for the resulting test. A practical computation scheme based on an entropic Frank-Wolfe algorithm is proposed to evaluate the projection distance, and experiments on synthetic data illustrate its effectiveness. The framework yields not only a decision rule but also a measure of deviation from convex order, which supports decision-making in areas such as arbitrage detection and risk assessment.

Abstract

Ranking distributions according to a stochastic order has wide applications in diverse areas. Although stochastic dominance has received much attention, convex order, particularly in general dimensions, has yet to be investigated from a statistical point of view. This article addresses this gap by introducing a simple statistical test for convex order based on the Wasserstein projection distance. This projection distance not only encodes whether two distributions are indeed in convex order, but also quantifies the deviation from the desired convex order and produces an optimal convex order approximation. Lipschitz stability of the backward and forward Wasserstein projection distance is proved, which leads to elegant consistency and concentration results for the estimator we employ as our test statistic. Combining these with state-of-the-art results regarding the convergence rate of empirical distributions, we also derive upper bounds for the $p$-value and type I error of our test statistic, as well as upper bounds on the type II error for an appropriate class of strict alternatives. With proper choices of families of distributions, we further show that the power of the proposed test increases to one as the number of samples grows to infinity. Lastly, we provide an efficient numerical scheme for our test statistic, by way of an entropic Frank-Wolfe algorithm. Experiments based on synthetic data sets illustrate the success of our approach.
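
To make the notion of convex order concrete, here is a minimal one-dimensional sketch. It is not the paper's test, which works in general dimensions through the Wasserstein projection distance. It relies on the classical one-dimensional characterization: $\mu$ is below $\nu$ in convex order exactly when the means agree and $E[(X-k)^+] \le E[(Y-k)^+]$ for every threshold $k$. For empirical measures it is enough to check $k$ at the sample points. The function names `call_function` and `in_convex_order_1d` and the tolerance `tol` are hypothetical choices made for this illustration.

```python
import numpy as np


def call_function(samples, strikes):
    """Empirical E[(X - k)^+] evaluated at each strike k."""
    samples = np.asarray(samples, dtype=float)
    strikes = np.asarray(strikes, dtype=float)
    # Rows index strikes, columns index samples.
    return np.maximum(samples[None, :] - strikes[:, None], 0.0).mean(axis=1)


def in_convex_order_1d(x, y, tol=1e-9):
    """Exact check that the empirical law of x is below that of y in convex order (1D only)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Convex order forces equal means.
    if abs(x.mean() - y.mean()) > tol:
        return False
    # Both call functions are piecewise linear with kinks at the atoms,
    # so comparing them at the pooled sample points is sufficient.
    strikes = np.union1d(x, y)
    return bool(np.all(call_function(x, strikes) <= call_function(y, strikes) + tol))


# A point mass at 0 is below the two-point law on {-1, 1}, but not the other way round.
print(in_convex_order_1d([0.0, 0.0], [-1.0, 1.0]))
print(in_convex_order_1d([-1.0, 1.0], [0.0, 0.0]))
```

With independently sampled data the two empirical means almost never coincide exactly, so a yes/no check of this kind is brittle. That is one way to see why a quantitative distance to the convex-order cone, as described in the abstract, is a more natural test statistic.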

Paper Structure (22 sections, 15 theorems, 72 equations, 3 figures)

Key Result

Theorem 3.1

Let $\mu, \nu \in \mathcal{P}_2 \left( \mathbb{R}^d \right)$.

Figures (3)

  • Figure 1: The plot of $\mathcal{W}_2\left( \mu_n, \mathscr{P}_{\preceq \nu_n}^{cx} \right)$.
  • Figure 2: The geometry of the Wasserstein projection.
  • Figure 3: Left-most: the initial distribution concentrated at the barycenter of $\nu_m$. Right-most: the empirical distribution $\nu_m$. Middle: slices of the gradient flow along the convex order cone $\mathscr{P}_{\preceq\nu_m}^{\text{cx}}$ (from left to right) generated by the entropic Frank-Wolfe algorithm, which is expected to converge to $\overline{\mu}_n$, the projection onto $\mathscr{P}_{\preceq\nu_m}^{\text{cx}}$. Each square is $\left[ -9,9\right] \times \left[-9,9\right]$. The shape of the distribution stabilizes within a few iterations.
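
The captions above use the cone notation without defining it. The block below spells it out under the assumption that the symbols follow the standard convention for convex order and backward Wasserstein projection; the subscript $\mathrm{cx}$ on the order symbol is added here for readability and does not appear in the captions.

```latex
% Convex order on \mathcal{P}_2(\mathbb{R}^d)
\mu \preceq_{\mathrm{cx}} \nu
  \iff
  \int \varphi \, d\mu \le \int \varphi \, d\nu
  \quad \text{for every convex } \varphi : \mathbb{R}^d \to \mathbb{R}.

% Backward convex-order cone of \nu and the projection distance plotted in Figure 1
\mathscr{P}_{\preceq \nu}^{\mathrm{cx}}
  = \bigl\{ \eta \in \mathcal{P}_2(\mathbb{R}^d) : \eta \preceq_{\mathrm{cx}} \nu \bigr\},
\qquad
\mathcal{W}_2\bigl( \mu, \mathscr{P}_{\preceq \nu}^{\mathrm{cx}} \bigr)
  = \inf_{\eta \preceq_{\mathrm{cx}} \nu} \mathcal{W}_2(\mu, \eta).
```

Read this way, the distance is zero when $\mu$ already lies in the cone, which matches the abstract's statement that the projection distance encodes whether the two distributions are in convex order.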

Theorems & Definitions (38)

  • Example
  • Theorem 3.1
  • Remark 3.2
  • Remark 3.3
  • Lemma 3.4
  • Remark 3.5
  • Theorem 3.6
  • Remark 3.7
  • Definition 3.8
  • Definition 3.9
  • ...and 28 more