Statistical inference of convex order by Wasserstein projection
Jakwang Kim, Young-Heon Kim, Yuanlong Ruan, Andrew Warren
TL;DR
The paper develops a statistical test for convex order between distributions in general dimensions, based on Wasserstein projections onto convex-order cones. It establishes Lipschitz stability of the backward and forward projection distances, derives convergence rates for the empirical projection distance under log-Sobolev and bounded-support conditions, and provides finite-sample p-value bounds and consistency results for the resulting test. To evaluate the projection distance in practice, it proposes a computation scheme based on an entropic Frank-Wolfe algorithm, and experiments on synthetic data illustrate its effectiveness. The framework yields not only a decision rule but also a measure of deviation from convex order, which supports decision-making in areas such as arbitrage detection and risk assessment.
Abstract
Ranking distributions according to a stochastic order has wide applications in diverse areas. Although stochastic dominance has received much attention, convex order, particularly in general dimensions, has yet to be investigated from a statistical point of view. This article addresses this gap by introducing a simple statistical test for convex order based on the Wasserstein projection distance. This projection distance not only encodes whether two distributions are in convex order, but also quantifies the deviation from convex order and produces an optimal convex order approximation. We prove Lipschitz stability of the backward and forward Wasserstein projection distances, which leads to consistency and concentration results for the estimator we employ as our test statistic. Combining these with state-of-the-art results on the convergence rate of empirical distributions, we also derive upper bounds for the $p$-value and type I error of our test statistic, as well as upper bounds on the type II error for an appropriate class of strict alternatives. With proper choices of families of distributions, we further show that the power of the proposed test converges to one as the number of samples grows to infinity. Lastly, we provide an efficient numerical scheme for our test statistic, by way of an entropic Frank-Wolfe algorithm. Experiments on synthetic data sets illustrate the effectiveness of our approach.
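
To make the notion of convex order concrete, the following is a minimal one-dimensional sketch. It is not the Wasserstein projection test or the entropic Frank-Wolfe scheme of the paper. It relies instead on the classical one-dimensional characterization: for integrable distributions on the real line, the law of $X$ is dominated in convex order by the law of $Y$ exactly when the means agree and $E[(X-k)^+] \le E[(Y-k)^+]$ for every $k$. The function names `call_function` and `convex_order_gap_1d` are hypothetical and chosen for illustration only.

```python
import numpy as np


def call_function(samples, strikes):
    """Empirical E[(X - k)^+] evaluated at every strike k."""
    return np.maximum(samples[:, None] - strikes[None, :], 0.0).mean(axis=0)


def convex_order_gap_1d(x, y):
    """Return (mean_gap, call_gap) for the hypothesis law(x) <=_cx law(y).

    Illustrative 1D check only (an assumption of this sketch, not the
    paper's multivariate statistic). Both gaps are zero exactly when the
    two empirical measures are in convex order.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # The difference of the two call functions is piecewise linear with
    # kinks only at sample points, so checking the pooled atoms suffices.
    strikes = np.union1d(x, y)
    diff = call_function(x, strikes) - call_function(y, strikes)
    mean_gap = float(abs(x.mean() - y.mean()))
    call_gap = max(float(diff.max()), 0.0)
    return mean_gap, call_gap


narrow = np.array([-1.0, 1.0])
wide = np.array([-2.0, 2.0])
print(convex_order_gap_1d(narrow, wide))
print(convex_order_gap_1d(wide, narrow))
```

In this toy example the two-point law on $\{-1, 1\}$ is dominated by the one on $\{-2, 2\}$, so both gaps vanish for the first call, while the reversed call reports a positive call gap. Like the projection distance described above, the gap quantifies a deviation from convex order rather than returning only a yes/no answer, but it does not extend beyond one dimension.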
