Multivariate Stochastic Dominance via Optimal Transport and Applications to Models Benchmarking

Gabriel Rioux, Apoorva Nitsure, Mattia Rigotti, Kristjan Greenewald, Youssef Mroueh

TL;DR

This work develops a multivariate extension of First-Order Stochastic Dominance by formulating almost-dominance as an optimal transport problem with compatible, smooth costs. It introduces entropic regularization to address computational and statistical challenges, proves a central limit theorem and bootstrap consistency for the empirical violation ratio, and builds a hypothesis-testing framework (absolute and relative) suitable for benchmarking multi-metric models. The approach enables robust, dependency-aware ranking of models, demonstrated through synthetic experiments and Large Language Model benchmarking where cross-metric dependencies are crucial. By leveraging the Sinkhorn algorithm and the functional delta method, the method delivers scalable inference and principled decision-making in complex, high-dimensional stochastic ordering tasks.

Abstract

Stochastic dominance is an important concept in probability theory, econometrics and social choice theory for robustly modeling agents' preferences between random outcomes. While many works have been dedicated to the univariate case, little has been done in the multivariate scenario, wherein an agent has to decide between different multivariate outcomes. By exploiting a characterization of multivariate first stochastic dominance in terms of couplings, we introduce a statistic that assesses multivariate almost stochastic dominance under the framework of Optimal Transport with a smooth cost. Further, we introduce an entropic regularization of this statistic, and establish a central limit theorem (CLT) and consistency of the bootstrap procedure for the empirical statistic. Armed with this CLT, we propose a hypothesis testing framework as well as an efficient implementation using the Sinkhorn algorithm. We showcase our method in comparing and benchmarking Large Language Models that are evaluated on multiple metrics. Our multivariate stochastic dominance test allows us to capture the dependencies between the metrics in order to make an informed and statistically significant decision on the relative performance of the models.
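
To make the computational side of the abstract concrete, here is a minimal sketch of entropic optimal transport solved with Sinkhorn iterations on two empirical samples. It is an illustration under stated assumptions, not the authors' implementation: the hinge-type cost `hinge_cost` (penalizing coordinates where `y` exceeds `x`), the function names, the uniform sample weights, and the defaults `lam=0.1` and `n_iter=500` are all hypothetical choices, and the paper's violation ratio and its normalization are not reproduced here.

```python
import numpy as np


def hinge_cost(x, y):
    """Pairwise cost C[i, j] = sum_k max(y[j, k] - x[i, k], 0).

    Assumed form: zero exactly when x[i] >= y[j] in every coordinate.
    """
    diff = y[None, :, :] - x[:, None, :]
    return np.maximum(diff, 0.0).sum(axis=2)


def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))


def sinkhorn_plan(C, lam=0.1, n_iter=500):
    """Entropic OT plan between uniform marginals (log-domain Sinkhorn)."""
    n, m = C.shape
    log_a = np.full(n, -np.log(n))  # uniform weights on the first sample
    log_b = np.full(m, -np.log(m))  # uniform weights on the second sample
    f = np.zeros(n)
    g = np.zeros(m)
    for _ in range(n_iter):
        # Alternately rescale to match the row and column marginals.
        f = lam * (log_a - _logsumexp((g[None, :] - C) / lam, axis=1))
        g = lam * (log_b - _logsumexp((f[:, None] - C) / lam, axis=0))
    return np.exp((f[:, None] + g[None, :] - C) / lam)


# Toy data: x is shifted upward relative to y in every coordinate.
rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, size=(50, 3))
y = rng.normal(loc=0.0, size=(50, 3))

C_xy = hinge_cost(x, y)
C_yx = hinge_cost(y, x)
cost_xy = (sinkhorn_plan(C_xy) * C_xy).sum()  # cost of "x dominates y"
cost_yx = (sinkhorn_plan(C_yx) * C_yx).sum()  # cost of "y dominates x"
print(cost_xy, cost_yx)
```

In this toy setup the transport cost in the direction "x dominates y" should come out smaller than in the reverse direction, which is the kind of asymmetry a dominance statistic is meant to detect. The log-domain updates are used only for numerical stability at small `lam`.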

Paper Structure (26 sections, 13 theorems, 73 equations, 2 figures, 1 algorithm)

Key Result

Lemma 1

Letting $\mu$ (resp. $\nu$) denote the law of $X$ (resp. $Y$), $X\underset{\text{FSD}}{\succcurlyeq} Y$ if $\mathsf{OT}_c(\mu,\nu)=0$, where $c:\mathbb R^d\times \mathbb R^d\to\mathbb R_+$ is the cost function $c(x,y)=\mathbbm 1_{\{x\leq y\}}(x,y)$.

Figures (2)

  • Figure 1: Convergence of $\varepsilon_{\log,\lambda>0}$ towards $\varepsilon_{\text{hinge},0}$ in the synthetic dataset introduced in this section. Left panel: for a fixed parameter $\beta=0$ of the logistic cost, $\varepsilon_{\log,\lambda>0}$ converges towards $\varepsilon_{\text{hinge},0}$ as $\lambda$ decreases towards $0$. Right panel: for a fixed entropic regularization parameter $\lambda=0.1$, $\varepsilon_{\log,\lambda}$ converges towards $\varepsilon_{\text{hinge},0}$ as the gain $\beta$ of the logistic cost increases. All simulations were generated for $d=5$, $\mu=0$, $\sigma^2=1.0$ and $N=100$. Points and error bars indicate the average and standard deviation across 100 repetitions.
  • Figure 2: Mix Instruct results: comparison of multivariate FSD with a reduction to univariate FSD obtained by aggregating across the dimensions.

Theorems & Definitions (29)

  • Lemma 1
  • Definition 1: OT Costs Compatible with Multivariate FSD
  • Definition 2: Smooth Costs
  • Example 1: Examples of OT Costs
  • Theorem 1: Stability as $\lambda\downarrow 0$
  • Definition 3: Entropic Multivariate Almost FSD
  • Theorem 2: Limit distribution and bootstrapping
  • Remark 1: On Theorem 2
  • Lemma 2
  • proof
  • ...and 19 more