Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement
Christoph Jansen, Georg Schollmeyer, Hannah Blocher, Julian Rodemann, Thomas Augustin
TL;DR
The paper develops a robust framework for comparing random variables in spaces with locally varying scales via generalized stochastic dominance (GSD) defined through preference systems. It introduces regularized, permutation-based tests to infer GSD from samples and extends robustness to imprecise probabilities using credal sets, enabling valid inference under epistemic and approximation uncertainties. The theory is specialized to multidimensional spaces with mixed cardinal and ordinal dimensions, with concrete LP-based algorithms for computing test statistics and their robust counterparts. Applications in multidimensional poverty, finance, and medicine illustrate the method’s ability to leverage full information in complex data structures and to provide conservative conclusions under sampling and model misspecification. The work contributes a principled, computationally tractable approach to robust distributional comparison in non-standard measurement spaces, with practical implications for policy analysis and scientific inference.
Abstract
Spaces with locally varying scale of measurement, such as multidimensional structures whose dimensions are scaled differently, are common in statistics and machine learning. Nevertheless, how to properly exploit all the information encoded in them remains an open question. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and the expectation order as extreme cases, arising when no cardinal structure or perfect cardinal structure is given, respectively. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.
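To make the idea of a permutation-based dominance test concrete, the following is a minimal sketch for the simplest special case only: two univariate real-valued samples and classical first-order stochastic dominance. It is not the GSD test of the paper, which works on preference systems and computes its statistic by linear optimization. The function names `ecdf_gap` and `permutation_p_value`, the choice of statistic, and the null hypothesis of identically distributed samples are assumptions made for illustration.

```python
import numpy as np


def ecdf_gap(x, y):
    """Smallest value of F_y(t) - F_x(t) over all pooled sample points t.

    The value is at most 0, because both empirical CDFs reach 1 at the
    largest pooled point. It equals 0 exactly when the empirical CDF of x
    never lies above that of y, i.e. when x first-order stochastically
    dominates y in the sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.sort(np.concatenate([x, y]))
    # Empirical CDFs evaluated on the pooled grid.
    f_x = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    f_y = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.min(f_y - f_x))


def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Permutation p-value for the statistic `ecdf_gap` (illustrative only).

    Assumed null hypothesis: both samples come from the same distribution.
    Values of the statistic close to 0 count as evidence that x dominates y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = ecdf_gap(x, y)
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)  # random reassignment of group labels
        if ecdf_gap(perm[: len(x)], perm[len(x):]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample_x = rng.normal(loc=1.0, scale=1.0, size=50)  # synthetic data
    sample_y = rng.normal(loc=0.0, scale=1.0, size=50)  # synthetic data
    stat, p = permutation_p_value(sample_x, sample_y)
    print(f"statistic = {stat:.3f}, permutation p-value = {p:.4f}")
```

The sketch only covers the extreme case without cardinal structure. Extending it toward the setting of the abstract would require replacing `ecdf_gap` with a statistic defined over the set of utility functions compatible with the preference system, which is where linear optimization enters.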
