Robust Statistical Comparison of Random Variables with Locally Varying Scale of Measurement

Christoph Jansen, Georg Schollmeyer, Hannah Blocher, Julian Rodemann, Thomas Augustin

TL;DR

The paper develops a robust framework for comparing random variables in spaces with locally varying scales via generalized stochastic dominance (GSD) defined through preference systems. It introduces regularized, permutation-based tests to infer GSD from samples and extends robustness to imprecise probabilities using credal sets, enabling valid inference under epistemic and approximation uncertainties. The theory is specialized to multidimensional spaces with mixed cardinal and ordinal dimensions, with concrete LP-based algorithms for computing test statistics and their robust counterparts. Applications in multidimensional poverty, finance, and medicine illustrate the method’s ability to leverage full information in complex data structures and to provide conservative conclusions under sampling and model misspecification. The work contributes a principled, computationally tractable approach to robust distributional comparison in non-standard measurement spaces, with practical implications for policy analysis and scientific inference.

Abstract

Spaces with locally varying scale of measurement, like multidimensional structures with differently scaled dimensions, are pretty common in statistics and machine learning. Nevertheless, it is still understood as an open question how to exploit the entire information encoded in them properly. We address this problem by considering an order based on (sets of) expectations of random variables mapping into such non-standard spaces. This order contains stochastic dominance and expectation order as extreme cases when no, or respectively perfect, cardinal structure is given. We derive a (regularized) statistical test for our proposed generalized stochastic dominance (GSD) order, operationalize it by linear optimization, and robustify it by imprecise probability models. Our findings are illustrated with data from multidimensional poverty measurement, finance, and medicine.
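The two extreme cases named in the abstract can be made concrete in one dimension. The following is a minimal sketch under assumptions that are not from the paper: it uses real-valued samples, checks first-order stochastic dominance through empirical distribution functions (the purely ordinal extreme), and compares sample means (the fully cardinal extreme). The function names are hypothetical, and the paper's GSD order itself, which is defined through preference systems, is not implemented here.

```python
import numpy as np


def first_order_dominates(x, y):
    """Return True if the empirical distribution of x weakly first-order
    stochastically dominates that of y, i.e. F_x(t) <= F_y(t) for all t."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # Empirical CDFs are step functions, so checking the pooled
    # observation points is sufficient.
    grid = np.union1d(x, y)
    F_x = np.searchsorted(x, grid, side="right") / x.size
    F_y = np.searchsorted(y, grid, side="right") / y.size
    return bool(np.all(F_x <= F_y))


def expectation_dominates(x, y):
    """Return True if the sample mean of x is at least that of y."""
    return float(np.mean(x)) >= float(np.mean(y))


# Hypothetical toy samples: neither dominates the other stochastically,
# yet the expectation order still ranks them.
x = [1, 1, 5, 5]
y = [2, 2, 3, 3]
sd_xy = first_order_dominates(x, y)
sd_yx = first_order_dominates(y, x)
exp_xy = expectation_dominates(x, y)
```

With these toy samples the stochastic dominance check leaves `x` and `y` incomparable in both directions, while the mean comparison ranks `x` above `y`. This is the sense in which more cardinal structure yields a more decisive order.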

Paper Structure

This paper contains 16 sections, 9 theorems, 28 equations, and 3 figures.

Key Result

Proposition 1

Let $\mathcal{A}=[A, R_1 , R_2]$ be a bounded preference system. Then $\mathcal{A}$ is consistent if and only if it is $0$-consistent.

Figures (3)

  • Figure 1: Two ways for regularizing a preference system.
  • Figure 2: Distributions of ${d}^{\varepsilon}_{I}$ with $\varepsilon \in \{0,0.25,0.5,0.75,1\}$ obtained from $N = 1000$ resamples of ALLBUS data. Black stripes show exact positions of ${d}^{\varepsilon}_{I}$ values. Vertical black line marks median. Red line shows value of the respective observed test statistics ${d}^{\varepsilon}_{\mathbf X, \mathbf Y}(\omega)$.
  • Figure 3: P-values as function of the contamination $\gamma$ (see Supp. C) for tests with different regularization strength $\varepsilon$. Dotted red line marks significance level $\alpha = 0.05$.
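The resampling logic behind Figures 2 and 3 can be sketched as a generic two-sample permutation test. This is an illustration under stated assumptions, not the paper's procedure: the statistic below is a placeholder difference in means (the paper's ${d}^{\varepsilon}$ is computed by linear programming and is not reproduced here), the rejection direction (large values count as extreme) is assumed, and the function names are hypothetical. Only the default of 1000 resamples is taken from the Figure 2 caption.

```python
import numpy as np


def permutation_p_value(x, y, statistic, n_resamples=1000, seed=0):
    """One-sided permutation p-value: the share of label permutations whose
    statistic is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    pooled = np.concatenate([x, y])
    observed = statistic(x, y)
    count = 0
    for _ in range(n_resamples):
        # Reassign group labels at random, keeping the group sizes fixed.
        perm = rng.permutation(pooled)
        if statistic(perm[: x.size], perm[x.size:]) >= observed:
            count += 1
    # The +1 correction counts the observed split as one resample,
    # which keeps the p-value strictly positive.
    return (count + 1) / (n_resamples + 1)


def mean_difference(a, b):
    """Placeholder statistic (an assumption, not the paper's statistic)."""
    return float(np.mean(a) - np.mean(b))


# Hypothetical, clearly separated samples.
x = np.arange(20) + 100.0
y = np.arange(20) + 0.0
p = permutation_p_value(x, y, mean_difference)
```

Plugging a different statistic into `statistic` leaves the resampling loop unchanged, which is why the observed value and the resampled values can be shown on one axis as in Figure 2.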

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Definition 3
  • Proposition 1
  • Definition 4
  • Proposition 2
  • Proposition 3
  • Proposition 4
  • Proposition 5
  • Proposition 6
  • ...and 4 more