
Risk Measures and Upper Probabilities: Coherence and Stratification

Christian Fröhlich, Robert C. Williamson

TL;DR

The paper argues for replacing the standard expectation with coherent risk measures to capture risk aversion and ambiguity in ML under distributional shift and fairness concerns. By embedding coherent risk measures in rearrangement invariant Banach spaces and leveraging Kusuoka representations, it characterizes spectral risk measures through fundamental functions, linking CVaR, distortion measures, and Choquet integrals. It shows that the Lorentz and Marcinkiewicz norms bound all law-invariant coherent risks with a given fundamental function, thereby providing a natural tail-focused stratification of risk measures. The authors develop interpolation tools to construct new risk measures from old ones, analyze tail behavior via φ′(0), and demonstrate empirically that spectral risk measures improve robustness and reduce inequality in losses, albeit with trade-offs in average performance. Overall, the work provides a principled, interpretable framework to quantify and combine risk and uncertainty in ML using spectral risk measures and ri-space theory, with practical implications for robustness and fairness.

Abstract

Machine learning typically presupposes classical probability theory, which implies that aggregation is built upon expectation. There are now multiple reasons to look at richer alternatives to classical probability theory as a mathematical foundation for machine learning. We systematically examine a powerful and rich class of alternative aggregation functionals, known variously as spectral risk measures, Choquet integrals or Lorentz norms. We present a range of characterization results and demonstrate what makes this spectral family so special. In doing so, we arrive at a natural stratification of all coherent risk measures in terms of the upper probabilities that they induce, by exploiting results from the theory of rearrangement invariant Banach spaces. We empirically demonstrate how this new approach to uncertainty helps tackle practical machine learning problems.

Paper Structure (73 sections, 46 theorems, 258 equations, 11 figures)


Key Result

Theorem 2

(pelessoni2003imprecise) Let $\mathcal{L}$ be a linear space of bounded real-valued random variables, containing all constants $c \in \mathbb{R}$. A functional $R$ is a coherent risk measure on $\mathcal{L}$ if and only if it is a coherent upper prevision on $\mathcal{L}$.
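
For orientation, the four axioms commonly taken to define a coherent risk measure can be written out as below. This is a sketch under an assumption: it uses the loss-oriented sign convention (larger values of $X$ are worse), which this page does not state explicitly, so the paper's own Definition 1 may differ in sign or phrasing. For all $X, Y \in \mathcal{L}$, $c \in \mathbb{R}$ and $\lambda \geq 0$:

$$
\begin{aligned}
&\text{Monotonicity:} && X \leq Y \implies R(X) \leq R(Y), \\
&\text{Translation equivariance:} && R(X + c) = R(X) + c, \\
&\text{Positive homogeneity:} && R(\lambda X) = \lambda R(X), \\
&\text{Subadditivity:} && R(X + Y) \leq R(X) + R(Y).
\end{aligned}
$$

Read this way, Theorem 2 says that on such a space $\mathcal{L}$ these requirements single out exactly the same functionals as the coherence conditions for upper previsions from imprecise probability.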

Figures (11)

  • Figure 1: The fundamental risk quadrangle (rockafellar2013fundamental).
  • Figure 2: Top left: the density of an exemplary skew-normal distribution, belonging to some random variable $X$. Top right: lower and upper probabilities with distortion function $\phi(t)=1-(1-t)^2$. Bottom left: lower and upper distortion of the survival function, corresponding to the exemplary distribution. Bottom right: lower and upper densities, resulting from the distortion. The vertical lines indicate the expectation and the distortion risk. Note that $R_\phi(X)$ is substantially greater than $\mathbb{E}[X]$.
  • Figure 3: The red curve is the fundamental function $\phi(t)=1-(1-t)^2$. Left: the black lines correspond to five selected $\phi_t$ in the Marcinkiewicz norm construction. Right: the black lines correspond to five selected $\phi_t$ in the positive translation equivariant Marcinkiewicz norm construction. In this particular case, the latter yields the Dutch risk measure. Due to PTE, the $\phi_t$ need to reach $1$ at $t=1$. In both cases, the supremum over the (infinite) family of black lines recovers the red line, i.e. the fundamental function $\phi$.
  • Figure 4: Illustration of the interpolation between two quasiconcave fundamental functions. The graph shows $\phi_{\mathrm{red}}(t)=t^{1/4}$ (in red) and $\phi_{\mathrm{blue}}(t)= 3t\wedge 1$ (in blue). The grey curves are obtained via $\phi(t)=\breve{\phi}_a(\phi_{\mathrm{red}}(t),\phi_{\mathrm{blue}}(t))$, where $\breve{\phi}_a$ is the perspective of $\phi_a(t)=t^{1/a}$, with $a=\alpha^{1/4}$ and $\alpha$ ranging from 2 to 400 in steps of 10. Small values of $a$ result in $\phi$ being closer to $\phi_{\mathrm{red}}$, and larger values result in $\phi$ being closer to $\phi_{\mathrm{blue}}$. Observe that at the three points where $\phi_{\mathrm{red}}$ and $\phi_{\mathrm{blue}}$ agree, $\phi$ agrees with them as well.
  • Figure 5: Left: $\operatorname{CVaR}_\alpha$ curves for $500$ randomly drawn samples from a standard normal and a $t$-distribution with $2$ degrees of freedom, respectively. Only nonnegative samples were kept. Both have approximately the same mean $\operatorname{CVaR}_{\alpha=0}$, but the $t$-distribution has substantially more weight in the tails. Right: empirical Lorenz curves for the same samples. The diagonal corresponds to perfect equality. The curve of the standard normal is closer to the diagonal, so the $t$-distribution exhibits higher inequality than the standard normal.
  • ...and 6 more figures
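
The distortion risk $R_\phi(X)$ in Figure 2 and the $\operatorname{CVaR}_\alpha$ curves in Figure 5 can both be estimated from a finite sample with the same discrete Choquet-integral formula. The following is a minimal sketch, not the authors' code: the function names (`distortion_risk`, `phi_quadratic`, `phi_cvar`, `phi_identity`) are hypothetical, and the CVaR parametrization is an assumption chosen so that $\alpha=0$ recovers the mean, matching the Figure 5 caption.

```python
import numpy as np


def distortion_risk(sample, phi):
    """Empirical distortion risk of a sample under distortion function phi.

    Uses the discrete Choquet integral
        sum_i x_(i) * [phi((n - i + 1) / n) - phi((n - i) / n)],
    where x_(1) <= ... <= x_(n) is the sorted sample. It assumes
    phi(0) = 0 and phi(1) = 1, so the weights sum to one.
    """
    x = np.sort(np.asarray(sample, dtype=float))  # ascending order
    n = x.size
    upper = phi(np.arange(n, 0, -1) / n)       # phi((n - i + 1) / n)
    lower = phi(np.arange(n - 1, -1, -1) / n)  # phi((n - i) / n)
    return float(np.dot(x, upper - lower))


def phi_identity(t):
    """phi(t) = t recovers the ordinary sample mean."""
    return t


def phi_quadratic(t):
    """The distortion function used in Figures 2 and 3."""
    return 1.0 - (1.0 - t) ** 2


def phi_cvar(alpha):
    """Distortion for CVaR at level alpha in [0, 1).

    Assumed convention: alpha = 0 gives the mean, and larger alpha
    averages over a smaller upper tail of mass 1 - alpha.
    """
    def phi(t):
        return np.minimum(t / (1.0 - alpha), 1.0)
    return phi


if __name__ == "__main__":
    losses = [1.0, 2.0, 3.0, 4.0]
    print("mean           :", distortion_risk(losses, phi_identity))
    print("R_phi quadratic:", distortion_risk(losses, phi_quadratic))
    print("CVaR_0.5       :", distortion_risk(losses, phi_cvar(0.5)))
```

Because a concave $\phi$ puts larger weights on the largest order statistics, the quadratic distortion yields a value above the sample mean, which is the effect the Figure 2 caption points out.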

Theorems & Definitions (67)

  • Definition 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5
  • Definition 6
  • Definition 7
  • Definition 8
  • Theorem 9
  • Definition 10
  • ...and 57 more