Criterion Collapse and Loss Distribution Control

Matthew J. Holland

TL;DR

This work investigates criterion collapse, the phenomenon where optimizing one learning criterion implies optimality in another, extending beyond standard mean-based losses to a wide range of risk criteria. It develops a unified theoretical framework for Bernoulli (zero-one) losses and surrogates, showing that many monotone criteria (e.g., DRO, CVaR, tilted ERM) collapse to error-probability minimizers, while non-monotone criteria can avoid this. The paper introduces loss-restraining criteria, relates them to non-monotonic criteria such as Flooding and SoftAD, and demonstrates in image-classification experiments that such approaches can balance surrogate loss, accuracy, and model norm. The results offer methodological guidance for designing learning objectives that align with diverse evaluation metrics and caution against over-optimizing monotone risk criteria in highly expressive models.

Abstract

In this work, we consider the notion of "criterion collapse," in which optimization of one metric implies optimality in another, with a particular focus on conditions for collapse into error probability minimizers under a wide variety of learning criteria, ranging from DRO and OCE risks (CVaR, tilted ERM) to non-monotonic criteria underlying recent ascent-descent algorithms explored in the literature (Flooding, SoftAD). We show how collapse in the context of losses with a Bernoulli distribution goes far beyond existing results for CVaR and DRO, then expand our scope to include surrogate losses, showing conditions where monotonic criteria such as tilted ERM cannot avoid collapse, whereas non-monotonic alternatives can.
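The collapse described above can be illustrated numerically in the simplest setting, where the loss is Bernoulli with error probability $p$. The following is a minimal sketch, not code from the paper: the function names are hypothetical, the CVaR and tilted-risk closed forms are the standard ones specialized to a Bernoulli($p$) loss, and the Flooding-style criterion is written as $|p - \theta| + \theta$ with an assumed threshold $\theta$. The first two are non-decreasing in $p$, so any minimizer of $p$ also minimizes them, while the third is minimized at $p = \theta$ rather than at the smallest achievable error.

```python
import math


def cvar_bernoulli(p, beta):
    """CVaR at level beta of a Bernoulli(p) loss: min(1, p / (1 - beta))."""
    return min(1.0, p / (1.0 - beta))


def tilted_bernoulli(p, gamma):
    """Tilted (entropic) risk of a Bernoulli(p) loss for gamma > 0."""
    return math.log(1.0 - p + p * math.exp(gamma)) / gamma


def flooding_bernoulli(p, theta):
    """Flooding-style criterion |E[L] - theta| + theta, with E[L] = p."""
    return abs(p - theta) + theta


def is_nondecreasing(values, tol=1e-12):
    """True if the sequence never drops by more than tol."""
    return all(b >= a - tol for a, b in zip(values, values[1:]))


# Evaluate each criterion over a grid of error probabilities.
GRID = [i / 100 for i in range(101)]
CVAR_VALUES = [cvar_bernoulli(p, beta=0.9) for p in GRID]
TILTED_VALUES = [tilted_bernoulli(p, gamma=2.0) for p in GRID]
FLOOD_VALUES = [flooding_bernoulli(p, theta=0.1) for p in GRID]

if __name__ == "__main__":
    print("CVaR monotone in p:    ", is_nondecreasing(CVAR_VALUES))
    print("tilted monotone in p:  ", is_nondecreasing(TILTED_VALUES))
    print("flooding monotone in p:", is_nondecreasing(FLOOD_VALUES))
```

The parameter values (`beta=0.9`, `gamma=2.0`, `theta=0.1`) are arbitrary choices for illustration; the qualitative picture holds for any level below one, any positive tilt, and any threshold strictly between zero and one.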

Paper Structure (34 sections, 6 theorems, 83 equations, 5 figures, 1 table)

This paper contains 34 sections, 6 theorems, 83 equations, 5 figures, 1 table.

Introduction
Criterion Collapse
Preliminaries
Basic notation
Loss function and criterion mapping
Random error and criterion collapse
Which classes lead to collapse?
Expectation of fixed function
Quantiles
Distribution dependent functions
Relationship with Surrogate Losses
Disentangling the error and surrogate loss
Criteria that cannot avoid collapse
Criteria that can avoid collapse
Empirical Analysis
...and 19 more sections

Key Result

Theorem 1

For arbitrary random loss $\mathsf{L} \in \mathcal{L}$, consider the distributionally robust optimization (DRO) criterion, namely the worst-case expected loss over an "uncertainty set" $\mathcal{P}$, taken to be a ball centered at some pre-defined data distribution on $\mathcal{X} \times \mathcal{Y}$, with finite radius measured by a valid $f$-divergence. Under zero-one loss ...
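To make the DRO setting concrete, here is a minimal sketch under stated assumptions rather than the construction used in the paper. It assumes the KL divergence as the $f$-divergence and uses the fact that, for a zero-one loss, the worst case over the ball depends on the data distribution only through the error probability $p$, so the problem reduces to finding the largest $q$ with $\mathrm{KL}(\mathrm{Bern}(q) \,\|\, \mathrm{Bern}(p)) \leq r$. The names `kl_bernoulli` and `worst_case_error` are hypothetical.

```python
import math


def kl_bernoulli(q, p):
    """KL(Bern(q) || Bern(p)) for p in (0, 1), taking 0 * log(0) as 0."""
    total = 0.0
    if q > 0.0:
        total += q * math.log(q / p)
    if q < 1.0:
        total += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return total


def worst_case_error(p, radius, iters=60):
    """Largest q >= p with KL(Bern(q) || Bern(p)) <= radius, by bisection."""
    if p <= 0.0:
        return 0.0  # no mass on errors can be shifted to error events
    if p >= 1.0:
        return 1.0
    if kl_bernoulli(1.0, p) <= radius:
        return 1.0  # the ball is large enough to put all mass on errors
    lo, hi = p, 1.0  # KL(q || p) is increasing in q on [p, 1]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(mid, p) <= radius:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    for p in (0.0, 0.05, 0.1, 0.2, 0.4):
        print(p, worst_case_error(p, radius=0.1))
```

In this sketch the worst-case error is a non-decreasing function of $p$, which is the mechanism behind collapse: a hypothesis that minimizes the error probability also minimizes the worst-case error over the ball.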

Figures (5)

Figure 1: In the left plot, we show the three possible data points that can arise in the example described in §\ref{sec:surrogates_nolink}. Points above the dashed silver line are assigned a label of $1$ by $h_{1}$ and $-1$ by $h_{2}$; signs are reversed for all points below this line. For the outlying point in the bottom right, we have set $a=2$ in this example. In the right plot, we illustrate how setting $p > 1/2$ ensures that $h_{1}$ and $h_{2}$ are optimal under distinct criteria.
Figure 2: Key metrics of interest (vertical axis) over epochs (horizontal axis). Here "loss" refers to average surrogate loss, "acc" refers to accuracy, and "norm" refers to the model L2 norm. Loss and accuracy are given for both training (dotted lines) and test data (solid lines). Plots on the left are for CIFAR-100, and plots on the right are for SVHN.
Figure 3: Graphs of $g_{\tau}(\cdot)$ in (\ref{eqn:nonmonotonic_variantile_helper}) over the unit interval for varying choices of $\tau$.
Figure 4: Examples of valid choices of $\rho$ (left) and $\widetilde{\rho}$ (right) for use in defining OCE criteria (\ref{eqn:defn_OCE}) and loss-restraining criteria (\ref{eqn:defn_Cinner})--(\ref{eqn:defn_Couter}) respectively.
Figure 5: Results for CIFAR-10 and FashionMNIST; see the caption of Figure 2 for details.

Theorems & Definitions (20)

Theorem 1: DRO criterion (hu2018a, Thm. 1)
Theorem 2: CVaR criterion (zhai2021b, Prop. 1)
Proposition 3: Collapse of left quantiles
Remark 4: Related case: right quantiles
Proposition 5: Collapse under monotonic dispersion
Remark 6: Special case: OCE criteria
Remark 7: Related case: Cressie-Read DRO
Remark 8: Related case: criteria based on Orlicz regret
Remark 9: Non-monotonic alternative: variantile
Proposition 10
...and 10 more
