Criterion Collapse and Loss Distribution Control
Matthew J. Holland
TL;DR
This work investigates criterion collapse, the phenomenon in which optimizing one learning criterion implies optimality in another, extending the analysis beyond standard mean-based losses to a wide range of risk criteria. It develops a unified theoretical framework for Bernoulli (zero-one) losses and surrogate losses, showing that many monotonic criteria (e.g., DRO, CVaR, tilted ERM) collapse to error-probability minimizers, while non-monotonic criteria can avoid this. The paper studies loss-restraining, non-monotonic criteria such as those underlying Flooding and SoftAD, and reports image-classification experiments in which such approaches can balance surrogate loss, accuracy, and model norm. The results offer methodological guidance for designing learning objectives that align with diverse evaluation metrics and caution against over-optimizing monotonic risk criteria in highly expressive models.
Abstract
In this work, we consider the notion of "criterion collapse," in which optimization of one metric implies optimality in another, with a particular focus on conditions for collapse into error probability minimizers under a wide variety of learning criteria, ranging from DRO and OCE risks (CVaR, tilted ERM) to non-monotonic criteria underlying recent ascent-descent algorithms explored in the literature (Flooding, SoftAD). We show how collapse in the context of losses with a Bernoulli distribution goes far beyond existing results for CVaR and DRO, then expand our scope to include surrogate losses, showing conditions where monotonic criteria such as tilted ERM cannot avoid collapse, whereas non-monotonic alternatives can.
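As a rough illustration of collapse under Bernoulli losses, the following minimal sketch compares a few criteria evaluated on a zero-one loss with error probability p. It is not taken from the paper: the closed forms are the standard expressions for the CVaR and tilted risk of a Bernoulli variable, the Flooding-style criterion is applied directly to the mean loss, and the candidate error probabilities, the level `alpha`, the tilt `t`, and the flood level `b` are all hypothetical values chosen for the example.

```python
import math


def cvar_bernoulli(p, alpha):
    # CVaR at level alpha of a Bernoulli(p) loss: the mean of the worst
    # (1 - alpha) fraction of outcomes, which equals min(1, p / (1 - alpha)).
    return min(1.0, p / (1.0 - alpha))


def tilted_bernoulli(p, t):
    # Tilted risk (1/t) * log E[exp(t * L)] for a Bernoulli(p) loss, t != 0.
    return math.log(1.0 - p + p * math.exp(t)) / t


def flooding_bernoulli(p, b):
    # Flooding-style criterion |mean loss - b| + b, with mean loss equal to p.
    # The flood level b is a hypothetical choice for this sketch.
    return abs(p - b) + b


def argmin_over(candidates, criterion):
    # Return the candidate error probability with the smallest criterion value.
    return min(candidates, key=criterion)


# Hypothetical error probabilities achieved by four candidate models.
error_probs = [0.05, 0.10, 0.20, 0.40]

best_mean = argmin_over(error_probs, lambda p: p)
best_cvar = argmin_over(error_probs, lambda p: cvar_bernoulli(p, alpha=0.5))
best_tilted = argmin_over(error_probs, lambda p: tilted_bernoulli(p, t=2.0))
best_flood = argmin_over(error_probs, lambda p: flooding_bernoulli(p, b=0.2))

print(best_mean, best_cvar, best_tilted, best_flood)
```

Under these assumptions, the mean, CVaR, and tilted criteria are all non-decreasing in p, so they select the same candidate (the one with the smallest error probability), whereas the non-monotonic Flooding-style criterion selects the candidate whose error probability is closest to b.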
