Proper losses regret at least 1/2-order

Han Bao; Asuka Takatsu

TL;DR

The paper investigates surrogate regrets arising from multiclass proper losses and establishes that strict properness is necessary and sufficient for non-vacuous $p$-norm regret bounds. It extends surrogate-regret analysis beyond binary classification by introducing moduli of convexity on the probability simplex and proving the optimal $p$-norm convergence rate of $O(\rho^{1/2})$ for a broad class of strictly proper losses, with strongly proper losses attaining this rate. The authors connect surrogate regret to downstream tasks (multiclass classification, learning with noisy labels, and bipartite ranking), providing concrete $p$-norm bounds that translate into performance guarantees for plug-in forecasters. They also develop a Simonenko-order-based lower bound framework, showing that no proper loss can asymptotically beat the $O(\rho^{1/2})$ rate under mild conditions, and illustrate the theory with detailed examples, including Shannon entropy. Overall, the work clarifies fundamental limits and optimality of loss choices for probability forecasting in multiclass tasks and informs loss selection for practical learning pipelines.
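The 1/2-order rate can be seen concretely in the Shannon entropy (log loss) case mentioned above. The following is a minimal sketch, not the paper's general argument: it relies on two standard facts, namely that the surrogate regret of the log loss is the KL divergence and that Pinsker's inequality gives $\|\mathbf{p} - \mathbf{q}\|_1 \le \sqrt{2\,\mathrm{KL}(\mathbf{p}\,\|\,\mathbf{q})}$. All function names are hypothetical.

```python
import math
import random


def log_loss_regret(true_probs, est_probs):
    """Surrogate regret of the log loss, i.e. KL(true || est) in nats."""
    kl = sum(
        p * math.log(p / q)
        for p, q in zip(true_probs, est_probs)
        if p > 0.0
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return max(kl, 0.0)


def l1_distance(true_probs, est_probs):
    """1-norm between the true and estimated probability vectors."""
    return sum(abs(p - q) for p, q in zip(true_probs, est_probs))


def pinsker_gap(true_probs, est_probs):
    """sqrt(2 * regret) - ||p - q||_1, nonnegative by Pinsker's inequality."""
    regret = log_loss_regret(true_probs, est_probs)
    return math.sqrt(2.0 * regret) - l1_distance(true_probs, est_probs)


def random_simplex_point(num_classes, rng):
    """Random probability vector with strictly positive entries."""
    weights = [rng.random() + 1e-3 for _ in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    rng = random.Random(0)
    p = random_simplex_point(4, rng)
    q = random_simplex_point(4, rng)
    print("regret:", log_loss_regret(p, q))
    print("1-norm:", l1_distance(p, q))
    print("Pinsker gap:", pinsker_gap(p, q))
```

The 1-norm is controlled by the square root of the regret here, which is the 1/2-order behavior that the summary describes as unimprovable for a broad class of strictly proper losses.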

Abstract

A fundamental challenge in machine learning is the choice of a loss, as it characterizes our learning task, is minimized in the training phase, and serves as an evaluation criterion for estimators. Proper losses are commonly chosen because they ensure that minimizers of the full risk match the true probability vector. Estimators induced from a proper loss are widely used to construct forecasters for downstream tasks such as classification and ranking. In this procedure, how well does the forecaster based on the obtained estimator perform on a given downstream task? This question is closely tied to the behavior of the $p$-norm between the estimated and true probability vectors as the estimator is updated. In the proper loss framework, the suboptimality of the estimated probability vector relative to the true probability vector is measured by a surrogate regret. First, we analyze a surrogate regret and show that the strict properness of a loss is necessary and sufficient to establish a non-vacuous surrogate regret bound. Second, we resolve an important open question by showing that the order of convergence in $p$-norm cannot be faster than the $1/2$-order of surrogate regrets for a broad class of strictly proper losses. This implies that strongly proper losses entail the optimal convergence rate.
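To make the notions of properness and surrogate regret concrete, here is a small sketch using the Brier score, chosen only as a familiar example and not taken from the paper. For this loss the regret works out to exactly the squared 2-norm between the estimated and true probability vectors, so the 2-norm equals the regret raised to the power 1/2. The helper names are hypothetical.

```python
def brier_loss(est_probs, label):
    """Brier score: squared distance between the estimate and the one-hot label."""
    return sum(
        (q - (1.0 if i == label else 0.0)) ** 2
        for i, q in enumerate(est_probs)
    )


def expected_loss(loss, true_probs, est_probs):
    """Risk of est_probs when labels are drawn from true_probs."""
    return sum(p * loss(est_probs, y) for y, p in enumerate(true_probs))


def surrogate_regret(loss, true_probs, est_probs):
    """Excess risk of est_probs over the risk of the true probability vector."""
    return (
        expected_loss(loss, true_probs, est_probs)
        - expected_loss(loss, true_probs, true_probs)
    )


def squared_l2(true_probs, est_probs):
    """Squared 2-norm between the two probability vectors."""
    return sum((p - q) ** 2 for p, q in zip(true_probs, est_probs))


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]
    q = [0.2, 0.2, 0.6]
    print("regret:", surrogate_regret(brier_loss, p, q))
    print("squared 2-norm:", squared_l2(p, q))
```

Because the regret is zero only when the estimate equals the true vector, the Brier score is strictly proper, and the identity above is the simplest instance of a 1/2-order relation between a norm and the surrogate regret.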

Paper Structure

This paper contains 27 sections, 22 theorems, 178 equations, 2 figures, and 1 table.

Introduction
Organization and contributions of this article
Background
Notation
Convex analysis
Classification, proper losses, and Savage representation
Multiclass classification
Proper losses
Savage representation
Strongly proper losses
Regret bounds: Necessity of strict properness
Moduli of convexity
Surrogate regret bounds
Relating surrogate regret to downstream tasks
Task 1: multiclass classification.
...and 12 more sections

Key Result

Lemma 0

Let $f:\mathbb{R}^N \to (-\infty, \infty]$ be a proper convex function such that $\triangle^N \subseteq \operatorname{dom} f$. For $\mathbf{q}^0 \in \triangle^N$, the set $\partial f(\mathbf{q}^0)$ is nonempty and $\mathbf{v} \in \partial f(\mathbf{q}^0)$ satisfies $f'(\mathbf{

Figures (2)

Figure 1: Illustration of $\omega(r) = r\sin\left(\frac{1}{r}\right) - \mathrm{Ci}\left(\frac{1}{r}\right) + r$.
Figure 2: Numerical plots of $K^f_p(r) = 8\omega(r)/r^2$ for each $f$ in Table 1.
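The function in the Figure 1 caption can be evaluated numerically. The sketch below is illustrative and not the authors' code: it assumes $\mathrm{Ci}$ denotes the standard cosine integral, computes it from the power series $\mathrm{Ci}(x) = \gamma + \ln x + \sum_{k \ge 1} (-1)^k x^{2k} / (2k\,(2k)!)$, and restricts to $r \ge 0.1$ so that the series stays accurate in double precision. Differentiating the caption's formula term by term gives $\omega'(r) = 1 + \sin(1/r)$, which the code uses as a consistency check. All names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329


def cosine_integral(x, terms=60):
    """Ci(x) for moderate x > 0 via its power series around zero."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    total = EULER_GAMMA + math.log(x)
    term = 1.0  # holds (-1)^k x^(2k) / (2k)! after k updates
    for k in range(1, terms + 1):
        term *= -x * x / ((2 * k - 1) * (2 * k))
        total += term / (2 * k)
    return total


def omega(r):
    """omega(r) = r sin(1/r) - Ci(1/r) + r, as in the Figure 1 caption."""
    if r < 0.1:
        raise ValueError("this sketch only supports r >= 0.1")
    inv = 1.0 / r
    return r * math.sin(inv) - cosine_integral(inv) + r


def omega_derivative(r):
    """Closed-form derivative obtained by differentiating omega term by term."""
    return 1.0 + math.sin(1.0 / r)


if __name__ == "__main__":
    for r in (0.2, 0.5, 1.0, 2.0):
        print(r, omega(r), omega_derivative(r))
```

Since $1 + \sin(1/r) \ge 0$, the function is nondecreasing, while its slope oscillates more and more rapidly between 0 and 2 as $r$ approaches 0.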

Theorems & Definitions (28)

Lemma 0
Lemma 0
Definition 1: Proper losses
Definition 2: Regular losses [Gneiting:2007]
Proposition 2: Savage representation [Savage:1971]
Corollary 3: Subgradient of conditional Bayes risk
Proposition 4: Uniqueness up to affine functions
Definition 5: Modulus of convexity
Theorem 6: Monotonicity of modulus
Lemma 6
...and 18 more
