Proper losses regret at least 1/2-order

Han Bao, Asuka Takatsu

TL;DR

The paper investigates surrogate regrets arising from multiclass proper losses and establishes that strict properness is necessary and sufficient for non-vacuous $p$-norm regret bounds. It extends surrogate-regret analysis beyond binary classification by introducing moduli of convexity on the probability simplex and proving the optimal $p$-norm convergence rate of $O(\rho^{1/2})$ in the surrogate regret $\rho$ for a broad class of strictly proper losses, with strongly proper losses attaining this rate. The authors connect surrogate regret to downstream tasks (multiclass classification, learning with noisy labels, and bipartite ranking), providing concrete $p$-norm bounds that translate into performance guarantees for plug-in forecasters. They also develop a lower-bound framework based on the Simonenko order, showing that no proper loss can asymptotically beat the $O(\rho^{1/2})$ rate under mild conditions, and illustrate the theory with detailed examples, including Shannon entropy. Overall, the work clarifies the fundamental limits and optimality of loss choices for probability forecasting in multiclass tasks and informs loss selection for practical learning pipelines.

Abstract

A fundamental challenge in machine learning is the choice of a loss, as it characterizes our learning task, is minimized in the training phase, and serves as an evaluation criterion for estimators. Proper losses are commonly chosen, ensuring that minimizers of the full risk match the true probability vector. Estimators induced from a proper loss are widely used to construct forecasters for downstream tasks such as classification and ranking. In this procedure, how well does the forecaster based on the obtained estimator perform under a given downstream task? This question is closely tied to the behavior of the $p$-norm between the estimated and true probability vectors when the estimator is updated. In the proper loss framework, the suboptimality of the estimated probability vector relative to the true probability vector is measured by a surrogate regret. First, we analyze a surrogate regret and show that the strict properness of a loss is necessary and sufficient to establish a non-vacuous surrogate regret bound. Second, we resolve an important open question by showing that the order of convergence in $p$-norm cannot be faster than the $1/2$-order of surrogate regrets for a broad class of strictly proper losses. This implies that strongly proper losses entail the optimal convergence rate.
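
The 1/2-order relationship between surrogate regret and $p$-norm error can be seen in a familiar special case. The sketch below is an illustration under stated assumptions and is not taken from the paper: it uses the log loss, whose surrogate regret is the KL divergence between the true and estimated probability vectors, and checks Pinsker's inequality, $\|q - \hat{q}\|_1 \le \sqrt{2\,\mathrm{KL}(q \,\|\, \hat{q})}$, which bounds the 1-norm error by the square root of the regret. The function names and the probability vectors are hypothetical.

```python
import math


def log_loss_regret(q, q_hat):
    """Surrogate regret of the log loss, i.e. KL(q || q_hat) in nats.

    Terms with q_i == 0 contribute nothing, by the convention 0 * log 0 = 0.
    """
    return sum(qi * math.log(qi / hi) for qi, hi in zip(q, q_hat) if qi > 0)


def l1_distance(q, q_hat):
    """1-norm between two probability vectors."""
    return sum(abs(qi - hi) for qi, hi in zip(q, q_hat))


# Hypothetical true and estimated class-probability vectors (3 classes).
q = [0.5, 0.3, 0.2]
q_hat = [0.4, 0.4, 0.2]

regret = log_loss_regret(q, q_hat)
dist = l1_distance(q, q_hat)
bound = math.sqrt(2.0 * regret)  # Pinsker: ||q - q_hat||_1 <= sqrt(2 * KL)

print(f"regret={regret:.6f}  l1={dist:.6f}  sqrt(2*regret)={bound:.6f}")
```

Here the 1-norm error is controlled by the regret raised to the power 1/2, which is the kind of rate the abstract describes as unimprovable for a broad class of strictly proper losses.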

Paper Structure (27 sections, 22 theorems, 178 equations, 2 figures, 1 table)

Key Result

Lemma 0

Let $f:\mathbb{R}^N \to (-\infty, \infty]$ be a proper convex function such that $\triangle^N \subseteq \mathop{\mathrm{dom}}\nolimits f$. For $\mathbf{q}^0 \in \triangle^N$, the set $\partial f(\mathbf{q}^0)$ is nonempty and $\mathbf{v} \in \partial f(\mathbf{q}^0)$ satisfies $f'(\mathbf{

Figures (2)

  • Figure 1: Illustration of $\omega(r) = r\sin\left(\frac{1}{r}\right) - \mathrm{Ci}\left(\frac{1}{r}\right) + r$.
  • Figure 2: Numerical plots of $K^f_p(r) = 8\omega(r)/r^2$ for each $f$ in Table 1.

Theorems & Definitions (28)

  • Lemma 0
  • Lemma 0
  • Definition 1: Proper losses
  • Definition 2: Regular losses [Gneiting:2007]
  • Proposition 2: Savage representation [Savage:1971]
  • Corollary 3: Subgradient of conditional Bayes risk
  • Proposition 4: Uniqueness up to affine functions
  • Definition 5: Modulus of convexity
  • Theorem 6: Monotonicity of modulus
  • Lemma 6
  • ...and 18 more