Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

Shinsaku Sakaue, Han Bao, Taira Tsuchiya, Taihei Oki

TL;DR

This paper extends the exploit-the-surrogate-gap framework to online structured prediction with Fenchel--Young losses, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems.

Abstract

This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.
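The claim that the logistic loss is a special case of Fenchel--Young losses can be checked numerically. The sketch below is not the paper's code; it assumes the standard construction in which $\Omega$ is the Shannon negentropy restricted to the probability simplex, so that its conjugate is log-sum-exp and $S_\Omega(\bm{\theta}; \bm{y}) = \Omega^*(\bm{\theta}) + \Omega(\bm{y}) - \langle \bm{\theta}, \bm{y} \rangle$. The natural logarithm is used (the paper's logistic loss may use a different base), and all function names are illustrative.

```python
import math

def logsumexp(theta):
    # Conjugate of the Shannon negentropy on the simplex (numerically stable form).
    m = max(theta)
    return m + math.log(sum(math.exp(t - m) for t in theta))

def softmax(theta):
    # Gradient of logsumexp: the maximizer of <theta, p> - Omega(p) over the simplex.
    lse = logsumexp(theta)
    return [math.exp(t - lse) for t in theta]

def neg_entropy(p):
    # Omega(p) = sum_i p_i log p_i, with the convention 0 log 0 = 0.
    return sum(pi * math.log(pi) for pi in p if pi > 0)

def fenchel_young_loss(theta, y):
    # S_Omega(theta; y) = Omega^*(theta) + Omega(y) - <theta, y>
    inner = sum(t * yi for t, yi in zip(theta, y))
    return logsumexp(theta) + neg_entropy(y) - inner

def logistic_loss(theta, label):
    # Multiclass logistic (cross-entropy) loss with natural logarithm.
    return -math.log(softmax(theta)[label])

theta = [2.0, -1.0, 0.5]   # hypothetical score vector for 3 classes
y = [0.0, 1.0, 0.0]        # one-hot encoding of class 1
fy = fenchel_young_loss(theta, y)
ll = logistic_loss(theta, 1)
```

For a one-hot target, `neg_entropy(y)` is zero, so both quantities reduce to log-sum-exp of the scores minus the score of the true class, and `fy` and `ll` agree up to floating-point error.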

Paper Structure

This paper contains 38 sections, 14 theorems, 56 equations, and 2 algorithms.

Key Result

Proposition 2.3

Let $S_\Omega$ be a Fenchel--Young loss generated by $\Omega = \Psi + I_{\mathop{\mathrm{conv}}\nolimits(\mathcal{Y})}$, where $\Psi:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ satisfies the above properties. For $\bm{\theta} \in \mathbb{R}^d$, define the regularized prediction [...], where the maximizer is unique. Then, for any $\bm{y} \in \mathcal{Y}$, $S_\Omega(\bm{\theta};\bm{y})$ [...]

Theorems & Definitions (31)

  • Definition 2.2
  • Proposition 2.3
  • Lemma 3.1
  • proof
  • Proposition 3.2
  • proof
  • Proposition 4.1
  • proof
  • Theorem 4.2
  • proof
  • ...and 21 more