Online Structured Prediction with Fenchel--Young Losses and Improved Surrogate Regret for Online Multiclass Classification with Logistic Loss

Shinsaku Sakaue; Han Bao; Taira Tsuchiya; Taihei Oki

TL;DR

This paper extends the exploit-the-surrogate-gap framework to online structured prediction with Fenchel--Young losses, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems.

Abstract

This paper studies online structured prediction with full-information feedback. For online multiclass classification, Van der Hoeven (2020) established \emph{finite} surrogate regret bounds, which are independent of the time horizon, by introducing an elegant \emph{exploit-the-surrogate-gap} framework. However, this framework has been limited to multiclass classification primarily because it relies on a classification-specific procedure for converting estimated scores to outputs. We extend the exploit-the-surrogate-gap framework to online structured prediction with \emph{Fenchel--Young losses}, a large family of surrogate losses that includes the logistic loss for multiclass classification as a special case, obtaining finite surrogate regret bounds in various structured prediction problems. To this end, we propose and analyze \emph{randomized decoding}, which converts estimated scores to general structured outputs. Moreover, by applying our decoding to online multiclass classification with the logistic loss, we obtain a surrogate regret bound of $O(\| \mathbf{U} \|_\mathrm{F}^2)$, where $\mathbf{U}$ is the best offline linear estimator and $\| \cdot \|_\mathrm{F}$ denotes the Frobenius norm. This bound is tight up to logarithmic factors and improves the previous bound of $O(d\| \mathbf{U} \|_\mathrm{F}^2)$ due to Van der Hoeven (2020) by a factor of $d$, the number of classes.
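The statement that the logistic loss is a special case of Fenchel--Young losses can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes the standard definition $S_\Omega(\bm{\theta};\bm{y}) = \Omega^*(\bm{\theta}) + \Omega(\bm{y}) - \langle \bm{\theta}, \bm{y} \rangle$ with $\Omega$ the negative Shannon entropy restricted to the probability simplex, whose conjugate is log-sum-exp. The natural logarithm is an assumption; a different base only rescales the loss by a constant. All function names are hypothetical.

```python
import numpy as np

def logsumexp(theta):
    # Numerically stable log(sum(exp(theta))).
    m = np.max(theta)
    return float(m + np.log(np.sum(np.exp(theta - m))))

def neg_shannon_entropy(p):
    # Omega(p) = sum_i p_i log p_i, using the convention 0 log 0 = 0.
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz])))

def fenchel_young_loss(theta, y):
    # S_Omega(theta; y) = Omega^*(theta) + Omega(y) - <theta, y>,
    # where Omega^* is log-sum-exp for the simplex-restricted negentropy.
    return logsumexp(theta) + neg_shannon_entropy(y) - float(np.dot(theta, y))

def logistic_loss(theta, label):
    # Multiclass logistic loss: -log softmax(theta)[label].
    return logsumexp(theta) - float(theta[label])

theta = np.array([2.0, -1.0, 0.5])   # arbitrary example scores
label = 0
y = np.eye(3)[label]                 # one-hot encoding of the true class

fy = fenchel_young_loss(theta, y)
ll = logistic_loss(theta, label)
print(fy, ll)  # the two values agree up to floating-point error
```

For a one-hot $\bm{y}$ the entropy term vanishes, so the Fenchel--Young loss reduces to log-sum-exp minus the score of the true class, which is the logistic loss.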

Paper Structure (38 sections, 14 theorems, 56 equations, 2 algorithms)

Introduction
Additional Related Work
Structured prediction.
Online multiclass classification.
Preliminaries
Problem Setting
Fenchel--Young Loss
Examples
Multiclass classification.
Multilabel classification.
Ranking.
Randomized Decoding
Necessity of mixing $\bm{y}^*$ and $\widetilde{\bm{y}}$.
Implementation of Randomized Decoding
Surrogate Regret Bounds for Online Structured Prediction
...and 23 more sections

Key Result

Proposition 2.3

Let $S_\Omega$ be a Fenchel--Young loss generated by $\Omega = \Psi + I_{\mathop{\mathrm{conv}}\nolimits(\mathcal{Y})}$, where $\Psi:\mathbb{R}^d\to\mathbb{R}\cup\{+\infty\}$ satisfies the above properties. For $\bm{\theta} \in \mathbb{R}^d$, define the regularized prediction function ..., where the maximizer is unique. Then, for any $\bm{y} \in \mathcal{Y}$, $S_\Omega(\bm{\theta};\bm{y})$ ...
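As a concrete instance of a maximization with a unique maximizer of the kind the proposition refers to, consider the multiclass case. The sketch below is not from the paper: it assumes $\Omega$ is the negative Shannon entropy on the probability simplex, for which the maximizer of $\langle \bm{\theta}, \bm{p} \rangle - \Omega(\bm{p})$ is known to be the softmax of $\bm{\theta}$. The code compares the softmax against randomly sampled points of the simplex; the helper names and the example scores are hypothetical.

```python
import numpy as np

def softmax(theta):
    # Numerically stable softmax.
    z = np.exp(theta - np.max(theta))
    return z / z.sum()

def objective(theta, p):
    # <theta, p> - Omega(p) with Omega(p) = sum_i p_i log p_i (0 log 0 = 0).
    nz = p > 0
    return float(np.dot(theta, p) - np.sum(p[nz] * np.log(p[nz])))

rng = np.random.default_rng(0)
theta = np.array([1.0, 0.0, -2.0])       # arbitrary example scores

p_star = softmax(theta)                  # candidate maximizer
candidates = rng.dirichlet(np.ones(3), size=1000)  # random simplex points
best_random = max(objective(theta, p) for p in candidates)

# The softmax should do at least as well as every sampled point, and its
# objective value should equal log-sum-exp of theta (the conjugate value).
gap = objective(theta, p_star) - best_random
lse = float(np.log(np.sum(np.exp(theta))))
print(gap, objective(theta, p_star) - lse)
```

The random search is only a sanity check, not a proof of optimality; uniqueness follows from the strict concavity of the objective on the simplex.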

Theorems & Definitions (31)

Definition 2.2
Proposition 2.3
Lemma 3.1
proof
Proposition 3.2
proof
Proposition 4.1
proof
Theorem 4.2
proof
...and 21 more
