Non-Stationary Online Structured Prediction with Surrogate Losses

Shinsaku Sakaue, Han Bao, Yuzhou Cao

TL;DR

The paper tackles non-stationary online structured prediction under surrogate losses, where classical finite surrogate regret bounds fail to control the target loss. It derives a data-dependent bound on the cumulative target loss, $\sum_{t=1}^T \ell(\hat{y}_t,y_t) \le F_T + C(1 + P_T)$, where $F_T$ is the cumulative surrogate loss of a comparator sequence, $P_T$ is its path length, and $C > 0$ is a constant. The bound is obtained by combining the dynamic regret analysis of online gradient descent (OGD) with a technique that exploits the surrogate gap. The analysis also motivates a Polyak-style learning rate for OGD that carries target-loss guarantees and shows promising empirical performance. The framework is extended to the broader class of convolutional Fenchel–Young losses, enabling nontrivial targets like ranking and NDCG. A matching lower bound shows that the dependence on $F_T$ and $P_T$ is tight in the worst case. Together, the results provide non-stationary target-loss guarantees for full-information online structured prediction.

Abstract

Online structured prediction, including online classification as a special case, is the task of sequentially predicting labels from input features. Therein the surrogate regret -- the cumulative excess of the target loss (e.g., 0-1 loss) over the surrogate loss (e.g., logistic loss) of the fixed best estimator -- has gained attention, particularly because it often admits a finite bound independent of the time horizon $T$. However, such guarantees break down in non-stationary environments, where every fixed estimator may incur the surrogate loss growing linearly with $T$. We address this by proving a bound of the form $F_T + C(1 + P_T)$ on the cumulative target loss, where $F_T$ is the cumulative surrogate loss of any comparator sequence, $P_T$ is its path length, and $C > 0$ is some constant. This bound depends on $T$ only through $F_T$ and $P_T$, often yielding much stronger guarantees in non-stationary environments. Our core idea is to synthesize the dynamic regret bound of the online gradient descent (OGD) with the technique of exploiting the surrogate gap. Our analysis also sheds light on a new Polyak-style learning rate for OGD, which systematically offers target-loss guarantees and exhibits promising empirical performance. We further extend our approach to a broader class of problems via the convolutional Fenchel--Young loss. Finally, we prove a lower bound showing that the dependence on $F_T$ and $P_T$ is tight.
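To make the quantities in the bound concrete, one common way to write them is sketched below. The comparator symbols $\bm{U}_t$, the surrogate loss symbols $L_t$, and the choice of the Frobenius norm are assumptions for illustration; the abstract itself only names $F_T$, $P_T$, and $C$.

```latex
\begin{align*}
F_T &= \sum_{t=1}^{T} L_t(\bm{U}_t), \\
P_T &= \sum_{t=2}^{T} \lVert \bm{U}_t - \bm{U}_{t-1} \rVert_{\mathrm{F}}, \\
\sum_{t=1}^{T} \ell(\hat{y}_t, y_t) &\le F_T + C\,(1 + P_T).
\end{align*}
```

For a fixed comparator ($\bm{U}_1 = \dots = \bm{U}_T$) the path length is $P_T = 0$ and the bound reduces to $F_T + C$, which is the finite surrogate regret regime described at the start of the abstract.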

Paper Structure

This paper contains 41 sections, 6 theorems, 55 equations, 1 figure.

Key Result

Proposition 2.4

Compute $\bm{W}_1, \dots, \bm{W}_T \in \mathcal{W}$ by applying OGD with non-increasing learning rate $\eta_t > 0$ to convex loss functions $L_t\colon\mathcal{W}\to\mathbb{R}$ for $t=1,\dots,T$; i.e., set $\bm{W}_1$ to an arbitrary point in $\mathcal{W}$ and let $\bm{W}_{t+1} = \Pi_{\mathcal{W}}(\bm{W}_t - \eta_t \bm{G}_t)$ for $t=1,\dots,T$, where $\bm{G}_t \in \partial L_t(\bm{W}_t)$ and $\Pi_{\mathcal{W}}$ denotes the Euclidean projection onto $\mathcal{W}$. If the diameter of $\mathcal{W}$ is at most $D$ as in the basic assumption (assump:basic), then
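The OGD procedure in this proposition can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the ball-shaped domain, the multiclass logistic surrogate, the synthetic data with a single switch of the labelling matrix, and the step sizes $\eta_t = 1/\sqrt{t}$ are all assumptions made here. The sketch also evaluates the textbook dynamic regret bound for projected OGD, $\sum_{t=1}^T (L_t(\bm{W}_t) - L_t(\bm{U}_t)) \le \frac{D^2}{2\eta_T} + \frac{D P_T}{\eta_T} + \sum_{t=1}^T \frac{\eta_t}{2}\lVert\bm{G}_t\rVert_{\mathrm{F}}^2$, which may differ in form or constants from the proposition's exact statement.

```python
import numpy as np


def project_ball(W, radius):
    """Frobenius-norm projection onto {W : ||W||_F <= radius}."""
    norm = np.linalg.norm(W)
    return W if norm <= radius else W * (radius / norm)


def logistic_loss_and_grad(W, x, y):
    """Multiclass logistic loss of the scores W @ x for label y, and its gradient in W."""
    z = W @ x
    z = z - z.max()  # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    loss = -np.log(p[y])
    p[y] -= 1.0  # softmax(z) - e_y
    return loss, np.outer(p, x)


def demo(T=200, d=5, num_classes=3, radius=2.0, seed=0):
    rng = np.random.default_rng(seed)
    xs = rng.normal(size=(T, d))
    xs /= np.linalg.norm(xs, axis=1, keepdims=True)

    # Hypothetical non-stationary data: the labelling matrix switches halfway.
    U_a = project_ball(rng.normal(size=(num_classes, d)), radius)
    U_b = project_ball(rng.normal(size=(num_classes, d)), radius)
    comparators = [U_a if t < T // 2 else U_b for t in range(T)]
    ys = [int(np.argmax(U @ x)) for U, x in zip(comparators, xs)]

    etas = 1.0 / np.sqrt(np.arange(1, T + 1))  # non-increasing learning rates
    W = np.zeros((num_classes, d))             # W_1: any point in the domain
    learner_loss = comparator_loss = grad_term = 0.0
    for x, y, U, eta in zip(xs, ys, comparators, etas):
        loss, G = logistic_loss_and_grad(W, x, y)
        learner_loss += loss
        comparator_loss += logistic_loss_and_grad(U, x, y)[0]
        grad_term += 0.5 * eta * np.sum(G ** 2)
        W = project_ball(W - eta * G, radius)  # projected OGD step

    D = 2.0 * radius  # diameter of the ball
    P_T = sum(np.linalg.norm(comparators[t] - comparators[t - 1]) for t in range(1, T))
    bound = D ** 2 / (2 * etas[-1]) + D * P_T / etas[-1] + grad_term
    return learner_loss - comparator_loss, bound, P_T


if __name__ == "__main__":
    regret, bound, P_T = demo()
    print(f"dynamic regret = {regret:.3f}, textbook bound = {bound:.3f}, P_T = {P_T:.3f}")
```

Here the comparator sequence changes only once, so $P_T$ equals the Frobenius distance between the two labelling matrices. A comparator that drifts at every round would have a larger $P_T$ and hence a looser bound.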

Figures (1)

  • Figure 1: Experimental results for different learning rates under varying numbers of label flips.

Theorems & Definitions (15)

  • Definition 2.1
  • Definition 2.3
  • Proposition 2.4
  • Theorem 3.4
  • proof
  • Definition 4.1: Convolutional Fenchel--Young Loss
  • Lemma 4.3: Target--surrogate relation
  • Lemma 4.4
  • Theorem 4.5
  • proof
  • ...and 5 more