The SMART approach to instance-optimal online learning

Siddhartha Banerjee, Alankrita Bhatt, Christina Lee Yu

TL;DR

The paper introduces SMART, a simple online-learning algorithm that adapts to the data and achieves instance-optimal regret by switching at most once from Follow-The-Leader (FTL) to a known worst-case algorithm. By reducing instance-optimal online learning to an optimal stopping (ski-rental) problem, the paper shows that SMART guarantees Reg(SMART,ℓ^n) ≤ (e/(e−1)) min{Reg(FTL,ℓ^n), g(n)} + 1, and it proves a fundamental lower bound of 1.4335 on the best possible competitive ratio. It further extends SMART to small-loss settings, giving instance-optimality between Reg(FTL,ℓ^n) and small-loss bounds g(L^*), with epoch-based guessing when L^* is unknown. The results provide a principled framework for designing data-adaptive, near-optimal best-of-both-worlds guarantees, and open directions for bandit and multi-reference settings.
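
The e/(e−1) factor above is the classical competitive ratio for randomized ski rental. The following is a minimal sketch of that classical problem, not the paper's algorithm: it assumes continuous time, a rental rate of 1, and a purchase price of 1, and all function names are hypothetical. Loosely, renting can be read as continuing with FTL and buying as switching to the worst-case algorithm; that mapping is an interpretation of the reduction rather than a statement from the paper.

```python
import math

E_RATIO = math.e / (math.e - 1)  # about 1.582


def cost(threshold, season):
    """Cost of renting until `threshold`, then buying at price 1."""
    return season if season < threshold else threshold + 1.0


def optimal_cost(season):
    """Offline optimum: rent throughout a short season, otherwise buy at time 0."""
    return min(season, 1.0)


def expected_cost_randomized(season, grid=20000):
    """Expected cost when the threshold has density e^x / (e - 1) on [0, 1].

    The threshold is generated through the inverse CDF x = ln(1 + u (e - 1)),
    and the expectation over u in (0, 1) is approximated by a midpoint rule.
    """
    total = 0.0
    for k in range(grid):
        u = (k + 0.5) / grid
        theta = math.log(1.0 + u * (math.e - 1.0))
        total += cost(theta, season)
    return total / grid


def ratio_deterministic(season):
    """Competitive ratio of the break-even rule (buy at time 1)."""
    return cost(1.0, season) / optimal_cost(season)


def ratio_randomized(season):
    """Competitive ratio (in expectation) of the randomized threshold."""
    return expected_cost_randomized(season) / optimal_cost(season)
```

Under these assumptions the break-even rule pays at most twice the offline optimum, while the randomized threshold pays about e/(e−1) times the optimum for every season length.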

Abstract

We devise an online learning algorithm -- titled Switching via Monotone Adapted Regret Traces (SMART) -- that adapts to the data and achieves regret that is instance optimal, i.e., simultaneously competitive on every input sequence compared to the performance of the follow-the-leader (FTL) policy and the worst-case guarantee of any other input policy. We show that the regret of the SMART policy on any input sequence is within a multiplicative factor $e/(e-1) \approx 1.58$ of the smaller of: 1) the regret obtained by FTL on the sequence, and 2) the upper bound on regret guaranteed by the given worst-case policy. This implies a strictly stronger guarantee than typical "best-of-both-worlds" bounds, as the guarantee holds for every input sequence regardless of how it is generated. SMART is simple to implement as it begins by playing FTL and switches at most once during the time horizon to the worst-case algorithm. Our approach and results follow from an operational reduction of instance-optimal online learning to competitive analysis for the ski-rental problem. We complement our competitive ratio upper bounds with a fundamental lower bound showing that over all input sequences, no algorithm can get better than a $1.43$-fraction of the minimum regret achieved by FTL and the minimax-optimal policy. We also present a modification of SMART that combines FTL with a "small-loss" algorithm to achieve instance optimality between the regret of FTL and the small-loss regret bound.
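
The "play FTL, then switch at most once" structure described in the abstract can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's SMART: it uses a prediction-with-experts setting, takes FTL's running regret as the switching statistic, and uses Hedge (exponential weights) as a stand-in for the worst-case algorithm. The threshold and learning rate are left as free parameters, and all names are hypothetical.

```python
import math


def ftl_action(cum_losses):
    """Follow-The-Leader: expert with the smallest cumulative loss (lowest index on ties)."""
    return min(range(len(cum_losses)), key=lambda i: (cum_losses[i], i))


def hedge_weights(cum_losses, eta):
    """Exponential-weights distribution over experts."""
    m = min(cum_losses)  # subtract the minimum for numerical stability
    w = [math.exp(-eta * (c - m)) for c in cum_losses]
    s = sum(w)
    return [x / s for x in w]


def switch_once(losses, threshold, eta):
    """Play FTL until its running regret reaches `threshold`, then play Hedge.

    `losses` is a list of per-round loss vectors with entries in [0, 1].
    Hedge starts from fresh (reset) cumulative losses after the switch.
    Returns (total_loss, switch_round); switch_round is the 0-based index of
    the first round played by Hedge, or None if no switch occurred.
    """
    k = len(losses[0])
    cum = [0.0] * k    # cumulative losses over the whole horizon (used by FTL)
    post = [0.0] * k   # cumulative losses since the switch (used by Hedge)
    ftl_loss = 0.0
    total = 0.0
    switch_round = None
    for t, loss in enumerate(losses):
        if switch_round is None:
            incurred = loss[ftl_action(cum)]
            ftl_loss += incurred
        else:
            p = hedge_weights(post, eta)
            incurred = sum(pi * li for pi, li in zip(p, loss))  # expected loss
            post = [a + b for a, b in zip(post, loss)]
        total += incurred
        cum = [a + b for a, b in zip(cum, loss)]
        # Assumed switching rule: compare FTL's regret so far with the threshold.
        if switch_round is None and ftl_loss - min(cum) >= threshold:
            switch_round = t + 1
    return total, switch_round
```

With two experts, a sequence on which one expert is always better never triggers the switch, while an alternating sequence (on which FTL loses every round) triggers it once and lets Hedge take over for the remaining rounds.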

Paper Structure

This paper contains 13 sections, 17 theorems, 69 equations, 2 figures, 2 algorithms.

Key Result

Theorem 1

(See Theorem thm:SkiRentalRegret) Let $\mathsf{ALG}_{\mathsf{WC}}$ have worst-case regret $\sup_{\ell^n} \mathrm{Reg}(\mathsf{ALG}_{\mathsf{WC}}, \ell^n) \le g(n)$, where $g(n)$ is some monotonic function of $n$. An instantiation of $\mathsf{SMART}$ achieves $\mathrm{Reg}(\mathsf{SMART}, \ell^n) \le \frac{e}{e-1} \min\{\mathrm{Reg}(\mathsf{FTL}, \ell^n),\, g(n)\} + 1$.

Figures (2)

  • Figure 1: Comparing regret of $\mathsf{FTL}$, $\mathsf{Cover}$ and $\mathsf{SMART}$ on a collection of input sequences (for fixed $n$). $\bullet$ In Fig. $(a)$, we consider i.i.d. Bernoulli$(p)$ inputs for varying $p$. The regret of $\mathsf{FTL}$ is much lower than that of $\mathsf{Cover}$ for $p<1/2$; the regret of $\mathsf{SMART}$ tracks $\mathsf{FTL}$ closely (better than $2\mathrm{Reg}(\mathsf{FTL})$, indicated by the dotted line). $\bullet$ In Fig. $(b)$ and $(c)$, we consider 'worst-case' binary sequences (as per feder1992universal) parameterized by the number of 'lead-changes': the sequence with parameter $c$ consists of $c$ pairs '$0,1$' or '$1,0$', followed by $n-2c$ '$1$'s. In Fig. $(b)$, we consider $\mathsf{SMART}$ with a deterministic switching threshold (Theorem thm:2ApproxRegret) and compare $\mathrm{Reg}(\mathsf{SMART})$ with $2\mathrm{Reg}(\mathsf{FTL})$ and $2\mathrm{Reg}(\mathsf{Cover})$ (dotted lines); in Fig. $(c)$, we use a randomized threshold (Theorem thm:SkiRentalRegret), and show the average regret over the randomized threshold, as well as sample paths (plotted in green), and compare with $\frac{e}{e-1}$ times $\mathrm{Reg}(\mathsf{FTL})$ and $\mathrm{Reg}(\mathsf{Cover})$ (dotted lines).
  • Figure 2: Panel (a), on the left, shows the worst-case instance in binary prediction for an algorithm that starts with $\mathsf{FTL}$ and switches at most once during the time horizon to $\mathsf{Cover}$. Panel (b), on the right, depicts, in a prediction-with-experts setting, how $\mathsf{SMART}$ resets the losses after the switch from $\mathsf{FTL}$ to $\mathsf{ALG}_{\mathsf{WC}}$.
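
The 'lead-change' sequences described in the Figure 1 caption can be generated directly, and the regret of FTL on them computed. The sketch below is an illustration under assumptions not stated in this summary: the loss is the absolute loss |prediction − bit|, FTL predicts the majority bit seen so far and 0.5 on ties, and only the '0,1' pair order is used. The function names are hypothetical.

```python
def lead_change_sequence(n, c):
    """c pairs '0,1' followed by n - 2c ones."""
    if not 0 <= 2 * c <= n:
        raise ValueError("need 0 <= 2c <= n")
    return [0, 1] * c + [1] * (n - 2 * c)


def ftl_regret(bits):
    """Regret of majority-vote FTL under absolute loss.

    Assumption: FTL predicts 0.5 on ties. The best fixed prediction in
    hindsight is the majority bit, which incurs min(#ones, #zeros).
    """
    ones = zeros = 0
    loss = 0.0
    for b in bits:
        if ones > zeros:
            pred = 1.0
        elif zeros > ones:
            pred = 0.0
        else:
            pred = 0.5
        loss += abs(pred - b)
        if b == 1:
            ones += 1
        else:
            zeros += 1
    return loss - min(ones, zeros)
```

Under these assumptions each '0,1' pair costs FTL 1.5 while the best fixed prediction pays 1, so for $n > 2c$ the regret is $(c+1)/2$ and grows linearly in the number of lead changes; for example, `ftl_regret(lead_change_sequence(20, 3))` returns `2.0`.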

Theorems & Definitions (28)

  • Example 1: Binary Prediction
  • Definition 1: Instance Optimality
  • Definition 2
  • Theorem
  • Corollary 1: Following Theorem thm:Regret_small_loss
  • Theorem
  • Lemma 1: Regret of $\mathsf{FTL}$
  • proof
  • Theorem 1: Regret of $\mathsf{SMART}$ with deterministic threshold
  • Theorem 2: Regret of $\mathsf{SMART}$ with Randomized Thresholds
  • ...and 18 more