Operationalizing Stein's Method for Online Linear Optimization: CLT-Based Optimal Tradeoffs

Zhiyu Zhang, Aaditya Ramdas

TL;DR

Stein's method, a classical framework underlying the proofs of probabilistic limit theorems, can be operationalized as computationally efficient OLO algorithms and can realize a continuum of optimal two-point tradeoffs between the total loss and the maximum regret over comparators.

Abstract

Adversarial online linear optimization (OLO) is essentially about making performance tradeoffs with respect to the unknown difficulty of the adversary. In the setting of one-dimensional fixed-time OLO on a bounded domain, it has been observed since Cover (1966) that achievable tradeoffs are governed by probabilistic inequalities, and these descriptive results can be converted into algorithms via dynamic programming, which, however, is not computationally efficient. We address this limitation by showing that Stein's method, a classical framework underlying the proofs of probabilistic limit theorems, can be operationalized as computationally efficient OLO algorithms. The associated regret and total loss upper bounds are "additively sharp", meaning that they go beyond the conventional big-O optimality and match normal-approximation-based lower bounds up to additive lower-order terms. Our construction is inspired by the remarkably clean proof of a Wasserstein martingale central limit theorem (CLT) due to Röllin (2018). Several concrete benefits can be obtained from this general technique. First, with the same computational complexity, the proposed algorithm improves upon the total loss upper bounds of online gradient descent (OGD) and multiplicative weight update (MWU). Second, our algorithm can realize a continuum of optimal two-point tradeoffs between the total loss and the maximum regret over comparators, improving upon prior works in parameter-free online learning. Third, by allowing the adversary to randomize on an unbounded support, we achieve sharp in-expectation performance guarantees for OLO with noisy feedback.
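To make the two quantities in the tradeoff concrete, here is a minimal Python sketch of the one-dimensional OLO protocol on the domain $[-1,1]$, using projected OGD as the baseline learner. The function names (`run_ogd`, `regret`, `max_regret`), the starting point $x_1=0$, and the step size $\alpha/\sqrt{T}$ are illustrative assumptions, not taken from the paper. The total loss is $\sum_t g_t x_t$, the regret against a comparator $u$ is $\sum_t g_t (x_t - u)$, and maximizing over $u\in[-1,1]$ gives the total loss plus $|\sum_t g_t|$.

```python
import math
import random


def run_ogd(gradients, alpha=1.0):
    """Play projected OGD on [-1, 1] and return the total loss sum_t g_t * x_t.

    Assumptions for this sketch: x_1 = 0 and a fixed step size alpha / sqrt(T).
    """
    T = len(gradients)
    eta = alpha / math.sqrt(T)
    x = 0.0
    total_loss = 0.0
    for g in gradients:
        total_loss += g * x                    # linear loss of the current prediction
        x = max(-1.0, min(1.0, x - eta * g))   # gradient step, then projection
    return total_loss


def regret(total_loss, gradients, u):
    """Regret against a fixed comparator u: sum_t g_t * (x_t - u)."""
    return total_loss - u * sum(gradients)


def max_regret(total_loss, gradients):
    """Maximum regret over comparators u in [-1, 1]."""
    return total_loss + abs(sum(gradients))


random.seed(0)
gs = [random.choice([-1, 1]) for _ in range(1000)]
loss = run_ogd(gs, alpha=1.0)
worst_case = max_regret(loss, gs)
```

The total loss is the regret against the comparator $u=0$, so a bound on the total loss and a bound on the maximum regret are two points on the same regret curve.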

Paper Structure

This paper contains 74 sections, 40 theorems, 158 equations, 3 figures, 1 algorithm.

Key Result

Theorem 1

Assume that $g_t\in\{-1,1\}$ for all $t\in[1:T]$, and define $\mathrm{RS}(n)$ as the distribution of the sum of $n$ independent Rademacher random variables. Then, for every convex and $1$-Lipschitz function $\psi^*_T:\mathbb{R}\rightarrow(-\infty,\infty]$, there exists an algorithm achieving the total […]. In particular, the corresponding algorithm outputs the expectation of a discrete derivative, […]
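The displayed formulas of this theorem are not available on this page, so the following Python sketch is only one standard way to instantiate a Cover-style potential algorithm, under stated assumptions. It assumes the potential $V_t(s)=\mathbb{E}_{Z\sim\mathrm{RS}(T-t)}[\psi(s+Z)]$, the running statistic $S_t=-\sum_{i\le t} g_i$, and the prediction $x_t=\tfrac{1}{2}\,[V_t(S_{t-1}+1)-V_t(S_{t-1}-1)]$; the sign convention and the names `potential` and `cover_play` are hypothetical and may differ from the paper's. Under these assumptions each round satisfies $g_t x_t = V_{t-1}(S_{t-1}) - V_t(S_t)$, so the total loss telescopes to $\mathbb{E}_{Z\sim\mathrm{RS}(T)}[\psi(Z)]-\psi(-\sum_t g_t)$.

```python
from math import comb


def potential(psi, s, n):
    """E[psi(s + Z)], where Z is the sum of n independent Rademacher variables.

    Z takes the value 2k - n with probability C(n, k) / 2^n.
    """
    return sum(comb(n, k) * psi(s + 2 * k - n) for k in range(n + 1)) / 2 ** n


def cover_play(psi, gradients):
    """Play the potential-based strategy against gradients in {-1, +1}.

    Returns the list of predictions and the total loss sum_t g_t * x_t.
    """
    T = len(gradients)
    s = 0              # S_{t-1} = -(g_1 + ... + g_{t-1}), an assumed sign convention
    total_loss = 0.0
    xs = []
    for t, g in enumerate(gradients, start=1):
        n = T - t      # number of rounds remaining after round t
        # Discrete derivative of the potential at the current statistic.
        x = 0.5 * (potential(psi, s + 1, n) - potential(psi, s - 1, n))
        xs.append(x)
        total_loss += g * x
        s -= g
    return xs, total_loss


gs = [1, -1, 1, 1, -1, 1, 1, 1]
xs, loss = cover_play(abs, gs)
# Telescoping identity: loss equals E[psi(Z_T)] - psi(-sum of gradients).
predicted = potential(abs, 0, len(gs)) - abs(-sum(gs))
```

With $\psi(s)=|s|$, the maximum regret over $u\in[-1,1]$ equals the total loss plus $|\sum_t g_t|$, which is $\mathbb{E}|Z_T|$ for every gradient sequence. Each round of this sketch evaluates a sum with up to $T$ terms, which illustrates the kind of per-round cost the dynamic-programming approach carries.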

Figures (3)

  • Figure 1: Comparison of the prefactors of $\sqrt{T}$ in the regret bounds of our algorithm (denoted by $\gamma_{\mathrm{Huber}}(u,\alpha)$), OGD (denoted by $\gamma_{\mathrm{OGD}}(u,\alpha)$), and Cover's algorithm (the optimal $u$-independent prefactor $\sqrt{\frac{2}{\pi}}$; computationally efficient versions are provided by Kobzar et al. (2020) and Greenstreet et al. (2022)). $u$ is the comparator and $\alpha$ is the scaling factor of the learning rate; see Section \ref{subsection:huber} for definitions. Left: with $\alpha=1$, $\gamma_{\mathrm{Huber}}(u,\alpha)$ and $\gamma_{\mathrm{OGD}}(u,\alpha)$ are compared as functions of $u$; lower is better. Our regret bound dominates that of OGD for all $u\in[-1,1]$, while the optimal $u$-independent bound does not. Middle: with $\alpha=5$, the improvement over OGD becomes more significant. Right: the margin of improvement $\mathrm{Gap}_{\mathrm{OGD}}(\alpha)\mathrel{\mathop:}=\gamma_{\mathrm{OGD}}(u,\alpha)-\gamma_{\mathrm{Huber}}(u,\alpha)$ as a function of $\alpha$.
  • Figure 2: Comparison of the prefactors of $\sqrt{T}$ in the regret bounds of our algorithm (represented by $\gamma_{\mathrm{LSE}}(u,\alpha)$), MWU (represented by $\gamma_{\mathrm{MWU}}(u,\alpha)$), and Cover's algorithm (the optimal $u$-independent prefactor $\sqrt{\frac{2}{\pi}}$). Analogous to Figure \ref{figure:ogd}, but based on the log-sum-exp regime (Corollary \ref{corollary:regret_logcosh}). Left: with $\alpha=\sqrt{2\ln 2}$, which minimizes $\sup_{u\in[-1,1]}\gamma_{\mathrm{MWU}}(u,\alpha)$, $\gamma_{\mathrm{LSE}}(u,\alpha)$ and $\gamma_{\mathrm{MWU}}(u,\alpha)$ are compared as functions of $u$; lower is better. Middle: with $\alpha=5$, the gap between $\gamma_{\mathrm{MWU}}(u,\alpha)$ and $\gamma_{\mathrm{LSE}}(u,\alpha)$ widens. Right: the margin of improvement $\mathrm{Gap}_{\mathrm{MWU}}(\alpha)\mathrel{\mathop:}=\gamma_{\mathrm{MWU}}(u,\alpha)-\gamma_{\mathrm{LSE}}(u,\alpha)$ as a function of $\alpha$.
  • Figure 3: Comparison of the prefactors of $\sqrt{T}$ in the $\mathrm{Regret}_T^\mathrm{unif}$ upper bound of our algorithm (LHS of Eq. \ref{eq:two_prefactors}, shown in blue) and the baseline of Cutkosky and Orabona (2018) and Zhang et al. (2022) (RHS of Eq. \ref{eq:two_prefactors}, shown in orange); lower is better. Both are functions of $\varepsilon\in(0,\sqrt{\frac{2}{\pi}}]$, which represents the budget on $\mathrm{Loss}_T$. Our result dominates that of the baseline.

Theorems & Definitions (61)

  • Example 1: Uniform regret
  • Theorem 1: Cover's characterization, adapted
  • Theorem 2: Theorems \ref{theorem:main} and \ref{theorem:loss_lower}, informal
  • Definition 2.1: Solution of Stein equation
  • Lemma 2.1: Lemma \ref{lemma:equivalent}, simplified
  • Theorem 3: Main result; upper bound on $\mathrm{Loss}_T$
  • Theorem 4: Lower bound on $\mathrm{Loss}_T$
  • Proof: Proof of Theorem \ref{theorem:main}
  • Corollary 4: Regret: absolute value
  • Corollary 4: Regret: Huber
  • ...and 51 more