A second order regret bound for NormalHedge
Yoav Freund, Nicholas J. A. Harvey, Victor S. Portella, Yabing Qi, Yu-Xiang Wang
TL;DR
This work resolves a long-standing question on adaptive, second-order regret in the prediction-with-expert-advice setting by showing that a NormalHedge variant achieves a quantile- and variance-based bound of the form $\mathrm{Regret}_{\varepsilon}(T) = O\left(\sqrt{(t_0+2V_T)\big(\log(t_0+2V_T)+2\log(1/\varepsilon)\big)}\right)$, where $V_T$ is the cumulative second moment of instantaneous regrets under a problem-dependent distribution. The authors develop a CP (constant-potential) Hedge framework using good potentials that obey the backwards heat equation, which allows the discretization error to be controlled via local self-concordance and admits a continuous-time, SDE-inspired interpretation. Their main contribution is a proof of an adaptive, second-order quantile regret bound for NormalHedge.BH, including a carefully chosen initialization $t_0$ and a lower bound that clarifies the limits of adaptivity. The results unify a continuous-time stochastic-calculus perspective with a rigorous discrete-time analysis, showing that algorithm-dependent variance measures can yield near-optimal, parameter-free regret bounds. For adaptive algorithms in online decision tasks, this means regret against the top $\varepsilon$-fraction of experts can be bounded without tuning to unknown properties of the sequence, by tying performance to the cumulative second moment $V_T$.
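As a rough check on how the bound above simplifies, the following sketch assumes $t_0 \le V_T$, $V_T \ge 1$, and $V_T/\varepsilon \ge e$. These conditions are illustrative assumptions made here and are not taken from the paper's theorem statement.

$$
\begin{aligned}
t_0 + 2V_T &\le 3V_T, \\
\log(t_0 + 2V_T) + 2\log(1/\varepsilon) &\le \log 3 + \log V_T + 2\log(1/\varepsilon) \le \log 3 + 2\log(V_T/\varepsilon), \\
\sqrt{(t_0+2V_T)\big(\log(t_0+2V_T)+2\log(1/\varepsilon)\big)} &\le \sqrt{3V_T\big(\log 3 + 2\log(V_T/\varepsilon)\big)} = O\Big(\sqrt{V_T\log(V_T/\varepsilon)}\Big).
\end{aligned}
$$

Under these assumptions the $t_0$ term only affects constants, and the bound takes the simpler form $O\big(\sqrt{V_T\log(V_T/\varepsilon)}\big)$.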
Abstract
We consider the problem of prediction with expert advice for "easy" sequences. We show that a variant of NormalHedge enjoys a second-order $\varepsilon$-quantile regret bound of $O\big(\sqrt{V_T \log(V_T/\varepsilon)}\big)$ when $V_T > \log N$, where $V_T$ is the cumulative second moment of instantaneous per-expert regret averaged with respect to a natural distribution determined by the algorithm. The algorithm is motivated by a continuous-time limit using stochastic differential equations. The discrete-time analysis uses self-concordance techniques.
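To make the two quantities in the bound concrete, here is a minimal Python sketch that measures an $\varepsilon$-quantile regret and a cumulative second moment on a loss matrix. It rests on several assumptions that are not from the paper: the learner is plain exponential-weights Hedge with a fixed learning rate `eta` (a stand-in, not NormalHedge.BH), the averaging distribution for $V_T$ is taken to be the learner's own play distribution, and $\varepsilon$-quantile regret is taken to mean regret against the $\lceil \varepsilon N \rceil$-th best expert. The function name `quantile_regret_and_second_moment` is hypothetical.

```python
import numpy as np


def quantile_regret_and_second_moment(losses, eta=0.5, eps=0.1):
    """Run plain Hedge (a stand-in learner, NOT NormalHedge.BH) on a T x N
    loss matrix and return (eps-quantile regret, cumulative second moment).

    Assumptions made for illustration only:
      * the averaging distribution for V_T is the learner's play distribution;
      * eps-quantile regret is regret against the ceil(eps * N)-th best expert.
    """
    T, N = losses.shape
    cum_losses = np.zeros(N)
    alg_loss = 0.0
    V = 0.0
    for t in range(T):
        # Exponential weights over cumulative losses (shifted for stability).
        w = np.exp(-eta * (cum_losses - cum_losses.min()))
        p = w / w.sum()
        expected = p @ losses[t]      # learner's expected loss this round
        r = expected - losses[t]      # instantaneous per-expert regret
        V += p @ (r ** 2)             # second moment of r under p
        alg_loss += expected
        cum_losses += losses[t]
    # Cumulative loss of the expert ranked ceil(eps * N) from the best.
    k = max(1, int(np.ceil(eps * N)))
    kth_best = np.sort(cum_losses)[k - 1]
    return alg_loss - kth_best, V


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    losses = rng.random((200, 20))    # synthetic losses in [0, 1)
    regret, V = quantile_regret_and_second_moment(losses, eta=0.5, eps=0.1)
    print(f"quantile regret: {regret:.3f}, V_T: {V:.3f}")
```

With losses in $[0,1]$ each instantaneous regret lies in $[-1,1]$, so this $V_T$ never exceeds $T$ and can be much smaller on sequences where the experts' losses are close to the learner's, which is the sense in which a second-order bound can improve on a worst-case $\sqrt{T}$ rate.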
