Online Prediction of Stochastic Sequences with High Probability Regret Bounds

Matthias Frey; Jonathan H. Manton; Jingge Zhu

TL;DR

This work establishes high-probability regret bounds for online universal prediction of stochastic sequences with a known horizon $T$, showing a bound of order $O\big(T^{-1/2}\delta^{-1/2}\big)$ that mirrors the classic $O\big(T^{-1/2}\big)$ in-expectation rate. It formalizes a mismatched-prediction framework, derives both expected and high-probability regret bounds via Azuma-Hoeffding arguments, and proves an impossibility result indicating the $\delta$-dependence cannot be substantially improved without extra assumptions. The results are connected to universal prediction by employing a mixture $Q$ over a parametrized family and bounding the divergence $D(P||Q)$ (or the variational distance) to transfer the bounds to the universal setting. Numerical experiments on finite-state Markov models illustrate practical viability and the behavior of high-probability quantiles. Overall, the paper provides reliability guarantees for online sequence prediction beyond expectation, with implications for non-i.i.d. processes and non-finite alphabets in both theory and practice.

Abstract

We revisit the classical problem of universal prediction of stochastic sequences with a finite time horizon $T$ known to the learner. The question we investigate is whether it is possible to derive vanishing regret bounds that hold with high probability, complementing existing bounds from the literature that hold in expectation. We propose such high-probability bounds, which have a form very similar to that of the prior expectation bounds. For the case of universal prediction of a stochastic process over a countable alphabet, our bound states a convergence rate of $\mathcal{O}(T^{-1/2} \delta^{-1/2})$ with probability at least $1-\delta$, compared to previously known in-expectation bounds of order $\mathcal{O}(T^{-1/2})$. We also prove an impossibility result showing that the exponent of $\delta$ in a bound of the same form cannot be improved without additional assumptions.
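For comparison, a generic way to turn an in-expectation bound into a high-probability one is Markov's inequality. The short derivation below is only a sketch and is not taken from the paper: it writes $\Delta$ for the regret, and it assumes that $\Delta \geq 0$ and that $\mathbb{E}\Delta \leq C/\sqrt{T}$ for some constant $C > 0$ (both the nonnegativity and the constant $C$ are assumptions made for illustration).

$$
\mathbb{P}\left(\Delta \geq \frac{C}{\sqrt{T}\,\delta}\right)
\;\leq\; \frac{\mathbb{E}\Delta}{C/(\sqrt{T}\,\delta)}
\;\leq\; \frac{C/\sqrt{T}}{C/(\sqrt{T}\,\delta)}
\;=\; \delta .
$$

Under these assumptions, the generic argument yields a rate of order $T^{-1/2}\delta^{-1}$ with probability at least $1-\delta$. The rate $\mathcal{O}(T^{-1/2}\delta^{-1/2})$ stated in the abstract has a smaller exponent on $\delta^{-1}$, so it is tighter for small $\delta$.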

Paper Structure

This paper contains 23 sections, 10 theorems, 71 equations, 1 figure, 1 table.

Introduction
Related works
The universal and the mismatched prediction problems
Notations and technical statement of the mismatched prediction problem
Results for the mismatched prediction problem
Applying mismatched prediction results to universal prediction
Numerical Experiments
Limitations and future research directions
Detailed Arguments for Remark \ref{remark:predictor-class}
Item \ref{item:predictor-class-finite}:
Item \ref{item:predictor-class-log}:
Existence of measurable $(b^*_t)_{t\in[T]}$ and $(b_t)_{t\in[T]}$
Proof of Lemma \ref{lemma:tvdist-kldiv}
Proof of Lemma \ref{lemma:highprob-regret-tvdist-instantaneous}
Proof of Theorem \ref{theorem:highprob-regret}
...and 8 more sections

Key Result

Theorem 1

If $\mathcal{Z}$ is countable, there is a strategy which is universal for all computable $P$ in the sense that we have $\mathbb{E} \Delta \leq \mathcal{O}(1/\sqrt{T})$.

Figures (1)

Figure 1: Average regret per round for a Markov chain with memory order $m=3$ and $S=2$ states. Shown are the mean and selected quantiles for $4,000$ runs.
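An experiment of the kind summarized in Figure 1 could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes log loss, a binary alphabet with memory order 3, randomly drawn transition probabilities, and a Krichevsky-Trofimov (add-1/2) estimator per context as the predictor. The function names `simulate_run` and `summarize` are hypothetical, and the choice of quantiles is arbitrary.

```python
import math
import random
import statistics


def simulate_run(T, m=3, rng=None):
    """Average log-loss regret per round for one binary order-m Markov chain.

    Assumptions (not from the paper): log loss, KT add-1/2 predictor,
    transition probabilities drawn uniformly from [0.1, 0.9].
    """
    rng = rng or random.Random()
    n_ctx = 2 ** m
    # Hypothetical source: P(next symbol = 1 | context) for each context.
    p_one = [rng.uniform(0.1, 0.9) for _ in range(n_ctx)]
    counts = [[0, 0] for _ in range(n_ctx)]  # per-context symbol counts
    ctx = 0  # start from the all-zeros context
    regret = 0.0
    for _ in range(T):
        p_true = p_one[ctx]
        n0, n1 = counts[ctx]
        q_one = (n1 + 0.5) / (n0 + n1 + 1.0)  # KT estimate of P(1 | ctx)
        z = 1 if rng.random() < p_true else 0
        p_z = p_true if z == 1 else 1.0 - p_true
        q_z = q_one if z == 1 else 1.0 - q_one
        # Excess log loss of the predictor over the true conditional law.
        regret += math.log(p_z / q_z)
        counts[ctx][z] += 1
        ctx = ((ctx << 1) | z) & (n_ctx - 1)  # slide the length-m window
    return regret / T


def summarize(T, runs, seed=0):
    """Mean and empirical quantiles of the per-round regret over many runs."""
    rng = random.Random(seed)
    values = sorted(simulate_run(T, rng=rng) for _ in range(runs))

    def quantile(a):
        return values[min(runs - 1, int(a * runs))]

    return {
        "mean": statistics.fmean(values),
        "q50": quantile(0.5),
        "q90": quantile(0.9),
        "q99": quantile(0.99),
    }


if __name__ == "__main__":
    for horizon in (100, 400, 1600):
        print(horizon, summarize(horizon, runs=200))
```

Running the script prints the mean and selected quantiles for a few horizons, which gives a rough view of how the upper quantiles compare with the mean as $T$ grows. The number of runs is kept small here for speed; the figure reports $4,000$ runs.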

Theorems & Definitions (14)

Theorem 1: merhav1998universal, hutter2003optimality
Theorem 2
Remark 1
Theorem 3: merhav1998universal, eq. (23)
Lemma 1
Corollary 1: merhav1998universal, eq. (23)
Lemma 2
Theorem 4
Theorem 5
Lemma 3
...and 4 more
