Bayesian Design Principles for Frequentist Sequential Learning

Yunbei Xu, Assaf Zeevi

TL;DR

This work proposes the Algorithmic Information Ratio (AIR) as a principled, prior-free Bayesian-style framework for sequential learning with partial feedback, unifying frequentist regret analysis with constructive Bayesian belief design. By maximizing AIR, the authors generate round-by-round algorithmic beliefs that balance exploration and exploitation and lead to regret guarantees matching the best-known prior-based results across stochastic, adversarial, and non-stationary environments. They introduce Model-index AIR (MAIR) for stochastic settings and derive constructive algorithms, Adaptive Posterior Sampling (APS) and Adaptive Minimax Sampling (AMS), with closed-form implementations in key cases like Bernoulli MAB and Gaussian linear bandits. The framework extends to linear bandits, bandit convex optimization, and reinforcement learning, delivering near-optimal rates (e.g., $\tilde{O}(d^{2.5}\sqrt{T})$ for bandit convex optimization) with finite-time, polynomial-time algorithms. Empirically, a Bernoulli MAB instance demonstrates “best-of-all-worlds” performance, outperforming UCB and EXP3 across regimes, illustrating the practical impact of AIR-based principled design for real-world sequential decision problems.

Abstract

We develop a general theory to optimize the frequentist regret for sequential learning problems, where efficient bandit and reinforcement learning algorithms can be derived from unified Bayesian principles. We propose a novel optimization approach to generate "algorithmic beliefs" at each round, and use Bayesian posteriors to make decisions. The optimization objective to create "algorithmic beliefs," which we term "Algorithmic Information Ratio," represents an intrinsic complexity measure that effectively characterizes the frequentist regret of any algorithm. To the best of our knowledge, this is the first systematic approach to make Bayesian-type algorithms prior-free and applicable to adversarial settings, in a generic and optimal manner. Moreover, the algorithms are simple and often efficient to implement. As a major application, we present a novel algorithm for multi-armed bandits that achieves "best-of-all-worlds" empirical performance in stochastic, adversarial, and non-stationary environments. We also illustrate how these principles can be used in linear bandits, bandit convex optimization, and reinforcement learning.
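
For readers unfamiliar with the baselines named in the TL;DR, the following is a minimal sketch of UCB1 and EXP3 on a stochastic Bernoulli multi-armed bandit, measuring cumulative pseudo-regret. It is not the authors' APS algorithm, whose update rule is not reproduced on this page. The arm means, horizon, seed, and function names are illustrative assumptions, and the EXP3 exploration rate follows the standard textbook tuning rather than anything stated in the paper.

```python
import math
import random


def ucb1(means, horizon, rng):
    """Run UCB1 on a Bernoulli bandit and return cumulative pseudo-regret."""
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # total observed reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull every arm once before using the index
        else:
            # Empirical mean plus an optimism bonus that shrinks with pulls.
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def exp3(means, horizon, rng):
    """Run EXP3 on the same bandit and return cumulative pseudo-regret."""
    k = len(means)
    # Standard tuning, using the horizon as an upper bound on the best gain.
    gamma = min(1.0, math.sqrt(k * math.log(k) / ((math.e - 1.0) * horizon)))
    weights = [1.0] * k
    best = max(means)
    regret = 0.0
    for _ in range(horizon):
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / k for w in weights]
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < means[arm] else 0.0
        # Importance-weighted reward estimate for the pulled arm only.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / k)
        # Rescale so the largest weight is 1; this leaves probs unchanged
        # and avoids floating-point overflow on long horizons.
        top = max(weights)
        weights = [w / top for w in weights]
        regret += best - means[arm]
    return regret


if __name__ == "__main__":
    arm_means = [0.3, 0.5, 0.7]   # hypothetical instance, not from the paper
    print("UCB1 pseudo-regret:", ucb1(arm_means, 2000, random.Random(0)))
    print("EXP3 pseudo-regret:", exp3(arm_means, 2000, random.Random(0)))
```

UCB1 is designed for stochastic rewards and EXP3 for adversarial ones, which is why a single algorithm that performs well in both regimes is described as "best-of-all-worlds".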

Paper Structure

This paper contains 93 sections, 34 theorems, 218 equations, 9 figures, 8 algorithms.

Key Result

Lemma 2.3

For any $q\in \text{int}(\Delta(\Pi))$, $p\in\Delta(\Pi)$, belief $\nu\in\Delta(\mathcal{M}\times \Pi)$, and $\eta>0$, we have

Figures (9)

  • Figure 1: Sensitivity analysis in a stochastic bandit problem.
  • Figure 2: Jackson Pollock, Mural (1943) [Oil on canvas]. University of Iowa Museum of Art.
  • Figure 3: Sensitivity analysis in an adversarial bandit problem.
  • Figure 4: Sensitivity analysis in a "change points" environment.
  • Figure 5: Comparing APS to "clairvoyant" restarted algorithms in a "change points" environment.
  • ...and 4 more figures

Theorems & Definitions (43)

  • Example 1: Bernoulli multi-armed bandits (MAB)
  • Example 2: Structured bandits
  • Definition 2.1: Algorithmic Information Ratio
  • Definition 2.2: Information ratio
  • Lemma 2.3: Bounding AIR by IR
  • Definition 2.4: Decision-estimation coefficient
  • Lemma 2.5: Bounding AIR by DEC
  • Definition 2.6: Model-index AIR
  • Lemma 2.7: Relationship of MAIR and AIR
  • Lemma 2.8: Bounding MAIR by DEC
  • ...and 33 more
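
As background for the entries on the information ratio above, the classical (Bayesian) information ratio of Russo and Van Roy can be written as follows. This is a reference sketch in their notation, which is an assumption here and may differ from the paper's Definition 2.2: $A^*$ is the optimal action, $A_t$ the action played at round $t$, $Y_t$ the resulting observation, $f$ the mean reward function, and $I_t$ the mutual information under the round-$t$ posterior.

$$
\Gamma_t = \frac{\left(\mathbb{E}_t\left[f(A^*) - f(A_t)\right]\right)^2}{I_t\left(A^*; (A_t, Y_t)\right)}
$$

If $\Gamma_t \le \bar{\Gamma}$ for every round, the Bayesian regret over $T$ rounds satisfies

$$
\mathbb{E}\left[\mathrm{Regret}(T)\right] \le \sqrt{\bar{\Gamma}\, H(A^*)\, T},
$$

where $H(A^*)$ is the entropy of the optimal action under the prior. The ratio compares squared expected regret with the information gained about the optimal action, so a small ratio means each unit of regret buys a lot of information.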