Chained Information-Theoretic bounds and Tight Regret Rate for Linear Bandit Problems

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, Mikael Skoglund

TL;DR

The paper addresses regret bounds for bandit problems with metric action spaces by extending information-theoretic analyses to a chaining framework. It introduces Two Steps Thompson Sampling and a chain-link information ratio to leverage reward continuity across nearby actions, yielding a regret bound that depends on the metric entropy of the action space. For $d$-dimensional linear bandits with smooth rewards, the authors obtain a tight $O(d\sqrt{T})$ regret rate and a unit-ball bound that matches the minimax rate $\Omega(d\sqrt{T})$, extending the applicability of information-theoretic methods to continuous action spaces. The results suggest that the Two Steps variant and chaining techniques can outperform prior $O(d\sqrt{T\log T})$ bounds and open avenues for generalized linear and logistic bandits in continuous settings.

Abstract

This paper studies the Bayesian regret of a variant of the Thompson-Sampling algorithm for bandit problems. It builds upon the information-theoretic framework of [Russo and Van Roy, 2015] and, more specifically, on the rate-distortion analysis of [Dong and Van Roy, 2020], which proved a bound with a regret rate of $O(d\sqrt{T \log(T)})$ for the $d$-dimensional linear bandit setting. We focus on bandit problems with a metric action space and, using a chaining argument, we establish new bounds that depend on the metric entropy of the action space for a variant of Thompson-Sampling. Under a suitable continuity assumption on the rewards, our bound offers a tight rate of $O(d\sqrt{T})$ for $d$-dimensional linear bandit problems.
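
To get a feel for what removing the logarithmic factor means, the two rates quoted above can be compared numerically. The following is a minimal sketch, assuming every constant hidden by the $O(\cdot)$ notation equals 1 (the actual constants are not given here); the function names are hypothetical and serve only as illustration.

```python
import math


def rate_chained(d: int, T: int) -> float:
    """d * sqrt(T): the rate quoted for the chained bound (constant assumed 1)."""
    return d * math.sqrt(T)


def rate_rate_distortion(d: int, T: int) -> float:
    """d * sqrt(T log T): the earlier rate (constant assumed 1)."""
    return d * math.sqrt(T * math.log(T))


def log_factor(d: int, T: int) -> float:
    """Ratio of the earlier rate to the chained rate."""
    return rate_rate_distortion(d, T) / rate_chained(d, T)


# Print the ratio for a few horizons; the dimension d cancels out.
for T in (10**2, 10**4, 10**6):
    print(T, round(log_factor(5, T), 3))
```

Under this assumption the ratio equals $\sqrt{\log T}$ regardless of $d$, so the gap between the two rates grows slowly with the horizon $T$.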

Paper Structure

This paper contains 18 sections, 8 theorems, 58 equations, and 1 algorithm.

Key Result

Proposition 1

Let $\{A^\star_k\}_{k=k_0}^{\infty}$ be defined as in Definition 4 ($k^{\textnormal{th}}$-quantization). For each time step $t \in \{1,\ldots,T\}$, there exists a sequence of random functions $\{f_t^k\}_{k=k_0}^{\infty}$ that, for each $k > k_0$, satisfies the following: where, in the "no more information" property, the sampled actions $\hat{A}_t$ and $\hat{A}_t'$ are identically distributed.

Theorems & Definitions (19)

  • Definition 1: Optimal cumulative reward
  • Definition 2: Bayesian expected regret
  • Definition 3: $\varepsilon$-net and covering number
  • Definition 4: $k^{\textnormal{th}}$-quantization
  • Proposition 1
  • Proof 1
  • Definition 5: Subgaussian process
  • Definition 6: Separable process
  • Definition 7: Smooth rewards
  • Definition 8: Chain-link information ratio
  • ...and 9 more
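
The notion of an $\varepsilon$-net and covering number (Definition 3) can be illustrated on a finite sample of actions. The following is a minimal sketch, assuming the standard definition (every point lies within distance $\varepsilon$ of some net point) since the definition's text is not reproduced here. The greedy construction and all function names are hypothetical illustrations, not the construction used in the paper. The size of the greedy net only upper-bounds the covering number of the sampled cloud.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def greedy_epsilon_net(points, eps, dist=euclidean):
    """Keep a point only if it is farther than eps from every kept point.

    Each skipped point is within eps of an earlier kept point, so the
    result is an eps-net of `points` (and is also eps-separated).
    """
    net = []
    for p in points:
        if all(dist(p, q) > eps for q in net):
            net.append(p)
    return net


def sample_unit_disk(n, seed=0):
    """Rejection-sample n points uniformly from the 2-D unit disk."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n:
        x, y = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x * x + y * y <= 1.0:
            pts.append((x, y))
    return pts


points = sample_unit_disk(2000)
for eps in (1.0, 0.5, 0.25):
    net = greedy_epsilon_net(points, eps)
    # log(len(net)) is an empirical proxy for the metric entropy at scale eps
    print(eps, len(net), round(math.log(len(net)), 3))
```

Because the greedy net is $\varepsilon$-separated, a standard volume argument bounds its size in the $d$-dimensional unit ball by $(1 + 2/\varepsilon)^d$, so its logarithm grows at most linearly in $d$.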