Restless Linear Bandits

Azadeh Khaleghi

TL;DR

This work generalizes linear bandits to a restless setting where the payoff-generating parameter sequence $\theta_t$ is stationary and φ-mixing, yielding time-dependent rewards $Y_t=\langle \theta_t, X_t\rangle$. It quantifies the cost of replacing a dynamic restless oracle with a static mean oracle via the bound $\nu_n-\widetilde{\nu}_n \le 2 n \varphi_1 \|\theta_t\|_{\mathcal{L}_{\infty}}$ and introduces LinMix-UCB, an optimistic algorithm that handles long-range dependencies under an exponential mixing rate $\varphi_m \le a e^{-\gamma m}$. LinMix-UCB achieves sublinear regret with respect to an oracle that always plays a multiple of $\mathbb{E}[\theta_t]$, with finite-horizon guarantees of the form $\mathcal{O}(\sqrt{d n \mathrm{polylog}(n)})$ and infinite-horizon guarantees via a doubling trick. The analysis relies on Berbee's coupling to generate near-independent samples and on confidence ellipsoids around $\theta^*$, bridging restless bandits and time-series concentration methods. The work opens avenues to relax the mixing assumption, learn mixing parameters online, and establish corresponding lower bounds.
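As a rough worked instance, assuming both the approximation bound and the exponential mixing rate quoted above hold, setting $m=1$ in the rate gives $\varphi_1 \le a e^{-\gamma}$, so the two can be chained:

$$\nu_n-\widetilde{\nu}_n \;\le\; 2 n \varphi_1 \|\theta_t\|_{\mathcal{L}_{\infty}} \;\le\; 2 n a e^{-\gamma} \|\theta_t\|_{\mathcal{L}_{\infty}}.$$

Under these assumptions the per-round approximation error is at most $2 a e^{-\gamma} \|\theta_t\|_{\mathcal{L}_{\infty}}$, which shrinks as the decay rate $\gamma$ grows.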

Abstract

A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,~t \in \mathbb{N})$ which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with i.i.d. noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d n\mathrm{polylog}(n) }\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.
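To make the idea of "near-independent samples" under an exponential mixing rate more concrete, the following is a minimal Python sketch. It is not the paper's LinMix-UCB or its coupling construction. It only illustrates, under the assumed rate $\varphi_m \le a e^{-\gamma m}$, how large a time gap $m$ is needed before the dependence bound falls below a chosen tolerance, and which time indices are then separated by that gap. The function names `min_gap` and `spaced_indices` and the tolerance `eps` are hypothetical, introduced here for illustration only.

```python
import math
from typing import List


def min_gap(a: float, gamma: float, eps: float) -> int:
    """Smallest integer m >= 1 with a * exp(-gamma * m) <= eps.

    Assumes the exponential mixing bound phi_m <= a * exp(-gamma * m);
    `eps` is a hypothetical tolerance on the dependence between samples.
    """
    if a <= 0 or gamma <= 0 or eps <= 0:
        raise ValueError("a, gamma and eps must all be positive")
    # Solve a * exp(-gamma * m) <= eps for m, then round up.
    m = max(1, math.ceil(math.log(a / eps) / gamma))
    # Guard against floating-point rounding at the boundary.
    while a * math.exp(-gamma * m) > eps:
        m += 1
    return m


def spaced_indices(n: int, gap: int) -> List[int]:
    """Time indices 0, gap, 2*gap, ... below n (samples `gap` steps apart)."""
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return list(range(0, n, gap))


if __name__ == "__main__":
    gap = min_gap(a=1.0, gamma=0.5, eps=0.01)
    print(gap, spaced_indices(25, gap))
```

The gap grows only logarithmically in $1/\varepsilon$, which suggests why an exponential mixing rate can leave enough well-separated samples for estimation; how the paper actually selects samples and sets its confidence radii is not reproduced here.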

Paper Structure (4 sections, 3 theorems, 61 equations, 2 algorithms)

Key Result

Proposition 1

Let $\varphi_1$ be the first $\varphi$-mixing coefficient of the process $(\theta_t,~ t \in \mathbb{N})$. For every $n\geq 1$ it holds that $\nu_n-\widetilde{\nu}_n \le 2 n \varphi_1 \|\theta_t\|_{\mathcal{L}_{\infty}}$.

Theorems & Definitions (6)

  • Proposition 1
  • proof
  • Theorem 1
  • proof
  • Theorem 2
  • proof