Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Hao Li, Dong Liang, Zheng Xie

TL;DR

The performance of Meta-TSLB is evaluated experimentally under different settings, and the generalization capability of Meta-TSLB is analyzed, showcasing its potential to adapt to unseen instances.

Abstract

Meta-learning is characterized by its ability to learn how to learn, enabling learning strategies to adapt across different tasks. Recent research introduced Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from it. However, its analysis was limited to Gaussian bandits. The contextual multi-armed bandit framework extends the Gaussian bandit: it challenges the agent to use context vectors to predict the most valuable arms, balancing exploration and exploitation to minimize regret over time. This paper introduces the Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an $O((m+\log(m))\sqrt{n\log(n)})$ bound on its Bayes regret, where $m$ is the number of bandit instances and $n$ is the number of rounds of Thompson Sampling. Additionally, our work complements the analysis of Meta-TS for linear contextual bandits. We evaluate the performance of Meta-TSLB experimentally under different settings, and we test and analyze its generalization capability, showcasing its potential to adapt to unseen instances.
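
To make the setting concrete, the following is a minimal sketch of ordinary Gaussian Thompson Sampling on a single linear bandit instance with a known prior. It is not the Meta-TSLB algorithm itself, whose meta-posterior update is specified in the paper. The function name `linear_ts`, the fixed arm set, and the Gaussian noise model are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def linear_ts(arms, theta_star, mu0, sigma0, n, noise_sd=1.0, seed=0):
    """Run n rounds of Gaussian Thompson Sampling on one linear bandit.

    arms:        (K, d) array of fixed arm feature vectors (assumption)
    theta_star:  (d,) true parameter, used only to simulate rewards
    mu0, sigma0: Gaussian prior mean (d,) and covariance (d, d)
    Returns the cumulative regret and the final posterior mean.
    """
    rng = np.random.default_rng(seed)
    precision = np.linalg.inv(sigma0)   # posterior precision matrix
    b = precision @ mu0                 # precision-weighted mean
    means = arms @ theta_star           # expected reward of each arm
    best = means.max()
    regret = 0.0
    for _ in range(n):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2         # symmetrize against round-off
        mu = cov @ b
        theta = rng.multivariate_normal(mu, cov)   # posterior sample
        a = int(np.argmax(arms @ theta))           # greedy w.r.t. the sample
        x = arms[a]
        reward = means[a] + noise_sd * rng.normal()
        # Conjugate Bayesian linear-regression update.
        precision += np.outer(x, x) / noise_sd**2
        b += x * reward / noise_sd**2
        regret += best - means[a]
    return regret, np.linalg.solve(precision, b)

# Hypothetical instance: three orthogonal arms in d = 3.
arms = np.eye(3)
theta_star = np.array([0.2, 0.5, 0.9])
regret, mu_hat = linear_ts(arms, theta_star, np.zeros(3), np.eye(3), n=200)
```

In a meta-learning setup, the prior `(mu0, sigma0)` passed to each new instance would itself be drawn from a meta-posterior that is refined after every task; the sketch keeps the prior fixed to isolate the per-instance loop.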

Paper Structure (20 sections, 8 theorems, 66 equations, 9 figures, 3 algorithms)

This paper contains 20 sections, 8 theorems, 66 equations, 9 figures, 3 algorithms.

Key Result

Lemma 1

In task $s$, let $A_t$ denote the arm pulled in round $t$ of TS, and let $I_d$ be the $d$-dimensional identity matrix. Then the meta-posterior in task $s+1$ is $Q_{s+1}=\mathcal{N}(\boldsymbol{\mu }_{Q,s+1},v^2 \varSigma _{Q,s+1} )$, where $\varSigma _{Q,s+1}=\left[ \varSigma _{Q,s}^{-1}+\varSigma _{*}^{-1}-\left( \varS

Figures (9)

  • Figure 1: A sequential linear bandit instance with $p=3$
  • Figure 2: Linear bandits
  • Figure 3: Linear bandits with finite potential instance priors
  • Figure 4: Linear bandits with infinite arms
  • Figure 5: Sequential linear bandits
  • ...and 4 more figures

Theorems & Definitions (15)

  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Theorem 1
  • Lemma 5
  • Theorem 2
  • Lemma 6
  • proof
  • proof
  • ...and 5 more