Modified Meta-Thompson Sampling for Linear Bandits and Its Bayes Regret Analysis

Hao Li; Dong Liang; Zheng Xie

TL;DR

The performance of Meta-TSLB is evaluated experimentally under different settings, and the generalization capability of Meta-TSLB is analyzed, showcasing its potential to adapt to unseen instances.

Abstract

Meta-learning is characterized by its ability to learn how to learn, enabling the adaptation of learning strategies across different tasks. Recent research introduced Meta-Thompson Sampling (Meta-TS), which meta-learns an unknown prior distribution sampled from a meta-prior by interacting with bandit instances drawn from it. However, its analysis was limited to Gaussian bandits. The contextual multi-armed bandit framework is an extension of the Gaussian bandit that challenges the agent to use context vectors to predict the most valuable arms, balancing exploration and exploitation to minimize regret over time. This paper introduces the Meta-TSLB algorithm, a modified Meta-TS for linear contextual bandits. We theoretically analyze Meta-TSLB and derive an $O((m+\log(m))\sqrt{n\log(n)})$ bound on its Bayes regret, in which $m$ represents the number of bandit instances and $n$ the number of rounds of Thompson Sampling. Additionally, our work complements the analysis of Meta-TS for linear contextual bandits. The performance of Meta-TSLB is evaluated experimentally under different settings, and we empirically examine and analyze the generalization capability of Meta-TSLB, showcasing its potential to adapt to unseen instances.
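To make the setting concrete, the following is a minimal sketch of the inner loop that the abstract builds on: plain Thompson Sampling on a single linear Gaussian bandit with a known Gaussian prior. It is not the Meta-TSLB algorithm itself, and it omits the meta-prior and meta-posterior update entirely. The function name `linear_ts`, the parameter names, and the noise model (i.i.d. Gaussian noise with standard deviation `sigma`) are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def linear_ts(theta_star, arms, mu0, Sigma0, sigma, n, rng):
    """Run n rounds of Thompson Sampling on one linear Gaussian bandit.

    Hypothetical helper for illustration only.
    theta_star : (d,) true parameter, unknown to the agent
    arms       : (K, d) fixed arm feature vectors
    mu0, Sigma0: Gaussian prior mean (d,) and covariance (d, d)
    sigma      : assumed reward-noise standard deviation
    Returns (cumulative regret, posterior mean, posterior covariance).
    """
    precision = np.linalg.inv(Sigma0)   # posterior precision matrix
    b = precision @ mu0                 # precision-weighted mean
    best = np.max(arms @ theta_star)    # expected reward of the best arm
    regret = 0.0
    for _ in range(n):
        cov = np.linalg.inv(precision)
        cov = (cov + cov.T) / 2         # symmetrize against round-off
        mean = cov @ b
        # Sample a parameter from the posterior and act greedily on it.
        theta_tilde = rng.multivariate_normal(mean, cov)
        a = arms[np.argmax(arms @ theta_tilde)]
        reward = a @ theta_star + sigma * rng.standard_normal()
        # Conjugate Bayesian linear-regression update.
        precision = precision + np.outer(a, a) / sigma**2
        b = b + a * reward / sigma**2
        regret += best - a @ theta_star
    cov = np.linalg.inv(precision)
    cov = (cov + cov.T) / 2
    return regret, cov @ b, cov


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, K = 3, 5
    arms = rng.standard_normal((K, d))
    theta_star = rng.standard_normal(d)
    regret, post_mean, post_cov = linear_ts(
        theta_star, arms, np.zeros(d), np.eye(d), sigma=0.5, n=200, rng=rng
    )
    print("cumulative regret:", regret)
```

In a meta-learning setup such as the one the abstract describes, the prior passed in as `mu0` and `Sigma0` would not be known in advance; it would instead be drawn from, or estimated via, a meta-posterior that is refined across the $m$ bandit instances.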

Paper Structure (20 sections, 8 theorems, 66 equations, 9 figures, 3 algorithms)

This paper contains 20 sections, 8 theorems, 66 equations, 9 figures, 3 algorithms.

Introduction
Problem Statement
Modified Meta-Thompson Sampling for Linear Bandits
Regret Bound of Meta-TSLB
Key Lemmas
Regret Analysis on Meta-TSLB
Regret Analysis on Meta-TS applied to Linear Bandits
Extended version of Linear Bandits
Linear Bandits with Finite Potential Instance Priors
Linear Bandits with Infinite Arms
Sequential Linear Bandits
Experiment
Conclusion
Preliminary Lemmas
Proof of Lemma \ref{Qs}
...and 5 more sections

Key Result

Lemma 1

In task $s$, let $A_t$ denote the arm pulled in round $t$ of TS, and let $I_d$ be the $d$-dimensional identity matrix. Then the meta-posterior in task $s+1$ is $Q_{s+1}=\mathcal{N}(\boldsymbol{\mu }_{Q,s+1},v^2 \varSigma _{Q,s+1} )$, where $\varSigma _{Q,s+1}=\left[ \varSigma _{Q,s}^{-1}+\varSigma _{*}^{-1}-\left( \varS

Figures (9)

Figure 1: A sequential linear bandit instance with $p=3$
Figure 2: Linear bandits
Figure 3: Linear bandits with finite potential instance priors
Figure 4: Linear bandits with infinite arms
Figure 5: Sequential linear bandits
...and 4 more figures

Theorems & Definitions (15)

Lemma 1
Lemma 2
Lemma 3
Lemma 4
Theorem 1
Lemma 5
Theorem 2
Lemma 6
proof
proof
...and 5 more
