Fast Online Learning with Gaussian Prior-Driven Hierarchical Unimodal Thompson Sampling

Tianchi Zhao, He Liu, Hongyin Shi, Jinliang Li

TL;DR

This work proposes Thompson Sampling with Clustered arms under Gaussian prior (TSCG), an algorithm tailored to a 2-level hierarchical arm structure, and proves that exploiting this structure yields a lower regret bound than ordinary Thompson Sampling with Gaussian prior (TSG).
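For reference, the regret being compared can be written with the standard definition below. The cluster-and-arm indexing $(c, j)$, the play counts $N_{c,j}(T)$, and the gaps $\Delta_{c,j}$ are notation assumed for this sketch and may differ from the paper's own.

```latex
% Arms indexed by cluster c = 1..C and arm j = 1..K_c, with mean reward \mu_{c,j}
\mu^{*} = \max_{c,j} \mu_{c,j}, \qquad \Delta_{c,j} = \mu^{*} - \mu_{c,j}

% Expected cumulative regret after T rounds; (c_t, j_t) is the arm played at round t
% and N_{c,j}(T) is the number of times arm (c, j) is played in rounds 1..T
R(T) = T\mu^{*} - \mathbb{E}\Big[\sum_{t=1}^{T} \mu_{c_t, j_t}\Big]
     = \sum_{c=1}^{C} \sum_{j=1}^{K_c} \Delta_{c,j}\, \mathbb{E}\big[N_{c,j}(T)\big]
```

The second equality follows from $\sum_{c,j} N_{c,j}(T) = T$, so a lower regret bound amounts to fewer expected plays of suboptimal arms, weighted by their gaps.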

Abstract

We study a class of Multi-Armed Bandit (MAB) problems in which arms with Gaussian reward feedback are grouped into clusters. Because the Gaussian distribution is so widespread, this arm setting arises in many real-world problems, for example mmWave communications and portfolio management with risky assets. Building on Thompson Sampling with Gaussian prior (TSG) for selecting the optimal arm, we propose Thompson Sampling with Clustered arms under Gaussian prior (TSCG), tailored to the 2-level hierarchical structure. We prove that by exploiting this 2-level structure, TSCG achieves a lower regret bound than ordinary TSG. In addition, when the reward is unimodal, our Unimodal Thompson Sampling algorithm with Clustered Arms under Gaussian prior (UTSCG) attains an even lower regret bound. Each proposed algorithm is accompanied by a theoretical upper bound on its regret, and our numerical experiments confirm the advantage of the proposed algorithms.
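The two ingredients named in the abstract can be illustrated with a minimal Python sketch, under stated assumptions. `GaussianTS` follows the standard Gaussian-prior Thompson Sampling rule: with reward sum $s_i$ and pull count $n_i$, draw $\theta_i \sim \mathcal{N}(s_i/(n_i+1),\, 1/(n_i+1))$ and play the arg max. The loop in `run_two_level`, which first samples a cluster and then an arm inside it, is a hypothetical illustration of how a hierarchy can be exploited; it is not claimed to be the paper's TSCG, whose cluster-level statistics and sampling rule may differ. Unit-variance Gaussian rewards and the example means are also assumptions.

```python
import numpy as np


class GaussianTS:
    """Gaussian-prior Thompson Sampling over a fixed set of arms.

    Arm i keeps a reward sum s_i and pull count n_i; its posterior is
    N(s_i / (n_i + 1), 1 / (n_i + 1)), i.e. a N(0, 1) prior combined with
    unit-variance Gaussian rewards.
    """

    def __init__(self, n_arms, rng):
        self.sums = np.zeros(n_arms)
        self.counts = np.zeros(n_arms)
        self.rng = rng

    def posterior_means(self):
        return self.sums / (self.counts + 1)

    def select(self):
        # Draw one sample per arm from its posterior and play the largest.
        theta = self.rng.normal(self.posterior_means(), 1.0 / np.sqrt(self.counts + 1))
        return int(np.argmax(theta))

    def update(self, arm, reward):
        self.sums[arm] += reward
        self.counts[arm] += 1


def run_two_level(means_by_cluster, horizon, seed=0):
    """Hypothetical two-level loop: TS picks a cluster, then TS picks an arm in it.

    Returns the cumulative pseudo-regret sum_t (mu_star - mu_{c_t, j_t}).
    """
    rng = np.random.default_rng(seed)
    top = GaussianTS(len(means_by_cluster), rng)  # cluster-level statistics
    inner = [GaussianTS(len(m), rng) for m in means_by_cluster]  # arm-level statistics
    mu_star = max(max(m) for m in means_by_cluster)
    regret = 0.0
    for _ in range(horizon):
        c = top.select()
        j = inner[c].select()
        mu = means_by_cluster[c][j]
        reward = rng.normal(mu, 1.0)  # assumed unit-variance Gaussian feedback
        top.update(c, reward)  # the cluster is credited with its played arm's reward
        inner[c].update(j, reward)
        regret += mu_star - mu
    return regret


# Hypothetical example: 3 clusters (e.g. frequencies) x 3 arms (e.g. beams).
means = [[0.1, 0.3, 0.2], [0.5, 0.9, 0.4], [0.2, 0.1, 0.0]]
print(run_two_level(means, horizon=2000))
```

Sharing one random generator between the cluster-level and arm-level samplers keeps a run reproducible from a single seed; the flat TSG baseline corresponds to a single `GaussianTS` over all nine arms.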

Paper Structure

This paper contains 24 sections, 39 equations, 7 figures, 3 tables, and 3 algorithms.

Figures (7)

  • Figure 1: Example of mmWave communications. There are three communication frequencies: $f_1=24.25$ GHz, $f_2=43.5$ GHz, and $f_3=60$ GHz. For each communication frequency, there are three beams to be selected.
  • Figure 2: Example of Gaussian returns. The blue histogram is synthesized from S&P 500 index statistics reported in Gaussian_SPX, and the red histogram from CSI 300 statistics reported in GaussReturns.
  • Figure 3: Three unimodal configurations. Vertical bars indicate the standard deviation. Either the mean reward has a unique maximum within the cluster, or it decreases or increases monotonically with the arm index in the cluster.
  • Figure 4: Cumulative regret.
  • Figure 5: Cumulative number of selections of the true optimal arm.
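The unimodal configurations of Figure 3 can be made concrete, and curves of the same kind as Figures 4 and 5 (cumulative regret and cumulative selections of the optimal arm) produced for a toy cluster, with the sketch below. `is_unimodal` checks the strict single-peak condition on a cluster's mean vector, counting monotone vectors as unimodal with the peak at an end. `neighborhood_ts_step` is a hypothetical simplification in the spirit of unimodal Thompson Sampling: it finds the empirical leader and samples only among the leader and its immediate index neighbours. It is not claimed to be the paper's UTSCG; the leader rule, the unit-variance rewards, and the example means are assumptions.

```python
import numpy as np


def is_unimodal(means):
    """Strict unimodality on a line: rises to a single peak, then falls.

    Monotone increasing or decreasing vectors count (peak at an end), matching
    the three configurations of Figure 3.
    """
    m = np.asarray(means, dtype=float)
    p = int(np.argmax(m))
    return bool(np.all(np.diff(m[: p + 1]) > 0) and np.all(np.diff(m[p:]) < 0))


def neighborhood_ts_step(sums, counts, rng):
    """Hypothetical unimodal step: Gaussian TS restricted to the empirical leader
    and its immediate neighbours (arm indices leader - 1 and leader + 1)."""
    post_means = sums / (counts + 1)
    leader = int(np.argmax(post_means))
    cand = np.arange(max(leader - 1, 0), min(leader + 1, len(sums) - 1) + 1)
    theta = rng.normal(post_means[cand], 1.0 / np.sqrt(counts[cand] + 1))
    return int(cand[np.argmax(theta)])


def simulate(means, horizon, seed=0):
    """Return cumulative regret and cumulative optimal-arm selections per round."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sums, counts = np.zeros(len(means)), np.zeros(len(means))
    best = int(np.argmax(means))
    cum_regret = np.zeros(horizon)
    cum_opt = np.zeros(horizon, dtype=int)
    regret, opt = 0.0, 0
    for t in range(horizon):
        a = neighborhood_ts_step(sums, counts, rng)
        reward = rng.normal(means[a], 1.0)  # assumed unit-variance Gaussian feedback
        sums[a] += reward
        counts[a] += 1
        regret += means[best] - means[a]
        opt += int(a == best)
        cum_regret[t], cum_opt[t] = regret, opt
    return cum_regret, cum_opt


# Hypothetical unimodal cluster: the mean peaks at arm index 3.
cluster_means = [0.0, 0.5, 1.0, 2.0, 1.0]
print(is_unimodal(cluster_means))  # True
cum_regret, cum_opt = simulate(cluster_means, horizon=2000)
print(cum_regret[-1], cum_opt[-1])
```

Restricting sampling to the leader's neighbourhood is the usual way unimodal bandit methods avoid exploring arms far from the peak; how UTSCG combines such a step with the cluster level is not reproduced here.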