Whittle Index Learning Algorithms for Restless Bandits with Constant Stepsizes

Vishesh Mittal, Rahul Meshram, Surya Prakash

TL;DR

This paper studies the Whittle index learning algorithm with Q-learning for restless multi-armed bandits and illustrates that index learning with Q-learning, DQN, and function approximation learns the Whittle index.

Abstract

We study the Whittle index learning algorithm for restless multi-armed bandits, where the index is learned using Q-learning. We first present the Q-learning algorithm with constant stepsizes under three exploration policies: epsilon-greedy, softmax, and epsilon-softmax. We then extend Q-learning to index learning for a single-armed restless bandit. The index learning algorithm is a two-timescale variant of stochastic approximation: on the slower timescale we update the index, and on the faster timescale we update the Q-values, treating the index as fixed. The Q-learning updates are asynchronous. We analyze this two-timescale stochastic approximation algorithm with constant stepsizes. Further, we study index learning with deep Q-network (DQN) learning and with linear function approximation using state aggregation. We illustrate the performance of our algorithms with numerical examples, which show that index learning with Q-learning, DQN, and function approximation learns the Whittle index.
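
The two-timescale scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: it assumes a discounted-reward formulation in which the index acts as a subsidy for the passive action, a single reference state `x_ref`, and epsilon-greedy exploration. The transition probabilities, the reward function, and all names and parameter values (`step`, `learn_index`, `alpha`, `beta`, `eps`) are hypothetical choices made for the example.

```python
import random


def step(s, a, K, rng):
    """Hypothetical one-step random walk on states 0..K-1 (not the paper's model)."""
    p_up = 0.7 if a == 1 else 0.3                 # assumed transition probabilities
    s_next = s + 1 if rng.random() < p_up else s - 1
    s_next = min(max(s_next, 0), K - 1)           # clip at the boundaries
    reward = s / (K - 1) if a == 1 else 0.0       # assumed reward: active arm pays more in high states
    return s_next, reward


def learn_index(x_ref, K=5, gamma=0.9, alpha=0.1, beta=0.01,
                eps=0.2, n_steps=20000, seed=0):
    """Two-timescale index learning with constant stepsizes (alpha fast, beta slow)."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(K)]            # Q[s][a], a=0 passive, a=1 active
    lam = 0.0                                     # index estimate for state x_ref
    s = rng.randrange(K)
    for _ in range(n_steps):
        # Epsilon-greedy exploration over the two actions.
        if rng.random() < eps:
            a = rng.randrange(2)
        else:
            a = 1 if Q[s][1] > Q[s][0] else 0
        s_next, r = step(s, a, K, rng)
        # The passive action earns the subsidy lam (assumed subsidy formulation).
        r_total = r + (lam if a == 0 else 0.0)
        # Faster timescale: asynchronous Q-update, only the visited (s, a) changes.
        target = r_total + gamma * max(Q[s_next])
        Q[s][a] += alpha * (target - Q[s][a])
        # Slower timescale: move lam toward the point where both actions
        # are equally attractive in the reference state.
        lam += beta * (Q[x_ref][1] - Q[x_ref][0])
        s = s_next
    return lam, Q


if __name__ == "__main__":
    lam, Q = learn_index(x_ref=2)
    print("index estimate for state 2:", lam)
```

Choosing `beta` much smaller than `alpha` is what separates the two timescales in this sketch: the Q-values see a nearly fixed index, while the index sees Q-values that have nearly settled.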

Paper Structure

This paper contains 40 sections, 1 theorem, 36 equations, 20 figures, 2 algorithms.

Key Result

Lemma 1

Suppose Then $Q_n(s,a) \rightarrow Q^*(s,a)$ for all $(s,a)$ almost surely.

Figures (20)

  • Figure 1: Q-learning: Example with one-step random walk and number of states $K=25$, without and with re-initialization
  • Figure 2: Index learning using Q-learning: Example with one-step random walk with $K=25$
  • Figure 3: Index learning with DQN algorithm: Example of one-step random walk with $K=5$ and re-initialization
  • Figure 4: Linear function approximation: Example of one-step random walk with $K = 500$ and re-initialization
  • Figure 5: Q-learning: Example with circular dynamic model
  • ...and 15 more figures

Theorems & Definitions (1)

  • Lemma 1