
Contextual Restless Multi-Armed Bandits with Application to Demand Response Decision-Making

Xin Chen, I-Hong Hou

TL;DR

This paper introduces a novel multi-armed bandits framework for complex online decision-making in smart grids that models both the internal state transitions of each arm and the influence of external global environmental contexts.

Abstract

This paper introduces a novel multi-armed bandits framework, termed Contextual Restless Bandits (CRB), for complex online decision-making. The CRB framework combines the core features of contextual bandits and restless bandits, so that it can model both the internal state transitions of each arm and the influence of external global environmental contexts. Using the dual decomposition method, we develop a scalable index policy algorithm for solving the CRB problem and theoretically analyze its asymptotic optimality. When the arm models are unknown, we further propose a model-based online learning algorithm, built on the index policy, that learns the arm models and makes decisions simultaneously. Finally, we apply the proposed CRB framework and index policy algorithm to the demand response decision-making problem in smart grids. Numerical simulations demonstrate the performance and efficiency of the proposed CRB approaches.
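The dual decomposition idea in the abstract can be illustrated with a small sketch. It assumes a toy instance that is not taken from the paper: identical arms, a finite set of global contexts evolving as an exogenous Markov chain, binary actions, a per-context activation budget `alpha[g] * N` relaxed into a discounted-average constraint, and one Lagrange multiplier `lam[g]` per context. Relaxing the coupling budget lets the problem split into a single-arm MDP on (context, state), and the multipliers are updated by projected subgradient descent on the dual. All function names and parameters are hypothetical, and the final top-k ranking rule is a generic Lagrangian-index heuristic, not necessarily the authors' index policy.

```python
import numpy as np

# Toy Contextual Restless Bandit (hypothetical, not the paper's model):
# G exogenous global contexts, S arm states, actions 0 = passive, 1 = active,
# identical arms, and at most alpha[g] * N active arms in context g.


def arm_q_values(P, r, Q, lam, beta, iters=200):
    """Value iteration for one arm's Lagrangian subproblem on (context, state).

    P[g, a, s, s']: arm transitions in context g; r[g, s, a]: arm rewards;
    Q[g, g']: context transitions; lam[g]: price for activating in context g.
    Returns q[g, s, a] for the penalized reward r[g, s, a] - lam[g] * a.
    """
    G, _, S, _ = P.shape
    penalty = lam[:, None, None] * np.array([0.0, 1.0])  # shape (G, 1, 2)
    V = np.zeros((G, S))
    for _ in range(iters):
        # Expected next value: sum_{g'} Q[g, g'] sum_{s'} P[g, a, s, s'] V[g', s']
        ev = np.einsum("gh,gast,ht->gsa", Q, P, V)
        q = r - penalty + beta * ev
        V = q.max(axis=2)
    return q


def discounted_occupancy(P, Q, policy, mu0, beta, horizon=200):
    """d[g, s] = sum_t beta**t * Pr(g_t = g, s_t = s) under a deterministic policy."""
    G, _, S, _ = P.shape
    # Arm transitions when following the policy: P_pi[g, s, s'].
    P_pi = P[np.arange(G)[:, None], policy, np.arange(S)[None, :]]
    mu, d = mu0.copy(), np.zeros((G, S))
    for t in range(horizon):
        d += beta**t * mu
        # Next context and next arm state are drawn independently given (g, s).
        mu = np.einsum("gs,gh,gst->ht", mu, Q, P_pi)
    return d


def dual_decomposition(P, r, Q, alpha, mu0, beta, steps=100, lr=0.5):
    """Minimize the Lagrangian dual over lam[g] >= 0 by projected subgradient descent."""
    lam = np.zeros(Q.shape[0])
    for k in range(1, steps + 1):
        policy = arm_q_values(P, r, Q, lam, beta).argmax(axis=2)
        d = discounted_occupancy(P, Q, policy, mu0, beta)
        # Subgradient of the dual w.r.t. lam[g]: budgeted activations in
        # context g minus the relaxed single-arm policy's activations in g.
        grad = alpha * d.sum(axis=1) - (d * policy).sum(axis=1)
        lam = np.maximum(0.0, lam - lr / np.sqrt(k) * grad)
    return lam, arm_q_values(P, r, Q, lam, beta)


def index_policy(q, g, states, budget):
    """Activate up to `budget` arms with the largest positive index in context g."""
    index = q[g, states, 1] - q[g, states, 0]
    top = np.argsort(-index)[:budget]
    active = np.zeros(len(states), dtype=int)
    active[top[index[top] > 0]] = 1
    return active


# Usage on a random toy instance.
rng = np.random.default_rng(0)
G, S, N, beta = 2, 3, 20, 0.9
Q = np.array([[0.8, 0.2], [0.3, 0.7]])
P = rng.dirichlet(np.ones(S), size=(G, 2, S))  # shape (G, 2, S, S)
r = rng.uniform(size=(G, S, 2))
alpha = np.array([0.25, 0.5])
mu0 = np.full((G, S), 1.0 / (G * S))

lam, q = dual_decomposition(P, r, Q, alpha, mu0, beta)
states = rng.integers(S, size=N)
g = 0
active = index_policy(q, g, states, budget=int(alpha[g] * N))
print("multipliers:", lam, "active arms:", active.sum())
```

Because the contexts are exogenous, the time spent in each context does not depend on the arm policy, so each multiplier `lam[g]` acts as a context-dependent price for activation; the hard budget is then enforced at run time by the top-k ranking step.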

Paper Structure (17 sections, 2 theorems, 29 equations, 3 figures, 3 algorithms)

Key Result

Lemma 1

Suppose the initial global context is $g_0 = g$ and the initial state $s_{i,0}$ of each arm $i\in [N]$ is drawn independently from the distribution $\mathbb{P}(s_{i,0}=s) = m^*_g(s)$. Then, under the policy $\pi^*_{\mathrm{Rel}}$, we have

Figures (3)

  • Figure 1: Convergence of the Lagrange multipliers $\boldsymbol{\lambda} := (\lambda_g)_{g\in\mathcal{G}}$ under the dual decomposition method.
  • Figure 2: Comparison between the per-user rewards $V^N_{\mathrm{Rel}}/N$ of the relaxed problem and $V^N_{\mathrm{Ind}}/N$ of the index policy algorithm.
  • Figure 3: Comparison of the total discounted rewards between the CRB method and the traditional restless bandits method.

Theorems & Definitions (2)

  • Lemma 1
  • Theorem 1