Continuous-time multi-armed bandits under random intervention times

Kei Noba; José Luis Pérez; Kazutoshi Yamazaki; Qingyuan Zhang

Abstract

This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of $J$ independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.

Paper Structure (23 sections, 12 theorems, 138 equations, 1 figure, 3 tables, 1 algorithm)

Introduction
Preliminaries
Model
Gittins index
Markovian case
Lévy process case
Exponential inter-arrival case
Lévy process case
General case
Reflected Lévy case
Diffusion case
Numerical experiments
Convergence
Proof of results in section 3
Proof of Lemma \ref{WF}
...and 8 more sections

Key Result

Theorem 2.1

The Gittins index strategy, which at each onset of period $t \in \mathbb{N}_0$ operates an arm with the largest current Gittins index, is optimal.
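
As a rough illustration of the selection rule in Theorem 2.1, the following Python sketch operates, at each period onset, an arm whose current index is largest and advances only that arm while the others stay frozen. The names `run_index_strategy`, `index_fn`, `advance_fn`, and `brownian_advance` are hypothetical and not from the paper. The index function is supplied by the caller, so this shows only the argmax step, not the paper's characterization of the Gittins index. The toy arm dynamics (a Brownian motion with drift run for an exponential duration) are an assumption chosen for simplicity.

```python
import math
import random


def run_index_strategy(states, index_fn, advance_fn, n_periods, rng):
    """Operate, at each period onset, an arm whose current index is largest.

    index_fn(j, x)        -> index of arm j in state x (supplied by the caller;
                             a stand-in for the Gittins index, an assumption).
    advance_fn(j, x, rng) -> state of arm j after one random operating period.
    Returns the list of chosen arms and the final states.
    """
    states = list(states)
    chosen = []
    for _ in range(n_periods):
        indices = [index_fn(j, x) for j, x in enumerate(states)]
        # Ties are broken in favor of the smallest arm label.
        j_star = max(range(len(states)), key=lambda j: indices[j])
        # Only the operated arm evolves; the other arms stay frozen.
        states[j_star] = advance_fn(j_star, states[j_star], rng)
        chosen.append(j_star)
    return chosen, states


def brownian_advance(j, x, rng, rate=1.0, drift=-0.1, sigma=1.0):
    """Toy dynamics (assumption): Brownian motion with drift, operated for an
    exponentially distributed duration with the given rate."""
    tau = rng.expovariate(rate)
    return x + drift * tau + sigma * math.sqrt(tau) * rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder index: the current state itself (NOT the paper's index).
    arms, final_states = run_index_strategy(
        [0.5, 0.0, -0.5], lambda j, x: x, brownian_advance, 10, rng
    )
    print(arms, final_states)
```

Passing the index as a callable keeps the selection loop independent of how the index is computed, so a closed-form or numerically evaluated index could be substituted without changing the loop.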

Figures (1)

Figure 1: Convergence of the Gittins index. The red, blue, and green curves correspond to the continuous-time Gittins index, the Gittins indices of a reflected spectrally negative Lévy process (RSNLP), and the Gittins indices of a spectrally negative Lévy process (SNLP), respectively.

Theorems & Definitions (23)

Remark 2.1
Theorem 2.1
proof
Proposition 2.1
proof
Remark 2.2
Remark 2.3
Lemma 3.1
Proposition 3.1
proof
...and 13 more
