Continuous-time multi-armed bandits under random intervention times

Kei Noba, José Luis Pérez, Kazutoshi Yamazaki, Qingyuan Zhang

Abstract

This paper examines multi-armed bandits in which actions are taken at random discrete times. The model consists of $J$ independent arms. When an arm is operated, it must remain active for a random duration, modeled by the inter-arrival time of a (possibly arm-dependent) renewal process. For arms evolving as a Lévy process, we provide an explicit characterization of the Gittins index, which is known to yield an optimal strategy. Furthermore, when the inter-arrival times are exponential and the arms evolve as either a spectrally negative Lévy process, a reflected spectrally negative Lévy process, or a diffusion process, the Gittins index is explicitly characterized in terms of the scale function or diffusion characteristics, respectively. Numerical experiments are performed to support the theoretical results.

Paper Structure

This paper contains 23 sections, 12 theorems, 138 equations, 1 figure, 3 tables, 1 algorithm.

Key Result

Theorem 2.1

The Gittins index strategy, which at each onset of period $t \in \mathbb{N}_0$ operates an arm with the largest Gittins index, is optimal.
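
As a rough illustration of an index strategy of this kind, the sketch below picks, at the onset of each period, the arm whose current index is largest, advances only that arm, and leaves the others frozen. Everything named here is an assumption made for illustration: `run_index_policy`, `index_fn`, and `step_fn` are hypothetical, `index_fn` is a placeholder rather than the paper's Gittins index formula, and `step_fn` stands in for the arm's evolution over one random active duration.

```python
from typing import Callable, List, Sequence, Tuple


def run_index_policy(
    initial_states: Sequence[float],
    index_fn: Callable[[int, float], float],
    step_fn: Callable[[int, float], float],
    n_periods: int,
) -> Tuple[List[int], List[float]]:
    """Run a generic index policy for n_periods decision epochs.

    index_fn(j, x): index of arm j in state x (placeholder, NOT the paper's formula).
    step_fn(j, x):  state of arm j after being operated for one period
                    (hypothetical stand-in for the arm's dynamics).
    """
    states = list(initial_states)
    choices: List[int] = []
    for _ in range(n_periods):
        # Operate the arm with the largest current index (ties go to the lowest arm number).
        j = max(range(len(states)), key=lambda k: index_fn(k, states[k]))
        # Only the operated arm evolves; all other arms stay frozen.
        states[j] = step_fn(j, states[j])
        choices.append(j)
    return choices, states


if __name__ == "__main__":
    # Toy example: the index equals the state, and operating an arm lowers its state by 1.
    picks, final_states = run_index_policy(
        [3.0, 1.0, 2.0], lambda j, x: x, lambda j, x: x - 1.0, 4
    )
    print(picks, final_states)
```

In the toy run the policy keeps operating an arm until its index drops below that of a competitor, which is the qualitative behaviour an index rule is meant to produce. Any optimality claim depends on using the actual Gittins index, which this sketch does not compute.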

Figures (1)

  • Figure 1: Convergence of the Gittins index. The red, blue, and green curves correspond to the continuous-time Gittins index, the Gittins indices of a reflected spectrally negative Lévy process (RSNLP), and the Gittins indices of a spectrally negative Lévy process (SNLP), respectively.

Theorems & Definitions (23)

  • Remark 2.1
  • Theorem 2.1
  • proof
  • Proposition 2.1
  • proof
  • Remark 2.2
  • Remark 2.3
  • Lemma 3.1
  • Proposition 3.1
  • proof
  • ...and 13 more