Quantum spatial best-arm identification via quantum walks

Tomoki Yamagami, Etsuo Segawa, Takatomo Mihana, André Röhm, Atsushi Uchida, Ryoichi Horisaki

TL;DR

A quantum algorithmic framework for best-arm identification in graph bandits, termed Quantum Spatial Best-Arm Identification (QSBAI), which is applicable to general graph structures and establishes a link between Grover-type search and reinforcement learning tasks with structural restrictions.

Abstract

Quantum reinforcement learning has emerged as a framework combining quantum computation with sequential decision-making, and applications to the multi-armed bandit (MAB) problem have been reported. The graph bandit problem extends the MAB setting by introducing spatial constraints, yet quantum approaches remain limited. We propose a quantum algorithmic framework for best-arm identification in graph bandits, termed Quantum Spatial Best-Arm Identification (QSBAI), which is applicable to general graph structures. The method employs quantum walks to encode superpositions over graph-constrained actions, extending amplitude amplification and generalizing the Quantum BAI algorithm via Szegedy's walk framework. This establishes a link between Grover-type search and reinforcement learning tasks with structural restrictions. We focus our theoretical analysis on complete and bipartite graphs, deriving the maximal success probability of identifying the best arm and the time step at which it is achieved. Our results highlight the potential of quantum walks to accelerate exploration in constrained environments and extend the applicability of quantum algorithms for decision-making.

Paper Structure

This paper contains 14 sections, 2 theorems, 85 equations, 9 figures, 1 algorithm.

Key Result

Theorem 4.1

Let Then, for $s = \lfloor\pi /4\theta \rfloor$, which implies $s = \mathrm{O}(1/\sqrt{\overline{q}})$, the recommendation probability for the best arm $v^*$ at time $t$, denoted by $P_{t}(v^*)$, is maximized at $t=2s$, and satisfies

Figures (9)

  • Figure 1: Conceptual illustration of a multi-armed bandit (MAB) problem. The MAB framework consists of interactions between an agent and an environment containing multiple arms (slot machines). The agent selects an arm, which then generates a reward probabilistically. Based on the observed reward, the agent updates its strategy and selects an arm again.
  • Figure 2: Positioning of Szegedy's walk. Grover's search algorithm can be generalized in two directions: with respect to the initial state and with respect to spatial constraints. The former generalization is called quantum amplitude amplification and the latter Grover's walk. Szegedy's walk sits at the intersection of these two ways of generalizing Grover's search algorithm.
  • Figure 3: Example of arm selection on a graph $G = (V,\,A)$ with $V = \{0,\,1,\,2,\,3,\,4\}$ and $A = \{a,\,a^{-1}\,|\,a\in \{(0,\,1),\,(0,\,2),\,(0,\,3),\,(1,\,2),\,(3,\,4)\}\}$. Suppose the last selected arm is $0$. The next arm is then chosen uniformly at random from its three neighboring arms, $1$, $2$, and $3$, each with probability $1/3$. Since arm $4$ is not adjacent to arm $0$, the agent cannot select it in the next decision.
  • Figure 4: Possible state transitions in arm selection and environment states, corresponding to Fig. 3. Each state is represented by a pair $(v,\,\sigma)$, where $v \in V$ is the selected arm and $\sigma \in \Sigma = \{\sigma,\,\tau\}$ is the environment state. Suppose the current state is $(0,\,\sigma)$. The next arm is chosen uniformly at random from its three neighbors $1$, $2$, and $3$, each with probability $1/3$, as shown in Fig. 3. Afterward, the subsequent environment state is determined according to the distribution $\eta_v$ associated with the selected arm $v \in V$. For example, the probability of transitioning from $(0,\,\sigma)$ to $(1,\,\sigma)$ is the product of the arm-selection probability $1/3$ and the environment-transition probability $\eta_1(\sigma)$. The probabilities of other transitions are determined in the same manner.
  • Figure 5: Example of constructing the executive graph $\widetilde{G}$ for the graph $G$ shown in Figs. 3 and 4. The executive graph $\widetilde{G}$ represents possible state transitions in arm selection on $G$, where each state is expressed as a pair consisting of a selected arm and an environment state. Assuming that the environment state at each time step is determined independently, $\widetilde{G}$ is formalized as the direct product of the graph $G$ and the complete graph with self-loops indexed by the set $\Sigma$ of possible environment states; see the defining equation of $\widetilde{G}$ in the paper.
  • ...and 4 more figures
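
The classical arm-selection rule and state transitions described in the captions of Figures 3 to 5 can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the function names (`build_adjacency`, `next_arm`, `transition_probability`, `executive_arcs`) are hypothetical, and the numerical values of the distributions $\eta_v$ are made up for the example, since the captions do not specify them. The sketch models only the classical random walk on $G$ and the arcs of $\widetilde{G}$, not the quantum walk itself.

```python
import random
from itertools import product

# Undirected edges of the example graph G from the Figure 3 caption.
EDGES = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
# Environment states; the labels follow the set Sigma = {sigma, tau}.
SIGMA = ("sigma", "tau")


def build_adjacency(edges):
    """Return a dict mapping each arm to the sorted list of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {v: sorted(nbrs) for v, nbrs in adj.items()}


def next_arm(adj, current, rng=random):
    """Choose the next arm uniformly at random among the neighbors of `current`."""
    return rng.choice(adj[current])


def transition_probability(adj, eta, src, dst):
    """Probability of moving from state src = (u, s) to dst = (v, s_next).

    It is the product of the arm-selection probability 1/deg(u) and the
    environment probability eta[v][s_next]; it is 0 if v is not adjacent to u.
    The source environment state is ignored (independence assumption).
    """
    (u, _), (v, s_next) = src, dst
    if v not in adj[u]:
        return 0.0
    return eta[v][s_next] / len(adj[u])


def executive_arcs(adj, sigma=SIGMA):
    """Arcs of the executive graph: ((u, s), (v, s_next)) for every arc (u, v)
    of G and every ordered pair (s, s_next) of environment states."""
    return [((u, s), (v, s_next))
            for u in adj for v in adj[u]
            for s, s_next in product(sigma, repeat=2)]


ADJ = build_adjacency(EDGES)
# Hypothetical environment distributions eta_v (assumed values, not from the paper).
ETA = {v: {"sigma": 0.5, "tau": 0.5} for v in ADJ}
ETA[1] = {"sigma": 0.8, "tau": 0.2}
```

Under these assumed values, `transition_probability(ADJ, ETA, (0, "sigma"), (1, "sigma"))` is the product of $1/3$ and $\eta_1(\sigma) = 0.8$, while any transition from arm $0$ to arm $4$ has probability zero because the two arms are not adjacent.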

Theorems & Definitions (2)

  • Theorem 4.1
  • Theorem 5.1