Distributed quantum architecture search using multi-agent reinforcement learning

Mikhail Sergeev, Georgii Paradezhenko, Daniil Rabinovich, Vladimir V. Palyulin

TL;DR

This work tackles the scalability challenge of quantum architecture search for variational quantum algorithms by introducing MARL-QAS, a multi-agent reinforcement learning framework built on QMIX. By partitioning a quantum circuit into subcircuits controlled by independent agents, trained jointly to maximize a shared reward, the approach aligns naturally with distributed quantum computing and enables efficient exploration of large circuit spaces. Empirical results on Max-Cut for 3-regular graphs and the Schwinger model show that MARL-QAS can achieve comparable or better problem performance while significantly reducing two-qubit gate counts and parameter counts, and it accelerates training as the number of agents increases. The proposed method offers practical advantages for implementing QAS on near-term devices, reduces quantum-cost overhead, and supports distributed execution in multi-processor quantum architectures, with code and data publicly available.

Abstract

Quantum architecture search (QAS) automates the design of parameterized quantum circuits for variational quantum algorithms by finding a problem-specific structure for the variational ansatz. Among possible implementations of QAS, reinforcement learning (RL) stands out as one of the most promising. Current RL approaches are single-agent and scale poorly with the number of qubits, because both the action-space dimension and the computational cost grow. We propose a novel multi-agent RL algorithm for QAS in which each agent acts separately on its own block of a quantum circuit. This procedure significantly accelerates the convergence of RL-based QAS and reduces its computational cost. We benchmark the proposed algorithm on the MaxCut problem on 3-regular graphs and on ground-state energy estimation for the Schwinger Hamiltonian. In addition, the proposed multi-agent approach fits naturally into the setting of distributed quantum computing, which favors its implementation on modern intermediate-scale quantum devices.

Paper Structure

This paper contains 10 sections, 17 equations, 10 figures, 2 tables.

Figures (10)

  • Figure 1: Sketch of a training loop in basic RL. An agent receives the observation vector $o$ and reward $r$ from the environment. Based on this information, the agent applies an action to the environment and changes its state. The agent is trained to maximize the reward by updating its action policy $\pi$ according to a certain rule. In $Q$-learning algorithms, the agent updates its $Q^{\pi}(s,a)$ function given by Eq. \ref{Q-function}.
  • Figure 2: Forms of the agent-environment interaction in multi-agent reinforcement learning. In all cases, multiple agents receive an array of corresponding observations, $\textbf{o}=(o_1,o_2,...,o_n)$, and apply actions represented as an array $\textbf{a}=(a_1,a_2,...,a_n)$. The agents receive their rewards $r_1,r_2,...,r_n$ from the environment. These rewards can be joint for cooperative problems or separate otherwise. (a) Centralized training, centralized execution: agents can receive all information about other agents during the training and execution phases. (b) Centralized training, distributed execution: agents can share information during the training phase, but during execution they act independently. (c) Distributed training, distributed execution: agents do not share any information during either the training or the execution phase.
  • Figure 3: (a) General structure of the QMIX network. This network consists of multiple agent networks connected to a single mixing network (MixNet). Each agent network receives the action-observation pair $(o^i_t, a_{t-1}^i)$ as an input and outputs the evaluated Q-function $Q_i(\tau^i,a^i_t)$, while the MixNet combines all these Q-values into a single total Q-function $Q_{tot}(\bm{\tau},\bm{a}_t)$ for the joint action $\bm{a}_t$. (b) Structures of the MixNet and agent network. The MixNet consists of two linear layers with a ReLU activation function between them. The weights and biases of the MixNet are produced by separate hypernetworks with trainable weights that take the total state $s_t$ of the environment as an input. The agent network consists of two linear layers connected through a single gated recurrent unit (GRU).
  • Figure 4: Sketch of the MARL-QAS algorithm, which distributes the action space among multiple RL agents. The agent-based execution model makes the method potentially applicable to distributed quantum computing.
  • Figure 5: (a) Agent action-to-unitary transition. First, the gate is selected by integer division of the action index $a$ by the number of agent qubits $q$. The qubit on which the gate acts is then given by the remainder of the same division. This routine is performed for each agent. (b) MARL training pipeline. The RL algorithm produces a quantum circuit. The output circuit is then assessed by computing the total statevector (or the Hamiltonian expectation value) and calculating the corresponding reward metric (fidelity $F$ or approximation ratio). According to the reward signal, the weights of both the mixing and agent networks are updated.
  • ...and 5 more figures
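
The action-to-unitary rule described in the Figure 5(a) caption can be illustrated with a minimal Python sketch. It assumes that the quotient of $a$ by $q$ indexes the gate and the remainder indexes the agent's local qubit, as the caption states. The gate list `GATES`, the function names, and the mapping of each agent to a contiguous block of qubits are hypothetical choices made for illustration only; they are not taken from the paper or its code.

```python
from typing import List, Sequence, Tuple

# Hypothetical gate set for illustration; the caption does not list the gates.
GATES = ["RX", "RY", "RZ", "CNOT"]


def decode_action(a: int, q: int) -> Tuple[int, int]:
    """Split an action index into (gate index, local qubit index).

    Follows the rule in the Figure 5(a) caption: the quotient a // q
    selects the gate and the remainder a % q selects the qubit.
    """
    if q <= 0:
        raise ValueError("number of agent qubits q must be positive")
    if a < 0:
        raise ValueError("action index a must be non-negative")
    return divmod(a, q)


def decode_joint_action(
    actions: Sequence[int],
    qubits_per_agent: int,
    gates: Sequence[str] = GATES,
) -> List[Tuple[str, int]]:
    """Decode one action per agent into (gate name, global qubit index).

    Assumption: agent i owns the contiguous qubits
    [i * qubits_per_agent, (i + 1) * qubits_per_agent).
    """
    decoded = []
    for agent_id, a in enumerate(actions):
        gate_index, local_qubit = decode_action(a, qubits_per_agent)
        if gate_index >= len(gates):
            raise ValueError(f"action {a} is outside the action space")
        global_qubit = agent_id * qubits_per_agent + local_qubit
        decoded.append((gates[gate_index], global_qubit))
    return decoded
```

With three qubits per agent, for example, `decode_action(7, 3)` gives gate index 2 and local qubit 1. Under this encoding each agent's action space has size (number of gates) × $q$, so it depends on the block size rather than on the total number of qubits.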