
Quantum circuit optimization with deep reinforcement learning

Thomas Fösel, Murphy Yuezhen Niu, Florian Marquardt, Li Li

TL;DR

The paper tackles hardware-aware quantum circuit optimization (QCO) for NISQ devices by introducing a deep reinforcement learning (RL) framework that treats circuit transformations as actions in an RL environment. The agent is a deep convolutional network trained with proximal policy optimization (PPO), an advantage actor-critic method, to select soft transformations, followed by pruning of hard transformations, guided by a reward function that penalizes circuit depth and gate count. On 12-qubit random circuits, the method achieves average reductions of about 27% in depth and 15% in gate count. It also extrapolates to larger circuits (up to 50 qubits) and is applied to QAOA-MaxCut circuits. Compared with simulated annealing, RL optimizes faster once trained and can generalize to architectures not seen during training, suggesting a practical path to hardware-aware QCO for near-term quantum devices.
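The TL;DR describes a reward that penalizes circuit depth and gate count. The Python sketch below shows one way such a reward could be computed. The moment-list circuit representation, the linear weighted cost, and the weights `w_depth`/`w_count` are illustrative assumptions, not the paper's exact definition. Rewarding the per-step decrease of the cost means the undiscounted return of an episode telescopes to the total cost reduction, and changing the weights changes the optimization target.

```python
# Assumed circuit representation (hypothetical): a list of moments, each a list
# of (gate_name, qubits) operations that act in parallel.
Circuit = list[list[tuple[str, tuple[int, ...]]]]


def depth(circuit: Circuit) -> int:
    """Depth d: number of non-empty moments."""
    return sum(1 for moment in circuit if moment)


def gate_count(circuit: Circuit) -> int:
    """Gate count n: total number of operations over all moments."""
    return sum(len(moment) for moment in circuit)


def cost(circuit: Circuit, w_depth: float = 1.0, w_count: float = 1.0) -> float:
    """Weighted cost (assumed linear form); the weights select the optimization target."""
    return w_depth * depth(circuit) + w_count * gate_count(circuit)


def reward(before: Circuit, after: Circuit, w_depth: float = 1.0, w_count: float = 1.0) -> float:
    """Per-step reward: positive when a transformation lowers the weighted cost."""
    return cost(before, w_depth, w_count) - cost(after, w_depth, w_count)


before = [[("x", (0,)), ("x", (1,))], [("x", (0,))], [("cz", (0, 1))]]
after = [[("x", (1,))], [], [("cz", (0, 1))]]  # the two X gates on qubit 0 cancelled
print(depth(before), gate_count(before))  # 3 4
print(reward(before, after))              # 3.0
```

With `w_count=0.5`, the same transformation yields a reward of 2.0, so the agent would value gate-count reductions less relative to depth reductions.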

Abstract

A central aspect for operating future quantum computers is quantum circuit optimization, i.e., the search for efficient realizations of quantum algorithms given the device capabilities. In recent years, powerful approaches have been developed which focus on optimizing the high-level circuit structure. However, these approaches do not consider and thus cannot optimize for the hardware details of the quantum architecture, which is especially important for near-term devices. To address this point, we present an approach to quantum circuit optimization based on reinforcement learning. We demonstrate how an agent, realized by a deep convolutional neural network, can autonomously learn generic strategies to optimize arbitrary circuits on a specific architecture, where the optimization target can be chosen freely by the user. We demonstrate the feasibility of this approach by training agents on 12-qubit random circuits, where we find on average a depth reduction by 27% and a gate count reduction by 15%. We examine the extrapolation to larger circuits than used for training, and envision how this approach can be utilized for near-term quantum devices.

Paper Structure

This paper contains 25 sections, 5 equations, 5 figures.

Figures (5)

  • Figure 1: Overview. a) Diagram representation for quantum circuits. Each qubit is indicated with one line. The colored symbols represent operations (gates) on these qubits, with time increasing to the right. b) Quantum circuit optimization. For a given circuit, we aim to find a logically equivalent, but more efficient representation. c) Our reinforcement learning approach to quantum circuit optimization. Based on a diagram-like representation of the circuit, the agent, realized by a neural network, can choose between several circuit transformations to generate another, logically equivalent circuit; this process is repeated multiple times.
  • Figure 2: Deep convolutional network architecture of our RL agent. As observation, the agent receives a complete description of the state $s$, i.e., the quantum circuit. The input neurons are arranged on a 3D grid whose axes correspond to qubit index, moment, and gate class. This information is processed through a stack of multiple convolutional layers, where qubit index and moment are treated as spatial dimensions and the gate classes as input color channels. As output, the agent computes two quantities: (i) the policy $\pi(a|s)$, according to which the actions $a$ in state $s$ are chosen probabilistically. Every action, i.e., circuit transformation, is mapped uniquely to one policy output neuron; the remaining neurons are disabled with an action mask. (ii) The state value $V(s)$, which helps to update the policy $\pi(a|s)$ more efficiently during training. Here, $V(s)$ can be interpreted as the optimization potential of the circuit.
  • Figure 3: Training on random circuits. a) Circuit processing pipeline (see the results section on random circuits for details). After an initial circuit is generated by randomly combining gates, a pruning step applies all "trivial" optimizations. Afterwards, $500$ random transformations are performed on this circuit, which significantly increases its depth $d$ and gate count $n$. These expanded circuits are then used as the starting points of the episodes to train and evaluate the RL agent. b) Diagrams illustrating the evolution of one example circuit through this pipeline. c) Learning progress during training, demonstrating how the agent improves in reducing both the depth $d$ (top) and the gate count $n$ (bottom) of the circuits. The point cloud indicates, for all episodes during training, the corresponding quantity in the final time step. The blue curve shows the moving average over the latest $10\%$ of epochs. For comparison, the gray line indicates the corresponding averages after pruning; already early in training, the agent falls below this level for both quantities. d) In-game progress at the end of the learning process, showing for $5$ episodes during the last epoch (orange) how the agent progressively optimizes $(d,n)$ during an episode. The blue curve indicates the average over all episodes in the last $100$ epochs of training. e) Relative improvement achieved by the RL agent with respect to the corresponding circuit size after pruning. Each point corresponds to one episode during the last $100$ epochs. f) Comparison with circuit optimization by simulated annealing (again, see the results section on random circuits for details). The graphical depiction and the considered circuits are the same as in (e), so the two panels are directly comparable.
  • Figure 4: Extrapolation to $50$-qubit random circuits. The agent has been trained on $12$-qubit circuits (cf. Fig. 3); no further learning updates are performed here. (a) shows the comparison between an unoptimized example circuit (after pruning) and the result of the optimization by the RL agent. (b) shows the progress of the agent in reducing depth $d$ and gate count $n$ over the course of $2500$ transformations. (c) shows the corresponding curves for simulated annealing, which requires almost $100000$ transformations to achieve a comparable degree of optimization (the computation was terminated after 1 week, at transformation $93000$).
  • Figure 5: Optimization of QAOA-MaxCut circuits. (a) indicates how to translate the MaxCut problem for a graph into a quantum circuit following QAOA, and how to efficiently compile this logical circuit into our gate set. We display one of the $C$ cycles that form the full circuit, each with a different set of parameters $(\gamma_c,\beta_c)$ whose values are refined during the QAOA algorithm. (b) shows the compiled circuit for $C=2$ cycles and an all-to-all-connected graph with $6$ nodes, which has depth $d=75$ and gate count $n=142$ (top). Using a generic agent trained on random circuits as in Fig. 3, we find (by postselection) improved circuits with $d=68$ and $n=138$ (middle). A specialized agent trained on this particular circuit optimizes it further to $d=66$ and $n=138$ (bottom).
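Figure 2 describes two mechanisms: a 3D input grid indexed by qubit, moment, and gate class, and an action mask that disables policy outputs not corresponding to a valid transformation. The pure-Python sketch below illustrates both under stated assumptions. The gate-class vocabulary `GATE_CLASSES` and the function names are hypothetical, and the convolutional layers that would map the grid to policy logits and $V(s)$ are omitted.

```python
import math

# Hypothetical gate-class vocabulary; the paper's actual gate classes are not listed here.
# A single "cz" class suffices for this sketch because CZ is symmetric in its qubits.
GATE_CLASSES = ["x", "z", "h", "cz"]


def encode(circuit, num_qubits):
    """One-hot grid indexed as grid[qubit][moment][gate_class] (the 3D input of Fig. 2)."""
    grid = [[[0.0] * len(GATE_CLASSES) for _ in circuit] for _ in range(num_qubits)]
    for t, moment in enumerate(circuit):
        for name, qubits in moment:
            c = GATE_CLASSES.index(name)
            for q in qubits:  # a multi-qubit gate marks every qubit it acts on
                grid[q][t][c] = 1.0
    return grid


def masked_policy(logits, mask):
    """Softmax over the logits, assigning probability 0 to actions whose mask is False."""
    top = max(l for l, m in zip(logits, mask) if m)  # subtract the max for numerical stability
    weights = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(weights)
    return [w / total for w in weights]


circuit = [[("h", (0,)), ("x", (1,))], [("cz", (0, 1))]]
grid = encode(circuit, num_qubits=2)
print(grid[0][1])  # [0.0, 0.0, 0.0, 1.0]  (qubit 0, moment 1: CZ)
probs = masked_policy([1.0, 2.0, 3.0], [True, False, True])
print(probs[1])    # 0.0  (masked action is never sampled)
```

Masking before the softmax, rather than renormalizing afterwards, keeps the remaining probabilities a proper distribution and guarantees that invalid transformations are never selected.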
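Figure 3 mentions a pruning step that applies all "trivial" optimizations but does not list them. One plausible example of such a trivial rule is cancelling two identical self-inverse gates that act on the same qubits with no other gate on those qubits in between. The sketch below implements only that rule on a time-ordered gate list; the gate set `SELF_INVERSE` and the function name are hypothetical, and this is an assumption rather than the paper's pruning procedure.

```python
from collections import defaultdict

# Hypothetical set of self-inverse gates (G @ G = identity).
SELF_INVERSE = {"x", "y", "z", "h", "cz"}


def cancel_adjacent_pairs(gates):
    """Remove pairs of identical self-inverse gates that are adjacent on all their qubits.

    `gates` is a time-ordered list of (name, qubits) tuples. Qubit order matters
    here, so ("cz", (0, 1)) and ("cz", (1, 0)) are not matched in this sketch.
    """
    out = []                    # surviving gates; cancelled entries become None
    stacks = defaultdict(list)  # qubit -> indices into `out` of surviving gates on it
    for name, qubits in gates:
        tops = {stacks[q][-1] if stacks[q] else None for q in qubits}
        if len(tops) == 1:  # the same previous gate is the latest one on every qubit
            j = tops.pop()
            if j is not None and name in SELF_INVERSE and out[j] == (name, qubits):
                out[j] = None
                for q in qubits:  # expose the gate before the cancelled one
                    stacks[q].pop()
                continue
        out.append((name, qubits))
        for q in qubits:
            stacks[q].append(len(out) - 1)
    return [g for g in out if g is not None]


print(cancel_adjacent_pairs([("h", (0,)), ("x", (1,)), ("h", (0,))]))  # [('x', (1,))]
```

Because a cancellation pops the per-qubit stacks, nested pairs such as X, Z, Z, X on one qubit collapse completely in a single pass.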