Curriculum reinforcement learning for quantum architecture search under hardware errors

Yash J. Patel; Akash Kundu; Mateusz Ostaszewski; Xavier Bonet-Monroig; Vedran Dunjko; Onur Danaci

TL;DR

This work addresses automatic quantum circuit architecture search for the variational quantum eigensolver (VQE) on NISQ devices by introducing CRLQAS, a curriculum-based reinforcement learning framework that operates under realistic hardware noise. It combines a 3D tensor encoding of circuits, pruning of illegal actions, random halting of episodes to promote compact circuits, and a novel Adam-SPSA optimizer, supported by a fast GPU simulator based on the Pauli-transfer matrix (PTM) formalism for efficient noisy evaluations. On quantum chemistry tasks (H2, LiH, H2O), in both noiseless and noisy environments, the method outperforms previous QAS methods, reaching chemical accuracy with smaller gate counts and depths. The 6x speedup from PTM-based simulation and the depth-aware encoding make training more scalable, which points to CRLQAS's potential for practical quantum architecture design in chemistry, optimization, and quantum machine learning; hardware validation and further scalability improvements are left as future work.
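To make the optimizer idea concrete, the following is a minimal sketch of how an SPSA gradient estimate could be fed into Adam-style moment updates. It is an illustration under stated assumptions, not the authors' Adam-SPSA: the function name `adam_spsa_minimize`, the constant perturbation size `c`, the constant learning rate, and the Rademacher perturbations are all choices made here, and the paper's "varying samples" scheme is not modeled.

```python
import numpy as np

def adam_spsa_minimize(f, theta0, n_iter=200, lr=0.05, c=0.1,
                       beta1=0.9, beta2=0.999, eps=1e-8, seed=0):
    """Minimize f with SPSA gradient estimates smoothed by Adam moments.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    m = np.zeros_like(theta)  # first-moment (mean) estimate
    v = np.zeros_like(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, n_iter + 1):
        # Random +/-1 direction perturbing all parameters simultaneously.
        delta = rng.choice([-1.0, 1.0], size=theta.shape)
        # Two function evaluations per step, independent of dimension.
        diff = f(theta + c * delta) - f(theta - c * delta)
        # SPSA estimate; for +/-1 entries, 1/delta equals delta.
        g = diff / (2.0 * c) * delta
        # Standard Adam update applied to the noisy gradient estimate.
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta
```

In a VQE setting, `f` would be the (possibly noisy) energy estimate as a function of the circuit's rotation angles; here any callable that maps a parameter vector to a float can be used.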

Abstract

The key challenge in the noisy intermediate-scale quantum era is finding useful circuits compatible with current device limitations. Variational quantum algorithms (VQAs) offer a potential solution by fixing the circuit architecture and optimizing individual gate parameters in an external loop. However, parameter optimization can become intractable, and the overall performance of the algorithm depends heavily on the initially chosen circuit architecture. Several quantum architecture search (QAS) algorithms have been developed to design useful circuit architectures automatically. In the case of parameter optimization alone, noise effects have been observed to dramatically influence the performance of the optimizer and final outcomes, which is a key line of study. However, the effects of noise on the architecture search, which could be just as critical, are poorly understood. This work addresses this gap by introducing a curriculum-based reinforcement learning QAS (CRLQAS) algorithm designed to tackle challenges in realistic VQA deployment. The algorithm incorporates (i) a 3D architecture encoding and restrictions on environment dynamics to explore the search space of possible circuits efficiently, (ii) an episode halting scheme to steer the agent to find shorter circuits, and (iii) a novel variant of simultaneous perturbation stochastic approximation as an optimizer for faster convergence. To facilitate studies, we developed an optimized simulator for our algorithm, significantly improving computational efficiency in simulating noisy quantum circuits by employing the Pauli-transfer matrix formalism in the Pauli-Liouville basis. Numerical experiments focusing on quantum chemistry tasks demonstrate that CRLQAS outperforms existing QAS algorithms across several metrics in both noiseless and noisy environments.
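As a rough illustration of the Pauli-transfer matrix formalism mentioned in the abstract, the sketch below builds single-qubit PTMs from Kraus operators via $R_{ij} = \tfrac{1}{2}\mathrm{Tr}\left(P_i\,\mathcal{E}(P_j)\right)$ and composes a noisy gate by matrix multiplication. It uses plain NumPy on one qubit and is not the authors' GPU simulator; the helper names `ptm_from_kraus`, `depolarizing_kraus`, and `rx_kraus` are hypothetical.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
PAULIS = [I2, X, Y, Z]

def ptm_from_kraus(kraus_ops):
    """Single-qubit PTM: R[i, j] = 1/2 Tr(P_i * sum_k K_k P_j K_k^dagger)."""
    R = np.zeros((4, 4))
    for j, Pj in enumerate(PAULIS):
        out = sum(K @ Pj @ K.conj().T for K in kraus_ops)
        for i, Pi in enumerate(PAULIS):
            R[i, j] = 0.5 * np.trace(Pi @ out).real
    return R

def depolarizing_kraus(p):
    """Kraus operators of E(rho) = (1 - p) rho + p I/2."""
    return [np.sqrt(1 - 3 * p / 4) * I2,
            np.sqrt(p / 4) * X, np.sqrt(p / 4) * Y, np.sqrt(p / 4) * Z]

def rx_kraus(theta):
    """Single Kraus operator of the unitary RX(theta) = exp(-i theta X / 2)."""
    return [np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X]

# A noisy gate is the product of PTMs (the rightmost factor acts first).
p, theta = 0.1, 0.7
R_noisy_rx = ptm_from_kraus(depolarizing_kraus(p)) @ ptm_from_kraus(rx_kraus(theta))

# |0><0| as a Pauli vector with components Tr(P_i rho) = (1, 0, 0, 1).
r0 = np.array([1.0, 0.0, 0.0, 1.0])
expval_z = (R_noisy_rx @ r0)[3]  # equals (1 - p) * cos(theta)
```

In this representation, a noisy circuit is a product of real matrices acting on a real vector, and Pauli expectation values are read directly from the vector's components.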

Paper Structure (38 sections, 20 equations, 11 figures, 6 tables, 1 algorithm)

This paper contains 38 sections, 20 equations, 11 figures, 6 tables, 1 algorithm.

Introduction
Related Work
Curriculum Reinforcement Learning Algorithm
Illegal actions for the RL agent
Random Halting of the RL environment
Tensor-based binary circuit encoding
Adam-SPSA Algorithm with Varying Samples
Fast GPU Simulation of Noisy Environments
Experiments
Noisy simulation
Noiseless simulation
Conclusion
Limitations and Future Work
Computational Demands
Evolution of RL Methods
...and 23 more sections

Figures (11)

Figure 1: Illustration of the architecture of the double deep-Q network utilized by the reinforcement learning (RL) agent. The RL state $s$ here describes the quantum circuit encoded as a tensor-based 3D grid whose axes correspond to the qubit index, depth (moment) and gate type. This information is processed through a multi-layer perceptron. For the output, the agent computes the policy, according to which the actions $a$ (as gates) in state $s$ are probabilistically chosen. A classical optimizer optimizes the circuit and, upon completion, provides a reward that guides the agent to select subsequent actions.
Figure 2: Reaching chemical accuracy for $\ce{H2}$ (with $2$, $3$ and $4$ qubits) and $\ce{LiH}$ (with $4$ qubits) molecules in a systematic study under realistic physical noise, where the noise model mimics $\texttt{IBM Quantum}$ devices. In the initial episodes the probability of choosing random actions is very high, so we consider statistics only from episode $2000$ onward and plot the median of the minimum over $3$ different seeds. The different colours denote the different levels of noise (increasing from bottom to top), and the patterns (from left to right) denote the number of parameters, the depth and the number of gates, respectively. We reach chemical accuracy for the $\ce{H2}-2$ and $\ce{H2}-3$ (except at $10$ times the maximum noise of the $\texttt{IBM Mumbai}$ device) molecules at all levels of noise. Meanwhile, the $\ce{H2}-4$ molecule reaches chemical accuracy with the noise profile of $\texttt{IBM Ourense}$, even with qubit-connectivity constraints. Finally, with $\ce{LiH}-4$, we achieve chemical accuracy with shot noise and 1-qubit depolarizing noise. Note that, for $\ce{H2}$, we decreased the threshold (usually set to chemical accuracy) to $2.2\times 10^{-4}$ because the problem is straightforward to solve.
Figure 3: Learning curve of the $\ce{LiH}$ (with $4$ qubits) experiment. The agent is trained with a noise model consisting of $1$-qubit depolarizing noise of strength $0.1 \times 10^{-2}$ and expectation values sampled with $10^{6}$ repetitions. The left panel shows the training curve under noise; the right panel shows the evaluation of the same points without noise. The red dashed line indicates chemical accuracy.
Figure 4: Demonstration of the feedback-driven (green) process, depicting the impact of two amortization occurrences (pink), denoted by $\delta$. The first occurrence corresponds to a non-zero adjustment in the threshold, from $\xi_1$ to $\xi_2$, indicating that the agent succeeded in improving the energy estimate during training. The second amortization event illustrates the case where the agent fails to improve on the current threshold $\xi_2$, or the improvement is marginal compared to the amortization value. Consequently, the threshold increases suddenly because the amortization value is reset. Note that the final threshold, once the second amortization reaches zero, may also be lower than $\xi_2$.
Figure 5: Illustration of tensor-based encoding for a $4$-qubit (i.e., $N = 4$) toy circuit with $n_{\text{act}} = 3$. We initialize a tensor of zeros of dimension $\left[n_{\text{act}} \times \left((N+3)\times N\right)\right]$, equating to $\left[3 \times \left((4 + 3) \times 4\right)\right]$ for this circuit. Each blue-colored matrix of size $((4 + 3) \times 4)$, represents a different moment (depth). Within this matrix, the first $(4 \times 4)$ block is reserved for CNOT (CX), with rows and columns encoding target and control qubits, respectively. The remaining $(3 \times 4)$ block of the blue-colored matrix then encodes rotation gates. The columns mark the position of the qubit wire (index) and the rows mark the rotation direction $m$. Here, $m = 1, 2, \text{and}\ 3$ yields the rotations RX, RY and RZ, respectively.
...and 6 more figures
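The tensor-based encoding described in the Figure 5 caption can be sketched as follows. This is a minimal illustration, assuming the circuit is already given as a list of moments and that gates are written as tuples `("CX", control, target)` or `("RX" | "RY" | "RZ", qubit)`; the function name `encode_circuit`, the input format, and the zero-based indexing are hypothetical, while the layout (an $N \times N$ CNOT block with rows as targets and columns as controls, followed by a $3 \times N$ rotation block) follows the caption.

```python
import numpy as np

# Row offset of each rotation axis inside the 3 x N rotation block
# (the caption's m = 1, 2, 3 maps to zero-based rows 0, 1, 2).
ROT_ROW = {"RX": 0, "RY": 1, "RZ": 2}

def encode_circuit(moments, n_qubits, n_act):
    """Encode a circuit as a binary tensor of shape (n_act, N + 3, N).

    Hypothetical helper: `moments` is a list (one entry per depth step)
    of gate tuples, an input format assumed for this sketch.
    """
    if len(moments) > n_act:
        raise ValueError("circuit has more moments than n_act")
    state = np.zeros((n_act, n_qubits + 3, n_qubits), dtype=np.int8)
    for depth, gates in enumerate(moments):
        for gate in gates:
            if gate[0] == "CX":
                _, control, target = gate
                # CNOT block: row = target qubit, column = control qubit.
                state[depth, target, control] = 1
            else:
                name, qubit = gate
                # Rotation block: row = rotation axis, column = qubit wire.
                state[depth, n_qubits + ROT_ROW[name], qubit] = 1
    return state

# Toy 4-qubit circuit with n_act = 3, matching the caption's dimensions.
toy = [[("CX", 0, 1), ("RZ", 2)], [("RY", 1)]]
encoded = encode_circuit(toy, n_qubits=4, n_act=3)  # shape (3, 7, 4)
```

A flattened copy of such a tensor could serve as the input to the multi-layer perceptron described in the Figure 1 caption; unused moments simply remain zero.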
