Gate Sequence Optimization for Parameterized Quantum Circuits using Reinforcement Learning

Tom R. Rieckmann, Stefan Scheel, A. Douglas K. Plato

TL;DR

The paper tackles noise from entangling gates on NISQ devices by optimizing entangling gate sequences for parameterized quantum circuits using reinforcement learning. It introduces a Double Deep Q-Network framework that sequentially designs gate sequences and tunes continuous gate parameters with fidelity-based rewards, without requiring a priori optimal baselines. Compared with fixed-layered hardware-efficient ansätze, the RL approach achieves higher state-preparation fidelities with fewer CNOTs across multiple connectivities and device simulations, including IBM backends. This flexible, tunable method offers practical improvements for near-term quantum algorithms and can be extended to incorporate real-device noise data.

Abstract

Current experimental quantum computing devices are limited by noise, mainly originating from entangling gates. If an efficient gate sequence for an operation is unknown, one often employs layered parameterized quantum circuits, especially hardware-efficient ansätze, with fixed entangling layer structures. We demonstrate a reinforcement learning algorithm to improve on these by optimizing the entangling gate sequence in the task of quantum state preparation. This allows us to restrict the required number of CNOT gates while taking the qubit connectivity architecture into account. Recent advancements using reinforcement learning have already demonstrated the power of this technique when optimizing the circuit for a sequence of non-parameterized gates. We extend this approach to parameterized gate sets by incorporating general single-qubit unitaries, thus allowing us to consistently reach higher state preparation fidelities at the same number of CNOT gates compared to a hardware-efficient ansatz.

Paper Structure

This paper contains 10 sections, 12 equations, 10 figures, 2 tables, 1 algorithm.

Figures (10)

  • Figure 1: CNOT connectivity for ibm_manila and ibm_quito. Numbers represent qubits; connections depict the CNOT gates available in the hardware. The colour coding is given by gate errors (connections) and decoherence (qubits). Images taken from IBM Quantum [JavadiAbhari2024].
  • Figure 2: Makeup of a layered parameterized circuit. Single-qubit gates $U$ are parameterized with up to three independent parameters each. (a) depicts the general structure, which consists of layers $U_{\text{layer}}$ repeated $L$ times. As layers begin with entangling gates, a local rotation layer is added at the beginning. In (b) (linear layer) and (c) (pairwise layer) one can see two popular variants for $U_{\text{layer}}$, which will be used for the performance comparison of our algorithm.
  • Figure 3: The individual action used for our results. Local gates are only added on the target and control qubits of an entangling gate. Other single-qubit operations or entangling gates, such as the Toffoli gate or the controlled-Z gate, may be chosen as elementary building blocks.
  • Figure 4: Schematic depicting the process of updating the NN weights from data stored in the replay buffer. The procedure is not performed for a single tuple $(s_k, a_k, r_k, s_{k+1})$ but rather for a batch $B$. The target NN is obtained from the value NN using Polyak averaging, i.e. after each update step the new target NN weights are calculated as a weighted average of the target NN's current weights (99%) and the value NN's weights (1%).
  • Figure 5: Performance on fully entangled Haar-random states over the course of training for a four-qubit system with unrestricted CNOT connectivity, i.e. all possible CNOT gates are available. The agent slowly improves fidelity by adding more CNOT gates, should they become necessary.
  • ...and 5 more figures
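
The Polyak averaging described in the caption of Figure 4 can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the function names `polyak_update` and `double_dqn_target`, the flat-list weight representation, and the discount factor `gamma` are assumptions made here for clarity. Only the 99%/1% weighting comes from the caption; the target formula is the standard Double DQN form, in which the value network selects the next action and the target network evaluates it.

```python
def polyak_update(target_weights, value_weights, tau=0.01):
    """Blend value-network weights into the target network.

    With tau = 0.01 the new target weights are 99% of the old target
    weights plus 1% of the value-network weights (as in Figure 4).
    Weights are plain lists of floats here (an illustrative assumption).
    """
    return [(1.0 - tau) * t + tau * v
            for t, v in zip(target_weights, value_weights)]


def double_dqn_target(reward, q_value_next, q_target_next,
                      gamma=0.99, terminal=False):
    """Standard Double DQN regression target for one transition.

    q_value_next  : Q-values of the next state from the value network
    q_target_next : Q-values of the next state from the target network
    gamma         : discount factor (hypothetical default, not from the paper)
    """
    if terminal:
        return reward
    # The value network picks the action ...
    best_action = max(range(len(q_value_next)), key=lambda a: q_value_next[a])
    # ... and the target network scores it.
    return reward + gamma * q_target_next[best_action]


# Example: one soft update and one target computation.
new_target = polyak_update([1.0, 0.0], [0.0, 1.0])
y = double_dqn_target(1.0, [0.2, 0.9], [0.5, 0.3], gamma=0.5)
```

In practice the same update would be applied to every weight tensor of the network and the target would be computed for the whole batch $B$ drawn from the replay buffer; the scalar version above only shows the arithmetic.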