Reinforcement learning with learned gadgets to tackle hard quantum problems on real hardware

Akash Kundu, Leopoldo Sarra

TL;DR

This work introduces gadget reinforcement learning (GRL), a framework that merges reinforcement learning with program synthesis to autonomously construct and incorporate composite circuit components, or gadgets, into the action space for variational quantum algorithms. By learning from simple TFIM instances and transferring those gadgets to harder regimes, GRL achieves faster convergence, higher accuracy, and hardware-compatible, more compact PQCs, including circuits that perform well on IBM hardware with reduced transpilation overhead. The approach demonstrates scalable gains, effective transfer across problem sizes, and resilience to noise when using gadget redundancy, indicating practical potential for hardware-aware quantum circuit design. Overall, GRL bridges algorithmic design and real hardware constraints, enabling more efficient exploration of PQCs under realistic budgets and paving the way for broader adoption of automated quantum circuit optimization.

Abstract

Designing quantum circuits for specific tasks is challenging due to the exponential growth of the state space. We introduce gadget reinforcement learning (GRL), which integrates reinforcement learning with program synthesis to automatically generate and incorporate composite gates (gadgets) into the action space. This enhances the exploration of parameterized quantum circuits (PQCs) for complex tasks like approximating ground states of quantum Hamiltonians, an NP-hard problem. We evaluate GRL using the transverse field Ising model under typical computational budgets (e.g., 2-3 days of GPU runtime). Our results show improved accuracy, hardware compatibility, and scalability. GRL exhibits robust performance as the size and complexity of the problem increase, even with constrained computational resources. By integrating gadget extraction, GRL facilitates the discovery of reusable circuit components tailored for specific hardware, bridging the gap between algorithmic design and practical implementation. This makes GRL a versatile framework for optimizing quantum circuits with applications in hardware-specific optimizations and variational quantum algorithms. The code is available at: https://github.com/Aqasch/Gadget_RL

Paper Structure

This paper contains 36 sections, 14 equations, 13 figures, 7 tables.

Figures (13)

  • Figure 1: (Left) Gadget reinforcement learning (GRL) framework: An RL agent sequentially builds a quantum circuit for state preparation, using the energy expectation of a Hamiltonian as the cost. Rewards $\pm r$ are assigned based on whether the cost falls below a threshold $\zeta$, guiding policy updates. The top-$k$ circuits are analyzed via program synthesis to extract composite gates (gadgets), which are added to the action space for further training. (Right) GRL vs. RL \cite{patel2024curriculum} on the transverse field Ising model (TFIM): We consider a 2-qubit TFIM (Eq. \ref{eq:tfim_model}) with varying field strength $h$. Standard RL fails to find the ground state for $h=1$, while GRL maintains high accuracy as $h$ increases. For small $h$ (e.g., $10^{-3}$), both methods perform well; as $h$ increases, RL accuracy plateaus ($>10^{-3}$ error), but GRL achieves much lower error, demonstrating better adaptability across varying hardness.
  • Figure 2: Results for the 2-qubit transverse field Ising model (TFIM). We compare reinforcement learning-only (blue) with gadget reinforcement learning (GRL) using one (reddish orange) and two (green) extracted components, as shown in the legend. (a) compares error scaling with varying transverse field strength under a fixed compute budget (at most 48 hours of GPU runtime). Solid lines show averages over multiple runs; shaded areas indicate solution ranges (smallest values are most relevant). GRL achieves high accuracy for $h=1$. (b) plots RL reward thresholds during training for $h=1$, showing that GRL finds circuits with lower cost. Without gadget extraction, accuracy is limited to $10^{-3}$, while GRL achieves machine precision. A similar illustration for the 3-qubit TFIM is shown in Fig. \ref{fig:3_qubit_results}.
  • Figure 3: For the $N=2$ TFIM, GRL with one and two gadgets improves the cumulative reward growth compared to RL. The RL agent struggles to achieve consistent cumulative rewards and positive returns. GRL with one gadget improves performance, achieving steady cumulative reward growth and frequent positive returns, but experiences a notable drop around the $1000$th episode, reaching the machine precision error threshold. GRL with two gadgets helps escalate this drop and reach machine precision with fewer agent-environment interactions.
  • Figure 4: The gadgets extracted from the simple 2-qubit TFIM, when reused for larger TFIM instances, yield lower error in finding the ground state of the $4$-, $5$-, and even $6$-qubit TFIM. Note that, to show the power of gadgets, we restrict ourselves to GRL with just one gadget, and GRL still outperforms the RL algorithms. A detailed discussion is provided in Appendix \ref{appendx:gadget_transfer_evaluation}.
  • Figure 5: Energy gap between the first excited state and the ground state of the TFIM as a function of the transverse field strength. The separation $\Delta E$ is negligible up to $h=10^{-1}$. Hence, due to the energy degeneracy, it is easy to find a good energy approximation for $h\leq10^{-1}$. The problem becomes harder when we choose $h\geq10^{-1}$, as $\Delta E$ becomes non-negligible.
  • ...and 8 more figures
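
The energy gap described in the Figure 5 caption can be reproduced for small systems by exact diagonalization. The sketch below is illustrative only and is not taken from the paper's code: the TFIM equation is not shown in this listing, so the form $H = -J\sum_i Z_i Z_{i+1} - h\sum_i X_i$ with open boundaries and $J=1$ is an assumption, and the helper names (`op_on`, `tfim_hamiltonian`, `energy_gap`) are hypothetical.

```python
import numpy as np

# Single-qubit operators.
I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])


def op_on(site_ops, n):
    """Tensor product over n qubits; sites missing from site_ops get the identity."""
    out = np.array([[1.0]])
    for q in range(n):
        out = np.kron(out, site_ops.get(q, I2))
    return out


def tfim_hamiltonian(n, h, J=1.0):
    """Dense TFIM matrix.

    ASSUMPTION: H = -J * sum_i Z_i Z_{i+1} - h * sum_i X_i with open
    boundaries; the paper's exact convention may differ.
    """
    H = np.zeros((2**n, 2**n))
    for i in range(n - 1):
        H -= J * op_on({i: Z, i + 1: Z}, n)
    for i in range(n):
        H -= h * op_on({i: X}, n)
    return H


def energy_gap(n, h):
    """Difference between the two lowest eigenvalues (first excited minus ground)."""
    evals = np.linalg.eigvalsh(tfim_hamiltonian(n, h))  # sorted ascending
    return evals[1] - evals[0]


if __name__ == "__main__":
    for h in (1e-3, 1e-2, 1e-1, 1.0):
        print(f"h={h:g}  gap={energy_gap(2, h):.3e}")
```

Under this assumed convention, the 2-qubit gap has the closed form $\Delta E = \sqrt{1+4h^2}-1$, which is of order $h^2$ for small $h$ and grows to order one at $h=1$, in line with the trend the caption describes.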