Hybrid action Reinforcement Learning for quantum architecture search

Jiayang Niu, Yan Wang, Jie Li, Ke Deng, Azadeh Alavi, Mark Sanderson, Yongli Ren

TL;DR

This work tackles automated quantum architecture search for variational quantum circuits in the VQE setting by introducing HyRLQAS, a hybrid-action reinforcement learning framework that jointly learns discrete gate placement and continuous parameter initialization. The method uses a tensor-based circuit encoding and a Hybrid Policy Network to produce a discrete action, an initial parameter, and a refinement update, with REINFORCE optimization and a curriculum-driven reward based on energy reduction. Key contributions include the unified hybrid action space, a learned warm-start mechanism for external optimizers, and extensive ablations showing gains from both topology and initialization. Experiments on molecular Hamiltonians demonstrate improved ground-state energy accuracy and more compact circuits, indicating a principled path toward automated, hardware-efficient quantum circuit design in the NISQ era.

Abstract

Designing expressive yet trainable quantum circuit architectures remains a major challenge for variational quantum algorithms, as manual or heuristic designs often yield suboptimal performance. We propose HyRLQAS (Hybrid-Action Reinforcement Learning for Quantum Architecture Search), a unified framework that integrates discrete gate placement and continuous parameter generation within a hybrid action space. Unlike existing approaches that optimize circuit structure and parameters separately, HyRLQAS jointly learns both topology and initialization while dynamically refining previously placed gates through reinforcement learning. Trained in a variational quantum eigensolver (VQE) environment, the agent autonomously constructs circuits that minimize molecular ground-state energy. Experimental results demonstrate that HyRLQAS achieves consistently lower energy errors and more compact circuit structures compared with discrete-only and continuous-only baselines. Furthermore, the hybrid action space yields superior parameter initializations, producing post-optimization energy distributions with consistently lower minima. These findings suggest that hybrid-action reinforcement learning offers a principled pathway toward automated and hardware-efficient quantum circuit design.
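To make the hybrid action space concrete, here is a minimal sketch of how one hybrid action (a discrete gate choice, an initial parameter, and a refinement update) might be sampled and scored for a REINFORCE-style update. It is not the authors' implementation. The Gaussian heads with a fixed standard deviation, the factorized joint log-probability, and all function names (`sample_hybrid_action`, `reinforce_loss`, and so on) are assumptions made for illustration.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def gaussian_log_prob(x, mean, std):
    """Log-density of a univariate Gaussian N(mean, std^2) at x."""
    return (-0.5 * ((x - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2 * math.pi))


def sample_hybrid_action(gate_logits, init_mean, refine_mean, std=0.1, rng=random):
    """Sample (gate index, initial parameter, refinement update).

    Assumption: the three components are sampled independently given the
    policy outputs, so the joint log-probability is the sum of the parts.
    """
    probs = softmax(gate_logits)
    gate = rng.choices(range(len(probs)), weights=probs, k=1)[0]  # discrete part
    theta0 = rng.gauss(init_mean, std)    # continuous part: initial parameter
    delta = rng.gauss(refine_mean, std)   # continuous part: refinement update
    log_prob = (math.log(probs[gate])
                + gaussian_log_prob(theta0, init_mean, std)
                + gaussian_log_prob(delta, refine_mean, std))
    return (gate, theta0, delta), log_prob


def reinforce_loss(log_probs, returns):
    """REINFORCE surrogate loss: negative mean of return-weighted log-probs."""
    weighted = [lp * g for lp, g in zip(log_probs, returns)]
    return -sum(weighted) / len(weighted)
```

In this sketch, `gate_logits`, `init_mean`, and `refine_mean` stand in for the outputs of a policy network. Minimizing `reinforce_loss` over collected trajectories raises the probability of actions that led to higher returns, such as larger energy reductions.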

Paper Structure

This paper contains 39 sections, 31 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: Distribution of final energies under three initialization strategies for the same circuit. Near-zero: parameters are initialized close to zero with small Gaussian noise. Random: each parameterized gate is initialized by sampling its parameter uniformly from $[-\pi, \pi]$. Near-random: parameters are first randomly initialized as in the random setting and then perturbed with small Gaussian noise. Detailed configurations are provided in the appendix on the preliminary experiment.
  • Figure 2: Overall framework of the proposed HyRLQAS method. The Tensor-based Circuit Encoding module (top left) encodes the construction information of the previously built circuit. The Hybrid Policy Network (top right) generates the next action, including discrete gate selection, continuous rotation parameters, and refinement updates. The environment (bottom) executes the constructed quantum circuit and returns feedback, while the collected trajectories in the middle are stored and used in batch to update the policy network.
  • Figure 3: Energy distributions of COBYLA-optimized circuits under two initialization strategies using the trained HyRLQAS policy on LiH. Experiment B1 (“warm-up”) uses agent-learned initialization, while B2 (“zero-init”) uses all-zero parameters. Results are shown for the LiH-4 and LiH-6 systems, along with the corresponding KS test statistics.
  • Figure 4: Distribution of final energies under three initialization strategies for the same circuit. Near-zero: parameters are initialized close to zero with small Gaussian noise. Random: each parameterized gate is initialized by sampling its parameter uniformly from $[-\pi, \pi]$. Near-random: parameters are first randomly initialized as in the random setting and then perturbed with small Gaussian noise.
  • Figure 5: Illustration of the Tensor-based Binary Circuit Encoding for a 4-qubit quantum circuit. The purple and orange tensors store the position and type information of CNOT gates and rotation gates, respectively, where the Gate indicator axis spans four dimensions for CNOTs and three dimensions for rotation gates. The blue tensors further encode the parameters of the rotation gates. The Moments index corresponds to the layer index in the quantum circuit, allowing a structured representation of gate placement and parameterization across different circuit layers.
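
The three initialization strategies named in the captions of Figures 1 and 4 can be sketched as follows. This is an illustrative sketch only: the noise scale `sigma=0.01`, the function names, and the choice to perturb a single fixed random base point in the near-random case are assumptions, since the captions say only "small Gaussian noise".

```python
import math
import random


def near_zero_init(n_params, sigma=0.01, rng=random):
    """Near-zero: parameters close to zero with small Gaussian noise."""
    return [rng.gauss(0.0, sigma) for _ in range(n_params)]


def random_init(n_params, rng=random):
    """Random: each parameter sampled uniformly from [-pi, pi]."""
    return [rng.uniform(-math.pi, math.pi) for _ in range(n_params)]


def near_random_init(base, sigma=0.01, rng=random):
    """Near-random: perturb a random base point with small Gaussian noise.

    `base` is assumed to come from `random_init`; sigma is a hypothetical value.
    """
    return [theta + rng.gauss(0.0, sigma) for theta in base]


# Usage: draw one starting point per strategy for a circuit with 8 parameters.
rng = random.Random(42)
base = random_init(8, rng=rng)
starts = {
    "near-zero": near_zero_init(8, rng=rng),
    "random": random_init(8, rng=rng),
    "near-random": near_random_init(base, rng=rng),
}
```

Each starting point would then be handed to an external optimizer. Repeating the draw many times gives a distribution of final energies per strategy.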