Separating Ansatz Discovery from Deployment on Larger Problems: Reinforcement Learning for Modular Circuit Design

Gloria Turati, Simone Foderà, Riccardo Nembrini, Maurizio Ferrari Dacrema, Paolo Cremonesi

TL;DR

The RLVQC Block model is trained to discover a modular two-qubit block that can generalize QAOA-style methods and that is often beneficial compared to learning non-modular ansatzes, supporting the feasibility of reusing learned modular structure across problem sizes.

Abstract

As quantum computing continues to gain attention, there is growing interest in how classical machine learning can assist quantum workflows in practice. Automated circuit design, sometimes referred to as Quantum Architecture Search (QAS), is a natural application, but it relies on the ability to model the quantum system to support learning as the number of qubits grows. This challenge is central to QAS, and much of the current literature that proposes new ways to model the ansatz focuses on small systems, often around ten qubits. In this work, we propose a complementary approach that separates a small-scale structure discovery phase, in which a reusable modular circuit block is learned on small instances for which classical learning is feasible, from a deployment phase, in which the blocks are used to build the ansatz required for larger problems. To this end, we introduce Reinforcement Learning for Variational Quantum Circuits (RLVQC), formulating QAS as a sequential decision-making problem. We evaluate our methodology on Quadratic Unconstrained Binary Optimization (QUBO) instances derived from Maximum Cut, Maximum Clique, and Minimum Vertex Cover. Our RLVQC Block model is trained to discover a modular two-qubit block that can generalize QAOA-style methods and that is often beneficial compared to learning non-modular ansatzes. The blocks discovered on n=8 instances remain effective when deployed on larger instances (n=12 and n=16), supporting the feasibility of reusing learned modular structure across problem sizes. While we do not aim to establish a new state-of-the-art solver or an advantage over classical methods, our results provide evidence that modular ansatz structure can be learned on smaller instances and then extended to larger ones without requiring learning on systems with a large number of qubits, where quantum computing becomes interesting but classical computation becomes impractical.

Paper Structure

This paper contains 27 sections, 12 equations, 3 figures.

Figures (3)

  • Figure 1: Interaction between agent and environment in a reinforcement learning framework. At time step $t$ the agent observes the state $s_t$ of the environment, performs an action $a_t$, and receives a reward $r_t$. The environment then transitions to a new state $s_{t+1}$, which the agent observes in the next step. This iterative feedback loop is fundamental to the learning process.
  • Figure 2: State $s_t$ is processed by the agent's neural networks. The value network outputs an estimate $\hat{V}_\pi(s_t)$ of the value function (\ref{eq:value_function}), while the policy network outputs a probability distribution $\pi(a|s_t)$ over the actions. Action $a_t$ is sampled from this probability distribution.
  • Figure 3: When the environment receives action $a_t$, the corresponding gate is added to the circuit. Then, its parameters are optimized and the circuit is simulated to obtain the next state $s_{t+1}$, which is sent back to the agent with the corresponding reward $r_t$.
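
The loop described in Figures 1 to 3 can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the RLVQC implementation: the gate set `GATES`, the classes `ToyAgent` and `ToyCircuitEnv`, the function `run_episode`, the state encoding (the gate sequence itself), and the reward are all hypothetical placeholders. In particular, the step where the real environment would optimize the circuit parameters and simulate the circuit is replaced by a comment and an arbitrary reward.

```python
import math
import random

# Hypothetical action set; the gates actually used by RLVQC are not listed here.
GATES = ["rx", "ry", "rz", "cz"]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToyAgent:
    """Stand-in for the policy and value networks of Figure 2."""

    def __init__(self, n_actions, rng):
        self.logits = [0.0] * n_actions  # untrained, so the policy is uniform
        self.rng = rng

    def policy(self, state):
        # pi(a | s_t); this toy version ignores the state.
        return softmax(self.logits)

    def value(self, state):
        # Placeholder for the estimate of the value function at s_t.
        return 0.0

    def act(self, state):
        # Sample a_t from the distribution produced by the policy.
        probs = self.policy(state)
        return self.rng.choices(range(len(probs)), weights=probs, k=1)[0]


class ToyCircuitEnv:
    """Stand-in for the environment of Figure 3."""

    def __init__(self, max_gates):
        self.max_gates = max_gates
        self.circuit = []

    def reset(self):
        self.circuit = []
        return tuple(self.circuit)

    def step(self, action):
        # Add the gate selected by the agent to the circuit.
        self.circuit.append(GATES[action])
        # A real environment would optimize the parameters and simulate
        # the circuit here; the reward below is an arbitrary placeholder.
        reward = -float(len(self.circuit))
        done = len(self.circuit) >= self.max_gates
        return tuple(self.circuit), reward, done


def run_episode(agent, env):
    """Run the observe, act, reward loop of Figure 1 for one episode."""
    state = env.reset()
    trajectory = []
    done = False
    while not done:
        action = agent.act(state)
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state
    return trajectory
```

Calling `run_episode(ToyAgent(len(GATES), random.Random(0)), ToyCircuitEnv(max_gates=5))` returns a list of `(state, action, reward)` tuples, one per gate added. A training procedure would use such trajectories to update the policy and value networks; that update is deliberately left out of this sketch.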