Separating Ansatz Discovery from Deployment on Larger Problems: Reinforcement Learning for Modular Circuit Design

Gloria Turati; Simone Foderà; Riccardo Nembrini; Maurizio Ferrari Dacrema; Paolo Cremonesi

TL;DR

The RLVQC Block model is trained to discover a modular two-qubit block that can generalize QAOA-style methods and that is often beneficial compared to learning non-modular ansatzes, supporting the feasibility of reusing learned modular structure across problem sizes.

Abstract

As quantum computing continues to gain attention, there is growing interest in how classical machine learning can assist quantum workflows in practice. Automated circuit design, sometimes referred to as Quantum Architecture Search (QAS), is a natural application, but it depends on being able to model the quantum system so that learning remains feasible as the number of qubits grows. This challenge is central to QAS, and much of the current literature proposing new ways to model the ansatz focuses on small systems, often around ten qubits. In this work, we propose a complementary approach that separates a structure discovery phase, in which a reusable modular circuit block is learned on small instances where classical learning is feasible, from a deployment phase, in which the learned blocks are used to create the ansatz required for larger problems. To this end, we introduce Reinforcement Learning for Variational Quantum Circuits (RLVQC), formulating QAS as a sequential decision-making problem. We evaluate our methodology on Quadratic Unconstrained Binary Optimization (QUBO) instances derived from Maximum Cut, Maximum Clique, and Minimum Vertex Cover. Our RLVQC Block model is trained to discover a modular two-qubit block that can generalize QAOA-style methods and that is often beneficial compared to learning non-modular ansatzes. The blocks discovered on n=8 instances remain effective when deployed on larger instances (n=12 and n=16), supporting the feasibility of reusing learned modular structure across problem sizes. We do not aim to establish a new state-of-the-art solver or an advantage over classical methods. Rather, our results provide evidence that modular ansatz structure can be learned on smaller instances and then extended to larger ones, without requiring learning on systems with a large number of qubits, the regime where quantum computing becomes interesting but classical computation becomes impractical.
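To make the problem class concrete, the sketch below builds a QUBO matrix for a small Maximum Cut instance using the standard textbook formulation and solves it by brute force. It is an illustration only, not the authors' code: the function names `maxcut_qubo` and `brute_force_minimum`, the upper-triangular convention for Q, and the 4-node cycle graph are all assumptions chosen for this example.

```python
import itertools

import numpy as np


def maxcut_qubo(n, edges):
    """Return an upper-triangular Q with x^T Q x = -(cut size) for binary x.

    Hypothetical helper: each edge (i, j) contributes -x_i - x_j + 2*x_i*x_j,
    which equals -1 when the edge is cut and 0 otherwise.
    """
    Q = np.zeros((n, n))
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        Q[i, i] -= 1
        Q[j, j] -= 1
        Q[i, j] += 2
    return Q


def brute_force_minimum(Q):
    """Enumerate all binary vectors; only practical for very small n."""
    n = Q.shape[0]
    best_x, best_val = None, float("inf")
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        val = float(x @ Q @ x)
        if val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


# Assumed toy instance: a cycle on 4 nodes, whose maximum cut has size 4.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
Q = maxcut_qubo(4, edges)
x, val = brute_force_minimum(Q)
print(x, val)
```

The brute-force search scales as 2^n, which is why it serves here only as a reference for tiny instances; a variational ansatz would instead be optimized to minimize the same objective.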
