Hybrid Reward-Driven Reinforcement Learning for Efficient Quantum Circuit Synthesis

Sara Giordano; Kornikar Sen; Miguel A. Martin-Delgado

TL;DR

This work tackles the challenge of efficiently synthesizing quantum circuits that prepare a target state from a fixed initial state by employing a tabular Q-learning framework on a discretized SWEET state space. It introduces a circuit-aware, hybrid reward design that combines offline static rewards with online dynamic penalties to drive minimum-depth and low-gate-count circuits, demonstrated on graph-state benchmarks up to seven qubits. The results show depth-optimal graph-state circuits matching theoretical bounds, and high-fidelity approximations when using a universal gate set, all while leveraging sparse, database-like storage to manage the large state-action space. The approach offers a resource-efficient foundation for quantum circuit optimization in the NISQ and fault-tolerant eras and provides a path toward extending to unitary synthesis and deep RL with parameterized gates.

Abstract

A reinforcement learning (RL) framework is introduced for the efficient synthesis of quantum circuits that generate specified target quantum states from a fixed initial state, addressing a central challenge in both the Noisy Intermediate-Scale Quantum (NISQ) era and future fault-tolerant quantum computing. The approach uses tabular Q-learning, based on action sequences, within a discretized quantum state space to manage the exponential growth of the space dimension. The framework introduces a hybrid reward mechanism that combines a static, domain-informed reward, which guides the agent toward the target state, with customizable dynamic penalties that discourage inefficient circuit structures such as gate congestion and redundant state revisits. This reward is circuit-aware, in contrast to the current trend of works on this topic, which rely primarily on fidelity-based rewards. By leveraging sparse matrix representations and state-space discretization, the method enables practical navigation of high-dimensional environments while minimizing computational overhead. In benchmarks on graph-state preparation tasks with up to seven qubits, the algorithm consistently discovers minimal-depth circuits with optimized gate counts. Moreover, extending the framework to a universal gate set still yields low-depth circuits, highlighting the algorithm's robustness and adaptability. The results confirm that this RL-driven approach, with its fully circuit-aware reward, efficiently explores the complex quantum state space and synthesizes near-optimal quantum circuits, providing a resource-efficient foundation for quantum circuit optimization.
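To make the ingredients named in the abstract concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a two-qubit system, a three-gate action set (H on each qubit and CZ), a discretization that simply rounds amplitudes, and hypothetical reward values (`goal_bonus`, `gate_cost`, `revisit_penalty`). The Q-table is a plain dictionary that stores only visited state-action pairs, standing in for the sparse storage the abstract mentions; the reward adds a target-based term to per-episode penalties for extra gates and state revisits.

```python
import random

import numpy as np

# Toy gate set on two qubits (all matrices are real); qubit 0 is the left factor.
H = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2)
I2 = np.eye(2)
GATES = {
    "H0": np.kron(H, I2),
    "H1": np.kron(I2, H),
    "CZ": np.diag([1.0, 1.0, 1.0, -1.0]),
}
ACTIONS = list(GATES)


def state_key(psi, decimals=6):
    """Discretize a real state vector into a hashable key, removing the global sign."""
    lead = next(a for a in psi if abs(a) > 1e-9)
    rounded = np.round(psi * np.sign(lead), decimals)
    return tuple(float(x) for x in rounded)


PSI0 = np.array([1.0, 0.0, 0.0, 0.0])  # fixed initial state |00>
# Target: the two-qubit graph state CZ (H x H)|00>.
TARGET = state_key(GATES["CZ"] @ GATES["H1"] @ GATES["H0"] @ PSI0)


def hybrid_reward(s_next, visited, goal_bonus=10.0, gate_cost=1.0, revisit_penalty=2.0):
    """Assumed reward shape: target bonus plus penalties computed during the episode."""
    reward = -gate_cost                  # every gate costs something
    if s_next in visited:
        reward -= revisit_penalty        # discourage returning to a visited state
    if s_next == TARGET:
        reward += goal_bonus             # target-based term
    return reward


def train(episodes=5000, max_steps=8, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}  # sparse Q-table: only visited (state, action) pairs are stored
    for _ in range(episodes):
        psi, s = PSI0, state_key(PSI0)
        visited = {s}
        for _ in range(max_steps):
            if rng.random() < epsilon:   # epsilon-greedy exploration
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
            psi_next = GATES[action] @ psi
            s_next = state_key(psi_next)
            r = hybrid_reward(s_next, visited)
            done = s_next == TARGET
            best_next = 0.0 if done else max(q.get((s_next, a), 0.0) for a in ACTIONS)
            old = q.get((s, action), 0.0)
            # Standard tabular Q-learning update.
            q[(s, action)] = old + alpha * (r + gamma * best_next - old)
            visited.add(s_next)
            psi, s = psi_next, s_next
            if done:
                break
    return q


def greedy_circuit(q, max_steps=8):
    """Follow the learned greedy policy from |00> and return (gate list, reached target)."""
    psi, s = PSI0, state_key(PSI0)
    circuit = []
    for _ in range(max_steps):
        if s == TARGET:
            break
        action = max(ACTIONS, key=lambda a: q.get((s, a), 0.0))
        circuit.append(action)
        psi = GATES[action] @ psi
        s = state_key(psi)
    return circuit, s == TARGET
```

Calling `greedy_circuit(train())` returns the gate sequence chosen by the learned policy. In this toy setting the shortest preparation of the target uses three gates (a Hadamard on each qubit followed by CZ), which gives a simple reference for checking what the agent finds. The revisit penalty depends on the episode history, so the reward is not strictly Markovian in the state alone; the sketch accepts that simplification.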
