Practical and efficient quantum circuit synthesis and transpiling with Reinforcement Learning

David Kremer; Victor Villar; Hanhee Paik; Ivan Duran; Ismael Faro; Juan Cruz-Benito

TL;DR

This work introduces a reinforcement learning (RL) framework to improve quantum circuit synthesis and routing within transpilation workflows. By formulating synthesis as a sequential decision process and routing as a learned heuristic problem, the approach achieves near-optimal results for Linear Function, Clifford, and Permutation circuits up to 9, 11, and 65 qubits, respectively, while significantly reducing two-qubit gate depth and count during routing for circuits up to 133 qubits. Training uses curriculum learning with PPO, and inference supports greedy, sampling, and top-k/top-p strategies, enabling practical deployment without labeled datasets. Across benchmarks, RL is faster than traditional optimization (SAT solvers) and matches or exceeds heuristic baselines (SABRE/TokenSwapper) in circuit quality, indicating strong potential for integration into AI-powered transpiler services such as the Qiskit Transpiler. The results establish a foundation for scalable, device-aware AI-assisted quantum compilation and point to future directions such as generic topology models and dynamic circuit synthesis.
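The inference strategies named above (greedy, sampling, top-k/top-p) can be illustrated with a minimal sketch. It assumes the policy outputs a list of action probabilities; the function names `top_k_top_p_filter` and `select_action` are hypothetical and not taken from the paper or from Qiskit.

```python
import random


def top_k_top_p_filter(probs, k=None, p=None):
    """Keep the k most likely actions, then the smallest prefix of those
    whose cumulative probability reaches p, and renormalize."""
    # Action indices sorted from most to least likely.
    order = sorted(range(len(probs)), key=lambda a: probs[a], reverse=True)
    if k is not None:
        order = order[:k]
    if p is not None:
        kept, mass = [], 0.0
        for a in order:
            kept.append(a)
            mass += probs[a]
            if mass >= p:
                break
        order = kept
    total = sum(probs[a] for a in order)
    return {a: probs[a] / total for a in order}


def select_action(probs, strategy="greedy", k=None, p=None, rng=random):
    """Pick an action index from policy probabilities (hypothetical helper)."""
    if strategy == "greedy":
        # Always take the most likely action.
        return max(range(len(probs)), key=lambda a: probs[a])
    if strategy == "sample":
        # Sample from the (optionally truncated) distribution.
        dist = top_k_top_p_filter(probs, k=k, p=p)
        actions = list(dist)
        weights = [dist[a] for a in actions]
        return rng.choices(actions, weights=weights, k=1)[0]
    raise ValueError(f"unknown strategy: {strategy}")
```

Greedy selection gives one deterministic rollout, while sampling allows several independent runs from which the best circuit can be kept, which is presumably how the "1 and 100 runs" results in the figures are obtained.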

Abstract

This paper demonstrates the integration of Reinforcement Learning (RL) into quantum transpiling workflows, significantly enhancing the synthesis and routing of quantum circuits. By employing RL, we achieve near-optimal synthesis of Linear Function, Clifford, and Permutation circuits, up to 9, 11 and 65 qubits respectively, while being compatible with native device instruction sets and connectivity constraints, and orders of magnitude faster than optimization methods such as SAT solvers. We also achieve significant reductions in two-qubit gate depth and count for circuit routing up to 133 qubits with respect to other routing heuristics such as SABRE. We find the method to be efficient enough to be useful in practice in typical quantum transpiling pipelines. Our results set the stage for further AI-powered enhancements of quantum computing workflows.
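To make the "synthesis as a sequential decision process" idea concrete, here is a toy sketch for Linear Function circuits. It assumes a framing in which the state is the target matrix over GF(2), each action is a CNOT allowed by the coupling map, and an episode ends when the identity is reached; the 3-qubit line connectivity and all names are illustrative assumptions. A brute-force breadth-first search stands in for the trained policy, so this is not the paper's RL method.

```python
from collections import deque

# Assumed coupling map: CNOT (control, target) pairs on a 3-qubit line 0-1-2.
LINE_EDGES = [(0, 1), (1, 0), (1, 2), (2, 1)]


def identity(n):
    """Row i is a bitmask of the inputs XORed into output i."""
    return tuple(1 << i for i in range(n))


def apply_cnot(rows, control, target):
    """One environment step: CNOT adds row `control` to row `target` (mod 2)."""
    new = list(rows)
    new[target] ^= new[control]
    return tuple(new)


def run(rows, gates):
    """Apply a sequence of CNOTs to a state."""
    for control, target in gates:
        rows = apply_cnot(rows, control, target)
    return rows


def reduce_to_identity(target, actions=LINE_EDGES):
    """Stand-in for a learned policy: BFS for the shortest action sequence
    that turns `target` into the identity. Only feasible for tiny sizes."""
    goal = identity(len(target))
    parent = {target: None}
    queue = deque([target])
    while queue:
        state = queue.popleft()
        if state == goal:
            steps = []
            while parent[state] is not None:
                state, action = parent[state]
                steps.append(action)
            return steps[::-1]  # actions in the order they were applied
        for action in actions:
            nxt = apply_cnot(state, *action)
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


def circuit_from_reduction(reduction):
    """CNOTs are self-inverse, so reversing the reduction yields a circuit
    that implements the target."""
    return reduction[::-1]
```

For example, `reduce_to_identity(run(identity(3), [(0, 1), (1, 2)]))` returns a sequence of CNOTs on the line whose reversal rebuilds that target. In an RL setting, the search would be replaced by a policy network choosing each action from the current state.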

Paper Structure (13 sections, 11 figures, 3 tables)

This paper contains 13 sections, 11 figures, 3 tables.

Introduction
Results
Circuit Synthesis with Reinforcement Learning
Training
Inference
Clifford Circuit Synthesis with Reinforcement Learning
Circuit Synthesis Benchmarks
Circuit Routing with Reinforcement Learning
Fixed size RL routing
Generic RL routing
Discussion
Synthesis benchmarks
Reference of coupling maps used in the paper

Figures (11)

Figure 1: Diagram describing the RL-based circuit synthesis process.
Figure 2: Training progress for Clifford synthesis on 7 qubits with "H" connectivity. The horizontal axis shows the progress of the training in terms of total number of steps taken (number of Cliffords "seen" by the model). The vertical axes on the different graphs represent how different quantities evolve through the training.
Figure 3: Number of SWAP layers obtained from synthesizing random permutations for 4 different topologies on 8, 12, 27 and 65 qubits, with a heuristic algorithm (TokenSwapper with 100 trials), a SAT solver, and our RL algorithm with 1 and 100 runs.
Figure 4: Number of CNOT layers obtained from synthesizing 100 random Cliffords for the 6-Y topology, with a heuristic algorithm (Greedy with further mapping using SABRE), a SAT solver, and our RL algorithm with 1, 10 and 100 runs, against the time taken by each algorithm for each of the Cliffords.
Figure 5: CNOT count (left) and depth (right) for 8-10 qubit quantum volume circuits routed to linear connectivity with different algorithms, and routed to a 12-qubit ring with RL routing.
...and 6 more figures
