Pauli Network Circuit Synthesis with Reinforcement Learning
Ayushi Dubal, David Kremer, Simon Martiel, Victor Villar, Derek Wang, Juan Cruz-Benito
TL;DR
The paper addresses efficient transpilation of quantum circuits by introducing a reinforcement-learning-based method for re-synthesizing Pauli Networks, that is, circuit blocks of Clifford gates and single-qubit rotations, on up to six qubits. It adapts a PPO-based RL framework to learn a gate-placing heuristic, represents the synthesis state as a Clifford tableau combined with a commutation DAG, and embeds the learned heuristic in a collect-and-re-synthesize pipeline that optimizes large circuits under hardware coupling constraints. Empirically, on 6-qubit random Pauli Networks the approach achieves an over 2x reduction in two-qubit gate count relative to state-of-the-art heuristic methods, and when applied as a post-transpilation pass on the Benchpress benchmark it reduces two-qubit gate count and depth by 20% on average, with reductions of up to 60% on many instances. These results indicate that RL-driven local circuit re-synthesis is practical and scalable for realistic, large-scale quantum transpilation workloads.
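To make the commutation-DAG part of the state representation concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's encoding: Pauli rotations are written as strings over `I`, `X`, `Y`, `Z` with position 0 standing for qubit 0, signs and angles are ignored, and the helper names `commutes`, `commutation_dag`, and `front_layer` are hypothetical. The sketch adds an edge from an earlier rotation to a later one whenever the two anticommute, so rotations without incoming edges are the ones that could be synthesized next.

```python
from itertools import combinations


def commutes(p: str, q: str) -> bool:
    """Two Pauli strings commute iff they differ on an even number of
    positions where both are non-identity."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def commutation_dag(paulis):
    """Return edges (i, j) with i < j for every anticommuting pair.

    Assumption: this keeps all such edges (no transitive reduction).
    """
    return {
        (i, j)
        for i, j in combinations(range(len(paulis)), 2)
        if not commutes(paulis[i], paulis[j])
    }


def front_layer(paulis, edges):
    """Indices of rotations with no predecessor in the DAG."""
    blocked = {j for _, j in edges}
    return [i for i in range(len(paulis)) if i not in blocked]


rotations = ["XXI", "ZII", "IZZ", "IIX"]
dag = commutation_dag(rotations)
ready = front_layer(rotations, dag)
```

In this example `XXI` anticommutes with both `ZII` and `IZZ`, and `IZZ` anticommutes with `IIX`, so only the first rotation is initially free to be placed.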
Abstract
We introduce a Reinforcement Learning (RL)-based method for re-synthesis of quantum circuits containing arbitrary Pauli rotations alongside Clifford operations. By collapsing each sub-block to a compact representation and then synthesizing it step-by-step through a learned heuristic, we obtain circuits that are both shorter and compliant with hardware connectivity constraints. We find that the method is fast and effective enough to serve as a practical optimization procedure: in direct comparisons on 6-qubit random Pauli Networks against state-of-the-art heuristic methods, our RL approach yields an over 2x reduction in two-qubit gate count while executing in under 10 milliseconds per circuit. We further integrate the method into a collect-and-re-synthesize pipeline, applied as a Qiskit transpiler pass, where we observe average improvements of 20% in two-qubit gate count and depth across the Benchpress benchmark, with improvements of up to 60% on many instances. These results highlight the potential of RL-driven synthesis to significantly improve circuit quality in realistic, large-scale quantum transpilation workloads.
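The step-by-step synthesis described above can be illustrated with a small sketch of how one CNOT changes a Pauli rotation's axis. It uses the standard symplectic update for conjugation by a CNOT (the target's X bit absorbs the control's X bit, the control's Z bit absorbs the target's Z bit) and ignores signs. The greedy chooser `greedy_cnot` is a hypothetical stand-in for the learned policy, not the paper's RL agent, and the string convention (position 0 is qubit 0) and the coupling list are assumptions made for this example.

```python
# Symplectic (x, z) bits for each single-qubit Pauli; signs are ignored.
TO_BITS = {"I": (0, 0), "X": (1, 0), "Y": (1, 1), "Z": (0, 1)}
TO_CHAR = {bits: ch for ch, bits in TO_BITS.items()}


def conjugate_cnot(pauli: str, control: int, target: int) -> str:
    """Conjugate a Pauli string by CNOT(control, target), up to sign."""
    x = [TO_BITS[ch][0] for ch in pauli]
    z = [TO_BITS[ch][1] for ch in pauli]
    x[target] ^= x[control]   # X on the control spreads to the target
    z[control] ^= z[target]   # Z on the target spreads to the control
    return "".join(TO_CHAR[(xi, zi)] for xi, zi in zip(x, z))


def weight(pauli: str) -> int:
    """Number of qubits on which the Pauli acts non-trivially."""
    return sum(ch != "I" for ch in pauli)


def greedy_cnot(pauli: str, coupling):
    """Pick the CNOT on an allowed edge that lowers the weight the most.

    Returns (control, target), or None if no single CNOT lowers the weight.
    """
    best, best_w = None, weight(pauli)
    for a, b in coupling:
        for c, t in ((a, b), (b, a)):
            w = weight(conjugate_cnot(pauli, c, t))
            if w < best_w:
                best, best_w = (c, t), w
    return best


line = [(0, 1), (1, 2)]              # assumed 3-qubit linear coupling map
step = greedy_cnot("ZZI", line)      # neighbours: one CNOT suffices
stuck = greedy_cnot("ZIZ", line)     # non-neighbours: no single CNOT helps
```

Once a rotation's axis is reduced to weight one, it can be applied as a single-qubit rotation. The `ZIZ` case shows why a purely greedy rule is limited under connectivity constraints: on a line, the weight has to increase before it can decrease, which is the kind of multi-step decision a learned heuristic is meant to handle.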
