Beyond Reinforcement Learning: Fast and Scalable Quantum Circuit Synthesis
Lukas Theißinger, Thore Gerlach, David Berghaus, Christian Bauckhage
TL;DR
Quantum unitary synthesis is hampered by a combinatorial search space, and traditional methods additionally suffer from misaligned optimization objectives. The authors propose an RL-free approach in which a supervised predictor estimates the minimum description length (MDL) of the residual unitary, used as a proxy for the remaining gate cost, and guides stochastic beam search over Clifford+T circuits, with zero-shot generalization across qubit counts. A lightweight MLP predictor trained on synthetic data serves as a scalable value function, delivering faster wall-clock synthesis and higher success rates than RL baselines, and performing robustly on standardized QAS-Bench tasks up to $n=5$ qubits. The results indicate that fast, learned heuristics combined with efficient search can improve scalable quantum circuit synthesis without per-qubit retraining or environment interaction.
Abstract
Quantum unitary synthesis addresses the problem of translating abstract quantum algorithms into sequences of hardware-executable quantum gates. Solving this task exactly is infeasible in general due to the exponential growth of the underlying combinatorial search space. Existing approaches suffer from misaligned optimization objectives, substantial training costs, and limited generalization across different qubit counts. We mitigate these limitations by using supervised learning to approximate the minimum description length of residual unitaries and combining this estimate with stochastic beam search to identify near-optimal gate sequences. Our method relies on a lightweight model with zero-shot generalization, substantially reducing training overhead compared to prior baselines. Across multiple benchmarks, we achieve faster wall-clock synthesis times while exceeding state-of-the-art methods in terms of success rate for complex circuits.
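
To make the search idea in the abstract concrete, the following is a minimal single-qubit sketch of cost-guided stochastic beam search over residual unitaries. It is not the authors' implementation. Several pieces are assumptions made for illustration: `placeholder_cost` is a hand-written, phase-invariant distance to the identity that stands in for the learned MDL predictor; the softmax sampling rule, the `temperature` parameter, and the names `stochastic_beam_search` and `circuit_matrix` are hypothetical; and only H, S, and T are used, so multi-qubit gates are omitted.

```python
import numpy as np

# Single-qubit generators of the Clifford+T set; CNOT and multi-qubit
# embedding are left out to keep the sketch short.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
GATES = {"H": H, "S": S, "T": T}


def placeholder_cost(residual):
    """ASSUMPTION: stand-in for the learned predictor, not the paper's model.

    Returns 0 when the residual equals the identity up to a global phase.
    """
    return 1.0 - abs(np.trace(residual)) / residual.shape[0]


def circuit_matrix(gates, dim=2):
    """Multiply the named gates in list order."""
    out = np.eye(dim, dtype=complex)
    for name in gates:
        out = out @ GATES[name]
    return out


def stochastic_beam_search(target, beam_width=16, max_depth=8,
                           temperature=0.05, tol=1e-9, seed=0):
    """Return gate names whose product matches target up to phase, or None."""
    rng = np.random.default_rng(seed)
    beam = [([], np.asarray(target, dtype=complex))]
    for depth in range(max_depth + 1):
        # Stop as soon as some residual is the identity up to phase.
        for gates, residual in beam:
            if placeholder_cost(residual) < tol:
                return gates
        if depth == max_depth:
            break
        # Expand: choosing gate G turns residual R into G^dagger R.
        candidates = [(gates + [name], g.conj().T @ residual)
                      for gates, residual in beam
                      for name, g in GATES.items()]
        costs = np.array([placeholder_cost(r) for _, r in candidates])
        # Softmax over negative cost: cheaper residuals are kept more often.
        logits = -costs / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        k = min(beam_width, len(candidates))
        keep = rng.choice(len(candidates), size=k, replace=False, p=probs)
        beam = [candidates[i] for i in keep]
    return None
```

Calling `stochastic_beam_search(T @ H)` explores all nine two-gate sequences, because the default beam is wide enough to keep every candidate at that depth, and `circuit_matrix` can be used to check the returned sequence against the target. For deeper targets the beam prunes candidates at random, so the search may return `None`; a predictor that tracks the true remaining gate count more closely than this placeholder is what would make a narrow beam effective.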
