Beyond Reinforcement Learning: Fast and Scalable Quantum Circuit Synthesis

Lukas Theißinger; Thore Gerlach; David Berghaus; Christian Bauckhage

TL;DR

Quantum unitary synthesis is hampered by combinatorial search and, in traditional methods, by misaligned optimization objectives. The authors propose an RL-free approach that uses a supervised minimum description length (MDL) predictor to estimate the remaining gate cost and guide stochastic beam search over Clifford+T circuits, achieving zero-shot generalization across qubit counts. A lightweight MLP predictor trained on synthetic data serves as a scalable value function, delivering faster wall-clock synthesis and higher success rates than RL baselines, and performing robustly on standardized QAS-Bench tasks up to $n=5$ qubits. The results indicate that fast learned heuristics combined with efficient search can improve scalable quantum circuit synthesis without per-qubit retraining or environment interaction.

Abstract

Quantum unitary synthesis addresses the problem of translating abstract quantum algorithms into sequences of hardware-executable quantum gates. Solving this task exactly is infeasible in general due to the exponential growth of the underlying combinatorial search space. Existing approaches suffer from misaligned optimization objectives, substantial training costs, and limited generalization across different qubit counts. We mitigate these limitations by using supervised learning to approximate the minimum description length of residual unitaries and combining this estimate with stochastic beam search to identify near-optimal gate sequences. Our method relies on a lightweight model with zero-shot generalization, substantially reducing training overhead compared to prior baselines. Across multiple benchmarks, we achieve faster wall-clock synthesis times while exceeding state-of-the-art methods in terms of success rate for complex circuits.
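The search loop described in the abstract can be illustrated with a minimal single-qubit sketch. Everything below is an assumption made for illustration and is not taken from the paper: the gate set {H, T, S}, the helper names, the Gumbel-perturbed top-$B$ selection used to make the beam stochastic, and above all `stand_in_cost`, which simply uses infidelity where the authors use a trained MDL predictor.

```python
import cmath
import math
import random

# Toy single-qubit gate set (assumption; the paper targets multi-qubit Clifford+T).
R2 = 1 / math.sqrt(2)
GATES = {
    "H": ((R2, R2), (R2, -R2)),
    "T": ((1, 0), (0, cmath.exp(1j * math.pi / 4))),
    "S": ((1, 0), (0, 1j)),
}
IDENTITY = ((1, 0), (0, 1))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def unitary(circuit):
    """Unitary of a gate sequence; the first gate in the tuple is applied first."""
    u = IDENTITY
    for name in circuit:
        u = matmul(GATES[name], u)
    return u


def avg_fidelity(u, v):
    """Standard average gate fidelity between unitaries of dimension d."""
    d = len(u)
    trace = sum(complex(u[k][i]).conjugate() * v[k][i]
                for i in range(d) for k in range(d))
    return (abs(trace) ** 2 + d) / (d * (d + 1))


def stand_in_cost(u, target):
    """Hypothetical placeholder for the learned MDL predictor."""
    return 1.0 - avg_fidelity(u, target)


def beam_search(target, predict_cost, beam_width=10, max_depth=6,
                threshold=0.99, temperature=0.1, rng=None):
    """Stochastic beam search: expand, score, then sample beam_width survivors."""
    rng = rng or random.Random(0)
    beam = [()]
    for _ in range(max_depth):
        candidates = []
        for circuit in beam:
            for name in GATES:
                child = circuit + (name,)
                u = unitary(child)
                if avg_fidelity(u, target) >= threshold:
                    return child
                candidates.append((child, predict_cost(u, target)))
        # Gumbel-perturbed scores: lower predicted cost is favoured, not guaranteed.
        keyed = []
        for child, cost in candidates:
            gumbel = -math.log(-math.log(max(rng.random(), 1e-12)))
            keyed.append((-cost / temperature + gumbel, child))
        keyed.sort(key=lambda item: item[0], reverse=True)
        beam = [child for _, child in keyed[:beam_width]]
    return None


target = unitary(("H", "T", "H"))
found = beam_search(target, stand_in_cost)
print(found, avg_fidelity(unitary(found), target))
```

Swapping `stand_in_cost` for a learned estimate of the remaining gate count changes only the scoring call, which is the separation between predictor and search that the abstract relies on.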

Paper Structure (21 sections, 16 equations, 5 figures, 2 tables, 1 algorithm)

Introduction
Related Work
Background
Methodology
Predicting the Minimum Description Length
Inference with Stochastic Beam Search
Experiments
Evaluation on Synthetic Data
Standardized Evaluation on QAS-Bench
Limitations
Conclusion
A Transformer-based Alternative for the MDL-Predictor
Input Tokenization and Embedding
Axial Attention Encoder
Pooling and Prediction Head
...and 6 more sections

Figures (5)

Figure 1: Fast and Scalable Quantum Circuit Synthesis. Left: Illustration of our synthesis search. We use an MDL predictor with beam search at inference time: candidate circuits are expanded by appending gates (+$\mathbf{\text{T}}$, +$\mathbf{\text{CX}}$). Each node denotes a partial gate sequence ($\mathbf{\text{T}}$, $\mathbf{\text{CX}}$, $\mathbf{\text{T}}\mathbf{\text{CX}}$, $\mathbf{\text{CX}}\mathbf{\text{T}}$, $\mathbf{\text{T}}\mathbf{\text{T}}$, $\mathbf{\text{CX}}\mathbf{\text{CX}}$) and the numbers show the predicted remaining MDL. Green nodes are kept, while red nodes are pruned from the search. Right: Comparison of several distance measures: Hilbert-Schmidt (HS), worst-case, average fidelity and MDL evaluated on representative two-qubit circuits, illustrating how the choice of metric changes the landscape as circuits approach the target.
Figure 2: Success counts (out of 100 targets per $\mathbf{\text{T}}$-count, higher is better) for MDL-guided beam search versus the RL baseline of [rietsch2024unitary] and the annealing algorithm of [paradis2024synthetiq] on 4- and 5-qubit instances. Our method uses beam width $B{=}10$ and $8000$ trials per instance (avg. $\sim$22 s runtime) and declares success when $F_{\mathrm{avg}}(\mathbf{U}(C),\mathbf{U}^\star)\ge 0.9$ (\ref{eq:AvgFidelity}). Results for RL at high $\mathbf{\text{T}}$-counts are unavailable because they are not reported in [rietsch2024unitary], likely due to the associated computational cost.
Figure 3: QAS-Bench [lu2023qasbench] QC Regeneration results as heatmaps for six methods. Columns correspond to layer difficulty (1--6) and rows to qubit count (2--5). Each cell shows the number of successful syntheses out of 15 targets (5 RC-S + 10 RC-C) for that $(n,\text{layer})$ bucket. All methods are re-run on the same targets under a fixed per-instance wall-clock budget (22 s for ours, 30 s for brute force and Synthetiq, 60 s for all others); darker cells indicate higher success. Our method uses a single $n{=}5$ MDL predictor with padding for $n<5$, beam width $B{=}10$ and the success criterion $F_{\mathrm{avg}}(\mathbf{U}(C),\mathbf{U}^\star)\ge 0.99$.
Figure 4: Canonical structured circuits used in our evaluation: GHZ states, cluster states, phase-gadget constructions and the $[[5,1,3]]$ (perfect) quantum error-correcting code.
Figure 5: Overall success rate as a function of the beam-search trial budget.
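
The success criteria in Figures 2 and 3 are thresholds on $F_{\mathrm{avg}}$. The paper's own definition (its average-fidelity equation) is not reproduced on this page, so the block below states the standard average gate fidelity between two unitaries of dimension $d = 2^n$ as an assumption about what that equation contains, followed by two limiting cases for $n = 5$.

```latex
% Standard average gate fidelity (assumed to match the paper's definition).
F_{\mathrm{avg}}(\mathbf{U}, \mathbf{U}^\star)
  = \frac{\bigl|\operatorname{Tr}\bigl(\mathbf{U}^{\star\dagger}\mathbf{U}\bigr)\bigr|^{2} + d}{d\,(d+1)},
  \qquad d = 2^{n}.

% Limiting cases for n = 5 (d = 32):
%   exact match up to global phase: |Tr| = d
\frac{32^{2} + 32}{32 \cdot 33} = 1,
%   trace-orthogonal unitaries: Tr = 0
\qquad
\frac{0 + 32}{32 \cdot 33} = \frac{1}{33} \approx 0.030.
```

Under this definition the measure is invariant to global phase, and the thresholds $0.9$ and $0.99$ used in the captions both sit far above the value $1/33$ obtained for trace-orthogonal five-qubit unitaries.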
