Quantum Logic Gate Synthesis as a Markov Decision Process
M. Sohaib Alam, Noah F. Berthusen, Peter P. Orth
TL;DR
This work shows that two key quantum programming tasks, single-qubit state preparation and gate compilation, can be modeled as discrete Markov Decision Processes whose optimal policies are computed exactly by policy iteration, yielding the shortest gate sequences that reach a target accuracy. By discretizing the Bloch sphere (and the Bloch ball in the noisy case) and using finite gate sets, the authors obtain explicit optimal policies and compare them against known shortest circuits, which validates the approach. The study also demonstrates noise-aware adaptation: under amplitude-damping and dephasing channels, the optimal sequences change in order to achieve higher fidelities. The results give theoretical and practical insight into why reinforcement learning can find compact gate sequences, and they open avenues for extending the approach to multi-qubit settings and error-mitigated quantum programming.
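The discretization idea can be illustrated with a minimal sketch in plain Python (standard library only). It is not the authors' implementation. The uniform (θ, φ) grid, the sub-grid sampling used to estimate transition probabilities, the reward of −1 per gate, the discount factor, and all names (`build_transitions`, `policy_iteration`, `rollout`) are hypothetical choices made for illustration. Each cell of the Bloch sphere becomes an MDP state. Applying a gate to sample points inside one cell scatters them over several cells, which is one way a continuous, deterministic evolution becomes a stochastic, Markovian one. Policy iteration then picks, for every cell, the gate that maximizes the discounted return, which favors short expected paths into the cell containing the target state.

```python
import cmath
import math

# Hypothetical discretization and MDP parameters (illustrative, not from the paper).
N_THETA, N_PHI = 8, 16  # grid cells in polar angle theta and azimuth phi
SUB = 5                 # sample points per cell along each axis
GAMMA = 0.95            # discount factor
S2 = 1 / math.sqrt(2)
GATES = {"H": ((S2, S2), (S2, -S2)), "T": ((1, 0), (0, cmath.exp(1j * math.pi / 4)))}

def apply(U, psi):
    a, b = psi
    return (U[0][0] * a + U[0][1] * b, U[1][0] * a + U[1][1] * b)

def state_from_angles(theta, phi):
    return (math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2))

def angles_from_state(psi):
    a, b = psi
    theta = 2 * math.acos(min(1.0, abs(a)))  # global phase drops out of |a|
    phi = (cmath.phase(b) - cmath.phase(a)) % (2 * math.pi)
    return theta, phi

def cell_of(theta, phi):
    i = min(int(theta / math.pi * N_THETA), N_THETA - 1)
    j = min(int(phi / (2 * math.pi) * N_PHI), N_PHI - 1)
    return i * N_PHI + j

def build_transitions():
    """P[s][g][s2]: fraction of sample points in cell s that gate g maps into cell s2."""
    P = []
    for s in range(N_THETA * N_PHI):
        i, j = divmod(s, N_PHI)
        P.append({})
        for g, U in GATES.items():
            counts = {}
            for u in range(SUB):
                for v in range(SUB):
                    theta = (i + (u + 0.5) / SUB) * math.pi / N_THETA
                    phi = (j + (v + 0.5) / SUB) * 2 * math.pi / N_PHI
                    s2 = cell_of(*angles_from_state(apply(U, state_from_angles(theta, phi))))
                    counts[s2] = counts.get(s2, 0) + 1
            P[s][g] = {s2: c / SUB**2 for s2, c in counts.items()}
    return P

def q_value(P, V, s, g):
    # Reward -1 per gate: maximizing the discounted return favors short paths to the goal.
    return -1.0 + GAMMA * sum(p * V[s2] for s2, p in P[s][g].items())

def policy_iteration(P, goal, tol=1e-9, margin=1e-6):
    n = len(P)
    policy, V = ["H"] * n, [0.0] * n
    while True:
        # Policy evaluation by in-place sweeps; the goal cell is absorbing with value 0.
        delta = tol
        while delta >= tol:
            delta = 0.0
            for s in range(n):
                if s != goal:
                    v = q_value(P, V, s, policy[s])
                    delta = max(delta, abs(v - V[s]))
                    V[s] = v
        # Policy improvement: switch only on a clear gain, so small evaluation
        # errors do not make the policy flip back and forth between tied gates.
        stable = True
        for s in range(n):
            if s == goal:
                continue
            best = max(GATES, key=lambda g: q_value(P, V, s, g))
            if q_value(P, V, s, best) > q_value(P, V, s, policy[s]) + margin:
                policy[s], stable = best, False
        if stable:
            return policy, V

def rollout(policy, goal, psi, max_steps=30):
    """Apply the cell policy to the exact state; returns gates in application order."""
    seq = []
    for _ in range(max_steps):
        s = cell_of(*angles_from_state(psi))
        if s == goal:
            break
        seq.append(policy[s])
        psi = apply(GATES[policy[s]], psi)
    return seq

P = build_transitions()
target = (1.0, 0.0)
for _ in range(3):  # (HT)^3 |0>: T acts first, then H, repeated three times
    target = apply(GATES["H"], apply(GATES["T"], target))
goal = cell_of(*angles_from_state(target))
policy, V = policy_iteration(P, goal)
print("gates applied to |0>:", rollout(policy, goal, (1.0, 0.0)))
```

The list printed by `rollout` is in application order, the reverse of the operator-product notation used for $(HT)^n$. Because the policy is defined on cells rather than exact states, the rollout is capped at `max_steps` and may stop before entering the goal cell when the grid is coarse; refining `N_THETA`, `N_PHI` and `SUB` trades runtime for a more faithful MDP.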
Abstract
Reinforcement learning has recently been applied to a variety of tasks in quantum programming. The underlying assumption is that those tasks can be modeled as Markov Decision Processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two fundamental tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case (both with and without noise), we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state, or compile a gate, up to some target accuracy. As an example, we find sequences of $H$ and $T$ gates with length as small as $11$ producing $\sim 99\%$ fidelity for states of the form $(HT)^{n} |0\rangle$, with $n$ as large as $10^{10}$. In the presence of gate noise, we demonstrate how the optimal policy adapts to the effects of noisy gates in order to achieve a higher state fidelity. Our work shows that one can meaningfully impose a discrete, stochastic and Markovian nature on a continuous, deterministic and non-Markovian quantum evolution, and it provides theoretical insight into why reinforcement learning may be used successfully to find optimally short gate sequences in quantum programming.
