Quantum Logic Gate Synthesis as a Markov Decision Process

M. Sohaib Alam, Noah F. Berthusen, Peter P. Orth

TL;DR

This work shows that key quantum programming tasks—single-qubit state preparation and gate compilation—can be exactly modeled as discrete Markov Decision Processes solved by policy iteration, yielding optimally short gate sequences. By discretizing the Bloch sphere (and Bloch ball for noisy cases) and using finite gate sets, the authors obtain explicit optimal policies and compare them against known shortest circuits, validating the approach. The study also demonstrates noise-aware adaptation, where optimal sequences differ under amplitude-damping and dephasing channels to achieve higher fidelities. The results provide theoretical and practical insight into why reinforcement learning can successfully find compact gate sequences and open avenues for extending the approach to multi-qubit scenarios and error-mitigated quantum programming.

Abstract

Reinforcement learning has witnessed recent applications to a variety of tasks in quantum programming. The underlying assumption is that those tasks could be modeled as Markov Decision Processes (MDPs). Here, we investigate the feasibility of this assumption by exploring its consequences for two fundamental tasks in quantum programming: state preparation and gate compilation. By forming discrete MDPs, focusing exclusively on the single-qubit case (both with and without noise), we solve for the optimal policy exactly through policy iteration. We find optimal paths that correspond to the shortest possible sequence of gates to prepare a state, or compile a gate, up to some target accuracy. As an example, we find sequences of $H$ and $T$ gates with length as small as $11$ producing $\sim 99\%$ fidelity for states of the form $(HT)^{n} |0\rangle$ with values as large as $n=10^{10}$. In the presence of gate noise, we demonstrate how the optimal policy adapts to the effects of noisy gates in order to achieve a higher state fidelity. Our work shows that one can meaningfully impose a discrete, stochastic and Markovian nature to a continuous, deterministic and non-Markovian quantum evolution, and provides theoretical insight into why reinforcement learning may be successfully used to find optimally short gate sequences in quantum programming.
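To make the policy-iteration idea concrete, the following is a minimal sketch on a toy problem, not the paper's discretized Bloch-sphere MDP. It assumes a gate set $\{H, S\}$ acting on the six single-qubit stabilizer states, which is closed under these gates, so the MDP is exactly finite and deterministic. The target $|1\rangle$, the reward of 1 for entering the target, the discount factor $\gamma = 0.8$, and all function names (`build_mdp`, `policy_iteration`, `rollout`) are illustrative assumptions.

```python
import numpy as np

# Toy gate set and state set: assumptions for illustration, not the paper's setup.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
S = np.array([[1, 0], [0, 1j]], dtype=complex)
GATES = {"H": H, "S": S}
NAMES = list(GATES)

r = 1 / np.sqrt(2)
STATES = {
    "0": np.array([1, 0], dtype=complex),
    "1": np.array([0, 1], dtype=complex),
    "+": np.array([r, r], dtype=complex),
    "-": np.array([r, -r], dtype=complex),
    "+i": np.array([r, 1j * r], dtype=complex),
    "-i": np.array([r, -1j * r], dtype=complex),
}
LABELS = list(STATES)
TARGET = LABELS.index("1")


def build_mdp():
    """Return transition tensor P[s, a, s'] and reward tensor R[s, a, s']."""
    n_s, n_a = len(LABELS), len(NAMES)
    P = np.zeros((n_s, n_a, n_s))
    for s, label in enumerate(LABELS):
        for a, name in enumerate(NAMES):
            if s == TARGET:
                P[s, a, s] = 1.0  # target is absorbing
                continue
            out = GATES[name] @ STATES[label]
            # Match the output to a labeled state up to a global phase.
            overlaps = [abs(np.vdot(STATES[l], out)) ** 2 for l in LABELS]
            P[s, a, int(np.argmax(overlaps))] = 1.0
    R = np.zeros_like(P)
    R[:, :, TARGET] = 1.0  # reward for arriving at the target
    R[TARGET] = 0.0        # no further reward once absorbed
    return P, R


def policy_iteration(P, R, gamma=0.8, max_iter=100):
    """Alternate exact policy evaluation and greedy policy improvement."""
    n_s = P.shape[0]
    idx = np.arange(n_s)
    policy = np.zeros(n_s, dtype=int)
    V = np.zeros(n_s)
    for _ in range(max_iter):
        # Evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
        P_pi = P[idx, policy]
        r_pi = np.sum(P_pi * R[idx, policy], axis=1)
        V = np.linalg.solve(np.eye(n_s) - gamma * P_pi, r_pi)
        # Improvement: switch action only if another one is strictly better.
        Q = np.sum(P * (R + gamma * V[None, None, :]), axis=2)
        keep = Q[idx, policy] >= Q.max(axis=1) - 1e-12
        new_policy = np.where(keep, policy, Q.argmax(axis=1))
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V


def rollout(P, policy, start, max_len=20):
    """Follow the policy from `start`; gates are listed in order of application."""
    s, seq = LABELS.index(start), []
    while s != TARGET and len(seq) < max_len:
        a = int(policy[s])
        seq.append(NAMES[a])
        s = int(np.argmax(P[s, a]))
    return "".join(seq)


P, R = build_mdp()
policy, V = policy_iteration(P, R)
for label in LABELS:
    print(label, "->", rollout(P, policy, label) or "(already at target)")
```

Because the reward is only collected on reaching the target and $\gamma < 1$, maximizing the discounted return in this toy is equivalent to minimizing the number of gates, which is the mechanism by which an optimal policy encodes shortest gate sequences.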

Paper Structure

This paper contains 19 sections, 24 equations, 3 figures, 3 tables, 3 algorithms.

Figures (3)

  • Figure 1: Optimal values for various states on the Bloch sphere using the discrete $RZ$ and $RY$ gates, with a discount factor $\gamma = 0.8$. The color of a state corresponds to its optimal value function $V_{\pi^{*}}$, where lighter colors indicate a larger value. Those colored in green are also exactly the states whose optimal circuits to prepare the discrete $\vert 1 \rangle$ state consist of a single $RY$ rotation, while those in blue are also exactly the ones whose optimal circuits consist of an $RZ$ rotation followed by an $RY$ rotation.
  • Figure 2: Optimal value landscape across the Bloch sphere using the set of gates $\{I, H, S,T\}$, with a discount factor $\gamma = 0.95$. The color of a state corresponds to its optimal value function $V_{\pi^{*}}$, where darker colors indicate a larger value. States distributed around the equator of the Bloch sphere are especially advantageous to start from in order to reach the target $\vert 1 \rangle$ state, as their optimal circuits consist of short sequences of $S$ and $H$ gates.
  • Figure 3: Fidelity $\mathcal{F}$ of the state $\sigma$ prepared using optimal gate sequences with target state $\rho_{\text{target}} = (HT)^n\ket{0}$ for fixed $n=10^7$ as a function of noise strength $T_1 = T_2$. The shortest gate sequences (indicated in the figure) are produced by optimal policies $\pi^*_{\text{noisy}}$ (orange) and $\pi^*_{\text{noiseless}}$ (blue) of noisy and noiseless MDPs, respectively. The noisy policy gives gate sequences that differ from those of the noiseless case and consistently yield higher fidelities. The optimal noisy gate sequence is $HTHTHTH$ for all times $T_1=T_2 \geq 60\,\mu$s. We fix the gate time to $\tau_g = 200$ ns when generating the Kraus operators as defined by Eqs. \ref{eq:gamma_T_1} and \ref{eq:p_T_2}. For each value of $T_1, T_2$, we generate the transition probabilities $p(s'|s,a)$ according to the corresponding noise map and use policy iteration to find the optimal policy. The fidelity is then calculated by applying the gate sequences found by the noisy and noiseless MDPs to $\ket{0}$ in the error channel for that specific value of $T_1$, $T_2$. The point at infinity represents the noiseless case, and corresponds to the transition probabilities learned by the noiseless MDP.
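
The fidelity evaluation described in the Figure 3 caption can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' code. The parametrization $\gamma = 1 - e^{-\tau_g/T_1}$ for amplitude damping and $p = \tfrac{1}{2}\left(1 - e^{-\tau_g/T_\phi}\right)$ with $1/T_\phi = 1/T_2 - 1/(2T_1)$ for dephasing is a common textbook choice and may differ from the paper's own equations. Applying dephasing and then amplitude damping after every gate, and reading the gate string left to right in order of application, are likewise assumptions. All function names are hypothetical.

```python
import numpy as np

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
Z = np.diag([1, -1]).astype(complex)
GATES = {"H": H, "T": T}


def kraus_ops(t1, t2, tau_g=200e-9):
    """Composite Kraus operators for one gate duration (assumed parametrization)."""
    gamma = 1.0 - np.exp(-tau_g / t1)                     # amplitude damping
    rate_phi = max(1.0 / t2 - 1.0 / (2.0 * t1), 0.0)      # pure dephasing rate
    p = 0.5 * (1.0 - np.exp(-tau_g * rate_phi))           # phase-flip probability
    damp = [
        np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
        np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex),
    ]
    deph = [np.sqrt(1 - p) * np.eye(2, dtype=complex), np.sqrt(p) * Z]
    # Dephasing first, then amplitude damping.
    return [a @ d for a in damp for d in deph]


def run_sequence(seq, t1, t2):
    """Apply the gates in `seq` to |0><0|, with the noise channel after each gate."""
    rho = np.array([[1, 0], [0, 0]], dtype=complex)
    ks = kraus_ops(t1, t2)
    for name in seq:
        U = GATES[name]
        rho = U @ rho @ U.conj().T
        rho = sum(K @ rho @ K.conj().T for K in ks)
    return rho


def target_state(n):
    """Ideal target (HT)^n |0>, computed by repeated squaring."""
    psi = np.linalg.matrix_power(H @ T, n) @ np.array([1, 0], dtype=complex)
    return psi / np.linalg.norm(psi)


def fidelity(psi, rho):
    """Fidelity <psi| rho |psi> between a pure target and a mixed state."""
    return float(np.real(np.vdot(psi, rho @ psi)))


psi = target_state(10**7)
for t in (60e-6, 200e-6, np.inf):  # np.inf reproduces the noiseless limit
    print(t, fidelity(psi, run_sequence("HTHTHTH", t, t)))
```

Passing `np.inf` for both times makes $\gamma = p = 0$, so the channel reduces to the identity, matching the caption's "point at infinity".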