QFlowNet: Fast, Diverse, and Efficient Unitary Synthesis with Generative Flow Networks

Inhoe Koo, Hyunho Cha, Jungwoo Lee

TL;DR

This work proposes QFlowNet, a novel framework that learns efficiently from sparse signals by pairing a Generative Flow Network (GFlowNet) with Transformers, overcoming the single-solution limitation of RL while offering faster inference than other generative models like diffusion.

Abstract

Unitary synthesis, the decomposition of a unitary matrix into a sequence of quantum gates, is a fundamental challenge in quantum compilation. Prevailing reinforcement learning (RL) approaches are often hampered by sparse reward signals, which necessitate complex reward shaping or long training times, and they typically converge to a single policy, lacking solution diversity. In this work, we propose QFlowNet, a novel framework that learns efficiently from sparse signals by pairing a Generative Flow Network (GFlowNet) with Transformers. Our approach addresses two key challenges. First, the GFlowNet framework is fundamentally designed to learn a diverse policy that samples solutions with probability proportional to their reward, overcoming the single-solution limitation of RL while offering faster inference than other generative models such as diffusion. Second, the Transformers act as a powerful encoder, capturing the non-local structure of unitary matrices and compressing a high-dimensional state into a dense latent representation for the policy network. Our agent achieves an overall success rate of 99.7% on a 3-qubit benchmark (lengths 1-12) and discovers a diverse set of compact circuits, establishing QFlowNet as an efficient and diverse paradigm for unitary synthesis.
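
The claim that a GFlowNet samples solutions in proportion to their reward can be read as the standard GFlowNet sampling property. As a sketch, with $x$, $R$, and $Z$ being notation introduced here for illustration rather than taken from the paper:

$$P(x) = \frac{R(x)}{Z}, \qquad Z = \sum_{x'} R(x')$$

Here $x$ ranges over terminal objects (in this setting, complete circuits), $R(x) \ge 0$ is the terminal reward, and $Z$ is the total reward over all terminal objects. Under this property, two distinct circuits with equal reward are sampled equally often, whereas a reward-maximizing RL policy may concentrate on only one of them.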

Paper Structure (15 sections, 2 equations, 6 figures, 1 algorithm)

This paper contains 15 sections, 2 equations, 6 figures, 1 algorithm.

Figures (6)

  • Figure 1: Schematic of the QFlowNet framework with Transformer policy. At each step $t$, the agent observes the unitary residual $s_t = U V_t^\dagger$. This state is processed by the policy network (a Transformer-based unitary model, shown as the block architecture), which outputs a probability distribution $P_\mathrm{F}(s_{t+1} \mid s_t)$ over all possible next actions (gates). An action $a_t$ is sampled to determine the next state $s_{t+1}$, updating the synthesized circuit ($V_{t+1} = V_t a_t$). The process repeats until a final state is reached. A terminal reward is then calculated and used to update the policy network $P_\mathrm{F}$, completing the learning loop.
  • Figure 2: Conceptual comparison of synthesis problem formulations. (a) A standard formulation where the agent starts from a fixed empty circuit ($V_0$) and attempts to reach a target-dependent goal state ($U$). This approach requires the reward function itself (e.g., $\mathrm{tr}(U'V_\mathrm{f}^\dagger)$) to be redefined for every new target $U'$, precluding policy reuse. (b) Our proposed QFlowNet formulation. The problem is reframed by defining the state as the "unitary residual" ($s_t = U V_t^\dagger$). The agent now starts from a target-dependent start state ($s_0 = U$) and navigates to a fixed, universal goal state ($s_\mathrm{f} = I$). This design makes the reward function (fidelity to $I$) universal, allowing a single, general policy to be trained and applied to any target unitary matrix.
  • Figure 3: Synthesis accuracy as a function of circuit complexity. (a) The plot shows a comparison of the success rates on 3-qubit unitaries. (b) The plot compares the multi-qubit success rates of QFlowNet, highlighting the scalability challenge.
  • Figure 4: Efficiency comparison of QFlowNet against other ML methods. (a) Inference efficiency comparison. Our QFlowNet agent (orange) consistently requires only a small number of attempts, while the genQC diffusion model b5 (blue) requires an exponentially increasing number of samples. (b) Training time comparison. Total training time in days for our QFlowNet versus the Gumbel AlphaZero-based model b25. Our agent trains in 1-2 days, while the Gumbel AlphaZero approach requires 6.5-10 days.
  • Figure 5: Analysis of synthesized circuit compactness and diversity. (a) The normalized confusion matrix displays the distribution of synthesized circuit lengths (x-axis) versus the Qiskit-optimized target lengths (y-axis). The most notable results are the strong diagonal (length-optimal) and the cells below it (more compact than the baseline). (b) The histogram shows the number of distinct, correct circuits discovered per target unitary (based on 1024 samples).
  • ...and 1 more figure
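
The unitary-residual formulation described in the captions for Figures 1 and 2 can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single-qubit gate set, the function names, and the specific reward $|\mathrm{tr}(s)|/d$ are assumptions made here for illustration (the captions only say the reward is a fidelity to $I$). The sketch tracks the circuit $V_t$, applies the update $V_{t+1} = V_t a_t$ from the Figure 1 caption, and recomputes the state $s_t = U V_t^\dagger$ at each step.

```python
import numpy as np

# Hypothetical one-qubit gate set, chosen only to keep the example small.
H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
T = np.array([[1, 0], [0, np.exp(1j * np.pi / 4)]], dtype=complex)
GATES = {"h": H, "t": T}


def residual(U, V):
    """State from the captions: s_t = U V_t^dagger."""
    return U @ V.conj().T


def fidelity_to_identity(s):
    """Assumed reward: |tr(s)| / d.

    For a unitary s this equals 1 exactly when s is the identity
    up to a global phase, so it does not depend on the target U.
    """
    d = s.shape[0]
    return abs(np.trace(s)) / d


def rollout(U, actions):
    """Apply a fixed action sequence and return all states plus the terminal reward."""
    V = np.eye(U.shape[0], dtype=complex)  # empty circuit V_0
    states = [residual(U, V)]              # s_0 = U
    for name in actions:
        V = V @ GATES[name]                # V_{t+1} = V_t a_t (Figure 1 caption)
        states.append(residual(U, V))
    return states, fidelity_to_identity(states[-1])


if __name__ == "__main__":
    target = H @ T @ H
    states, reward = rollout(target, ["h", "t", "h"])
    print("start state equals target:", np.allclose(states[0], target))
    print("terminal reward:", round(reward, 6))
```

Because the goal state is always $I$, the same `fidelity_to_identity` function scores a rollout for any target; only the start state $s_0 = U$ changes. In the actual framework the action sequence would be sampled from the learned policy $P_\mathrm{F}$ rather than supplied by hand.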