A Reinforcement Learning Environment for Directed Quantum Circuit Synthesis

Michael Kölle, Tom Schubert, Philipp Altmann, Maximilian Zorn, Jonas Stein, Claudia Linnhoff-Popien

TL;DR

This work introduces a comprehensive reinforcement learning environment for quantum circuit synthesis, in which circuits are constructed from gates of the Clifford+T gate set to prepare specific target states, and demonstrates that trained agents can reliably design minimal quantum circuits for a selection of 2-qubit Bell states.

Abstract

With recent advancements in quantum computing technology, optimizing quantum circuits and ensuring reliable quantum state preparation have become increasingly vital. Traditional methods often demand extensive expertise and manual calculation, which becomes challenging as quantum circuits grow in qubit and gate count. Harnessing machine learning techniques to handle the growing variety of gate-to-qubit combinations is therefore a promising approach. In this work, we introduce a comprehensive reinforcement learning environment for quantum circuit synthesis, in which circuits are constructed from gates of the Clifford+T gate set to prepare specific target states. Our experiments examine how the depth of the synthesized circuits relates to the circuit depth used to initialize the targets and to the qubit count. We organize the environment configurations into multiple evaluation levels and include a range of well-known quantum states for benchmarking purposes. We also establish baselines for the environment using Proximal Policy Optimization. By applying the trained agents to benchmark tests, we demonstrate their ability to reliably design minimal quantum circuits for a selection of 2-qubit Bell states.
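The environment's core mechanic (gates from the Clifford+T set applied in sequence to $|00\ldots0\rangle$, with progress measured against a target state) can be illustrated with a tiny state-vector simulator. This is a minimal sketch under our own assumptions, not the authors' implementation: it uses the generating set H, S, T and CNOT, a pure-Python state vector with little-endian qubit order, fidelity $|\langle a|b\rangle|^2$ as the similarity measure, and a hypothetical tuple encoding for actions (`("h", q)`, `("cx", control, target)`); the function names are illustrative.

```python
import cmath
import math

R = 1 / math.sqrt(2)
# A minimal generating set for Clifford+T: H, S, T (single-qubit) plus CNOT.
SINGLE = {
    "h": [[R, R], [R, -R]],
    "s": [[1, 0], [0, 1j]],
    "t": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
}

def apply_action(state, action):
    """Return a new state vector after one action.

    Actions are hypothetical tuples: ("h"|"s"|"t", qubit) or ("cx", control, target).
    Qubit q corresponds to bit q of the basis-state index (little-endian).
    """
    new = list(state)
    if action[0] == "cx":
        _, control, target = action
        for i in range(len(state)):
            if (i >> control) & 1:  # control is |1>: flip the target bit
                new[i] = state[i ^ (1 << target)]
        return new
    gate, step = SINGLE[action[0]], 1 << action[1]
    for i in range(len(state)):
        if not i & step:  # pair |..0..> with |..1..> on the acted-on qubit
            a, b = state[i], state[i | step]
            new[i] = gate[0][0] * a + gate[0][1] * b
            new[i | step] = gate[1][0] * a + gate[1][1] * b
    return new

def run_circuit(actions, n_qubits):
    """Apply the action list in order to |00...0> and return the resulting state."""
    state = [1 + 0j] + [0j] * (2 ** n_qubits - 1)
    for action in actions:
        state = apply_action(state, action)
    return state

def fidelity(a, b):
    """|<a|b>|^2, which ignores global phase."""
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2

# Bell states with amplitudes ordered |00>, |01>, |10>, |11>.
PHI_PLUS = [R, 0, 0, R]
PHI_MINUS = [R, 0, 0, -R]

print(fidelity(run_circuit([("h", 0), ("cx", 0, 1)], 2), PHI_PLUS))
print(fidelity(run_circuit([("h", 0), ("s", 0), ("s", 0), ("cx", 0, 1)], 2), PHI_MINUS))
```

Both printed fidelities equal 1 up to floating-point rounding. The second circuit shows how the gate set's restrictions shape solutions: there is no primitive Z in this minimal set, so the relative minus sign of $|\Phi^-\rangle$ comes from S·S = Z.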

Paper Structure

This paper contains 22 sections, 13 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Comparison of two reward techniques, step-penalty and distance. Each data point is the average performance of three runs trained in a 2-qubit environment with varying target circuit depth.
  • Figure 2: State diagram of the target-state generation algorithm. The gate-list and state-list are implemented as lists, and $\lambda$ is the circuit-depth parameter, i.e. the exact number of gates that must be applied to reach the target. The algorithm starts at the upper right of the figure.
  • Figure 3: Schematic of the sequential application of the updated action list to the initial state $|00\ldots0\rangle$, transforming it into the current state output by the environment.
  • Figure 4: Agents' performance on targets with different qubit counts $n$ (2, 3, 4, 5, 10) as the circuit depth $\lambda$ is varied from 1 to 15. Each data point is the average performance of 3 agents trained on systems with the respective $n$ and $\lambda$ settings.
  • Figure 5: The average number of gates $n_{g}$ applied by PPO agents trained on different circuit depths $\lambda$, evaluated on 100 random targets from each of the evaluation levels 'easy', 'medium', and 'hard'.
  • ...and 1 more figure
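Figure 2 describes target generation as applying $\lambda$ gates to $|00\ldots0\rangle$ while tracking a gate-list and a state-list, and Figure 1 compares a step-penalty reward with a distance reward. The sketch below illustrates both ideas under our own assumptions and is not the authors' code. Rejecting gates that revisit an earlier state is one plausible use of the state-list, not a confirmed rule. The reward formulas, the fidelity threshold of 0.99, the penalty of 0.01, the action encoding, and all helper names are hypothetical.

```python
import cmath
import math
import random

R = 1 / math.sqrt(2)
# Hypothetical Clifford+T action set: single-qubit H, S, T plus CNOT ("cx").
GATES = {
    "h": [[R, R], [R, -R]],
    "s": [[1, 0], [0, 1j]],
    "t": [[1, 0], [0, cmath.exp(1j * math.pi / 4)]],
}

def apply(state, action):
    """Apply one action tuple to a state vector (qubit q = bit q of the index)."""
    new = list(state)
    if action[0] == "cx":
        _, c, t = action
        for i in range(len(state)):
            if (i >> c) & 1:  # control set: take amplitude from the target-flipped index
                new[i] = state[i ^ (1 << t)]
        return new
    g, step = GATES[action[0]], 1 << action[1]
    for i in range(len(state)):
        if not i & step:
            a, b = state[i], state[i | step]
            new[i] = g[0][0] * a + g[0][1] * b
            new[i | step] = g[1][0] * a + g[1][1] * b
    return new

def key(state):
    # Rounded, hashable form for revisit checks (simplification: global phase is not factored out).
    return tuple((round(z.real, 9), round(z.imag, 9)) for z in state)

def random_action(n, rng):
    if n > 1 and rng.random() < 0.5:
        c, t = rng.sample(range(n), 2)
        return ("cx", c, t)
    return (rng.choice(sorted(GATES)), rng.randrange(n))

def generate_target(n, lam, rng, max_tries=1000):
    """Apply lam random gates to |0...0>, skipping gates that revisit a state."""
    state = [1 + 0j] + [0j] * (2 ** n - 1)
    gate_list, state_list = [], [key(state)]
    tries = 0
    while len(gate_list) < lam:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("no non-revisiting gate sequence found")
        action = random_action(n, rng)
        candidate = apply(state, action)
        if key(candidate) in state_list:
            continue  # assumption: a gate that returns to an earlier state is rejected
        state = candidate
        gate_list.append(action)
        state_list.append(key(state))
    return state, gate_list

def fidelity(a, b):
    return abs(sum(x.conjugate() * y for x, y in zip(a, b))) ** 2

def step_penalty_reward(new_state, target, threshold=0.99, penalty=0.01):
    # Hypothetical form: small cost per step, +1 once the target is reached.
    return 1.0 if fidelity(new_state, target) >= threshold else -penalty

def distance_reward(old_state, new_state, target):
    # Hypothetical form: reward equals the change in fidelity to the target.
    return fidelity(new_state, target) - fidelity(old_state, target)

rng = random.Random(0)
target, gates = generate_target(2, 3, rng)
print(gates)
```

Under this fidelity-difference form, the first H of the two-gate circuit for $|\Phi^+\rangle$ is penalized, because the fidelity drops from 0.5 to 0.25 before the CNOT raises it to 1. This is one illustration of why a dense distance signal is not automatically better than a step penalty; it is not a statement about the results in Figure 1.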