Reinforced Disentanglers on Random Unitary Circuits

Ning Bao, Keiichiro Furuya, Gun Suer

TL;DR

The results indicate that the number of measurements required to disentangle a random quantum circuit is drastically smaller than the numbers reported in numerical studies of measurement-induced phase transitions.

Abstract

We search for efficient disentanglers on random Clifford circuits of two-qubit gates arranged in a brick-wall pattern, using the proximal policy optimization (PPO) algorithm \cite{schulman2017proximalpolicyoptimizationalgorithms}. A disentangler is a set of projective measurements inserted between consecutive entangling layers; an efficient disentangler minimizes the averaged von Neumann entropy of the final state while using as few projections as possible. The problem is naturally suited to reinforcement learning: the state is a binary matrix recording the positions of projective measurements along the circuit, and the actions are bit flips on this matrix that add or delete a measurement at a specified location. The agent receives rewards that depend on the averaged von Neumann entropy of the final state and on the configuration of measurements, so that it learns an optimal policy taking it from the initial state with no measurements to the measurement configuration that minimizes the entanglement entropy. Our results indicate that the number of measurements required to disentangle a random quantum circuit is drastically smaller than the numbers reported in numerical studies of measurement-induced phase transitions. Additionally, the reinforcement learning procedure lets us characterize the pattern of optimal disentanglers, which is not possible in studies of measurement-induced phase transitions.
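The state/action/reward formulation above can be made concrete with a short Python sketch. It is not the authors' implementation: the class name `DisentanglerEnv`, the `entropy_fn` callable (a hypothetical stand-in for a circuit simulator returning the averaged final-state entropy), the episode length `max_steps`, and the reward form `-entropy - alpha * (number of measurements)` are all assumptions. The paper's reward also involves layer-dependent measurement weights $f_{l;\alpha}$ and a sparse positive reward $p_r$, whose exact forms are not reproduced here. The `step` method returns the older Gym-style `(obs, reward, done, info)` tuple.

```python
import numpy as np


class DisentanglerEnv:
    """Hypothetical environment for the disentangler search.

    State:  binary matrix of shape (num_layers, num_qubits); entry [l, q] = 1
            means a projective measurement on qubit q after entangling layer l.
    Action: flat index into that matrix; the entry is bit-flipped, which adds
            or deletes one measurement.
    Reward: ASSUMED form -entropy - alpha * (number of measurements).
    """

    def __init__(self, num_layers, num_qubits, entropy_fn, alpha=0.1, max_steps=50):
        self.shape = (num_layers, num_qubits)
        self.entropy_fn = entropy_fn  # hypothetical: mask -> averaged final-state entropy
        self.alpha = alpha
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        # Every episode starts from the circuit with no measurements.
        self.mask = np.zeros(self.shape, dtype=np.int8)
        self.steps = 0
        return self.mask.copy()

    def step(self, action):
        layer, qubit = np.unravel_index(action, self.shape)
        self.mask[layer, qubit] ^= 1  # bit flip: add or delete one measurement
        self.steps += 1
        entropy = float(self.entropy_fn(self.mask))
        reward = -entropy - self.alpha * int(self.mask.sum())
        done = self.steps >= self.max_steps
        return self.mask.copy(), reward, done, {"entropy": entropy}


def toy_entropy(mask):
    # Toy stand-in for a circuit simulator (NOT physics): each measurement in
    # the last layer removes one bit of entropy, starting from 3 bits.
    return max(0, 3 - int(mask[-1].sum()))


env = DisentanglerEnv(num_layers=4, num_qubits=3, entropy_fn=toy_entropy)
obs = env.reset()
obs, reward, done, info = env.step(9)  # flat index 9 -> (layer 3, qubit 0)
```

A PPO implementation with a discrete action space of size `num_layers * num_qubits` could be trained against an environment of this shape; the toy entropy only exercises the bookkeeping of states, actions, and rewards.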


Paper Structure

This paper contains 11 sections, 21 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: (a) Depiction of the random quantum circuits with the brick-wall structure. Random two-qubit Clifford gates are shown in blue and projections in red. (b) The binary matrix that corresponds to the positions of the projection operators.
  • Figure 2: Measurement weights $f_{l; \alpha}$ as a function of layer $l$ for increasing values of the penalty slope $\alpha$.
  • Figure 3: Rewards and number of projections, averaged over episodes, vs. the number of qubits for the trained PPO models, with $\alpha=0.1$, $t_s = 250{,}000$, $\ell_r = 0.1$, $e_c = 0.01$, and positive sparse reward $p_r = 50.0$. Best fits of the form (top) $y = \gamma_1 \tanh(\gamma_2 x) + \gamma_3$ and (bottom) $y = \gamma_1 x + \gamma_2$ are displayed with error bars. Fit parameters for the bottom panel can be found in Table \ref{tab:my_label}.
  • Figure 4: Entanglement growth as a function of depth for brick-wall random Clifford circuits without projections, for increasing numbers of qubits.
  • Figure 5: Layers averaged over measurements and the total number of measurements, averaged over 1000 episodes, as a function of the penalty slope $\alpha$ for circuits of size $N \times D/2 = 6\times 6$.
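Figures 1 and 4 concern brick-wall random Clifford circuits with projections placed according to a binary matrix. Because every gate is Clifford, such circuits can be simulated efficiently with a stabilizer tableau. The Python sketch below is illustrative, not the authors' simulator: `StabilizerState`, `random_clifford_pair`, and `run_brickwall` are hypothetical names; two-qubit Cliffords are sampled as random words in {H, S, CNOT}, which is random but not uniform over the Clifford group; open boundary conditions, Z-basis projections, and a half-chain bipartition are assumed. Entropies use the standard stabilizer-formalism result $S_A = \mathrm{rank}_{\mathbb{F}_2}(G_A) - |A|$ (in bits), where $G_A$ is the generator matrix restricted to the columns of region $A$; measurement signs are dropped because they do not affect this quantity.

```python
import numpy as np


def gf2_rank(mat):
    """Rank of a binary matrix over GF(2), by Gaussian elimination."""
    m = np.asarray(mat, dtype=np.uint8) % 2
    rank = 0
    for c in range(m.shape[1]):
        if rank == m.shape[0]:
            break
        pivots = np.nonzero(m[rank:, c])[0]
        if pivots.size == 0:
            continue
        p = rank + pivots[0]
        m[[rank, p]] = m[[p, rank]]  # move the pivot row up
        others = np.nonzero(m[:, c])[0]
        m[others[others != rank]] ^= m[rank]  # clear column c in other rows
        rank += 1
    return rank


class StabilizerState:
    """Phase-free stabilizer tableau: row r of (x, z) is generator r."""

    def __init__(self, n):
        self.x = np.zeros((n, n), dtype=np.uint8)
        self.z = np.eye(n, dtype=np.uint8)  # |0...0> is stabilized by Z_1..Z_n

    def h(self, q):
        self.x[:, q], self.z[:, q] = self.z[:, q].copy(), self.x[:, q].copy()

    def s(self, q):
        self.z[:, q] ^= self.x[:, q]  # X -> Y (= XZ up to phase), Z -> Z

    def cnot(self, c, t):
        self.x[:, t] ^= self.x[:, c]  # X_c -> X_c X_t
        self.z[:, c] ^= self.z[:, t]  # Z_t -> Z_c Z_t

    def measure_z(self, q):
        """Projective Z measurement on qubit q; the outcome sign is not tracked."""
        rows = np.nonzero(self.x[:, q])[0]  # generators anticommuting with Z_q
        if rows.size == 0:
            return  # +-Z_q already stabilizes the state: nothing changes
        p, rest = rows[0], rows[1:]
        self.x[rest] ^= self.x[p]
        self.z[rest] ^= self.z[p]
        self.x[p], self.z[p] = 0, 0
        self.z[p, q] = 1  # replace generator p by Z_q

    def entropy(self, region):
        """Entanglement entropy in bits: S_A = rank(G restricted to A) - |A|."""
        cols = list(region)
        g_a = np.hstack([self.x[:, cols], self.z[:, cols]])
        return gf2_rank(g_a) - len(cols)


def random_clifford_pair(state, a, b, rng, depth=10):
    # ASSUMPTION: a random word of H, S, CNOT gates on (a, b); this gives random
    # two-qubit Cliffords, but not uniformly over the Clifford group.
    gates = [lambda: state.h(a), lambda: state.h(b), lambda: state.s(a),
             lambda: state.s(b), lambda: state.cnot(a, b), lambda: state.cnot(b, a)]
    for _ in range(depth):
        gates[rng.integers(len(gates))]()


def run_brickwall(mask, rng):
    """mask[l, q] = 1 -> measure qubit q after entangling layer l.

    Returns the half-chain entropy in bits; open boundaries are ASSUMED.
    """
    num_layers, n = mask.shape
    state = StabilizerState(n)
    for layer in range(num_layers):
        for a in range(layer % 2, n - 1, 2):  # alternate even/odd bonds
            random_clifford_pair(state, a, a + 1, rng)
        for q in np.nonzero(mask[layer])[0]:
            state.measure_z(q)
    return state.entropy(range(n // 2))


rng = np.random.default_rng(0)
no_meas = np.zeros((12, 6), dtype=np.uint8)
last_layer = no_meas.copy()
last_layer[-1] = 1  # measure every qubit after the final layer
avg_free = np.mean([run_brickwall(no_meas, rng) for _ in range(50)])
avg_last = np.mean([run_brickwall(last_layer, rng) for _ in range(50)])
```

Averaging `run_brickwall` over samples for a fixed mask gives an averaged final-state entropy of the kind a reward could be built from. Varying the number of rows of a measurement-free mask traces entanglement growth with depth, as in Figure 4, while measuring every qubit after the final layer trivially gives zero entropy at the largest possible measurement count, which is the trade-off a measurement penalty is meant to discourage.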