Reinforcement Learning to Disentangle Multiqubit Quantum States from Partial Observations
Pavel Tashev, Stefan Petrov, Friederike Metz, Marin Bukov
TL;DR
The paper tackles the challenge of disentangling arbitrary multiqubit states using only partial information by casting the problem as a reinforcement-learning task. It employs a permutation-equivariant transformer policy to select which pair of qubits to couple with a two-qubit gate, and computes locally optimal gates from two-qubit reduced density matrices, enabling short, state-dependent disentangling circuits. For Haar-random initial states of 4 to 6 qubits, the RL agent outperforms random and greedy baseline strategies, achieving substantial reductions in gate counts and in CNOT counts after transpilation, and it remains resilient to shot noise and hardware noise. These results suggest practical pathways for state preparation and circuit synthesis on NISQ devices, including a general 4-qubit circuit using at most five 2-qubit gates (ten CNOTs) to disentangle any 4-qubit state, with potential extensions to larger systems and tensor-network-inspired architectures.
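To make the idea of a gate computed from a two-qubit reduced density matrix concrete, the following is a minimal NumPy sketch. It assumes, without confirmation from the summary above, that a reasonable locally optimal choice is the unitary that diagonalizes the reduced density matrix with eigenvalues sorted in descending order. The function names `two_qubit_rdm`, `diagonalizing_gate`, and `apply_two_qubit_gate` are hypothetical and not taken from the paper.

```python
import numpy as np

def two_qubit_rdm(psi, i, j, n):
    """Reduced density matrix of qubits (i, j) of an n-qubit pure state psi."""
    t = psi.reshape([2] * n)
    # Bring qubits i and j to the front, flatten all remaining qubits.
    t = np.moveaxis(t, (i, j), (0, 1)).reshape(4, -1)
    return t @ t.conj().T

def diagonalizing_gate(rho):
    """Assumed local gate: unitary U such that U rho U^dagger is diagonal,
    with eigenvalues in descending order along the diagonal."""
    vals, vecs = np.linalg.eigh(rho)      # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]        # reorder to descending
    return vecs[:, order].conj().T

def apply_two_qubit_gate(psi, U, i, j, n):
    """Apply the 4x4 unitary U to qubits (i, j) of the n-qubit state psi."""
    t = psi.reshape([2] * n)
    t = np.moveaxis(t, (i, j), (0, 1)).reshape(4, -1)
    t = (U @ t).reshape([2, 2] + [2] * (n - 2))
    t = np.moveaxis(t, (0, 1), (i, j))    # undo the axis permutation
    return t.reshape(-1)

# Example: a Bell pair on qubits (0, 1) tensored with |00> on qubits (2, 3).
n = 4
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
zero2 = np.array([1, 0, 0, 0], dtype=complex)
psi = np.kron(bell, zero2)

U = diagonalizing_gate(two_qubit_rdm(psi, 0, 1, n))
psi_out = apply_two_qubit_gate(psi, U, 0, 1, n)
```

In this example the reduced state of qubits (0, 1) is pure, so a single gate maps the whole register to |0000> up to a global phase. For a generic entangled state one gate is not enough, which is where the choice of qubit pairs by the agent comes in.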
Abstract
Using partial knowledge of a quantum state to control multiqubit entanglement is a largely unexplored paradigm in the emerging field of quantum interactive dynamics with the potential to address outstanding challenges in quantum state preparation and compression, quantum control, and quantum complexity. We present a deep reinforcement learning (RL) approach to constructing short disentangling circuits for arbitrary 4-, 5-, and 6-qubit states using an actor-critic algorithm. With access to only two-qubit reduced density matrices, our agent decides which pairs of qubits to apply two-qubit gates on; requiring only local information makes it directly applicable on modern NISQ devices. Utilizing a permutation-equivariant transformer architecture, the agent can autonomously identify qubit permutations within the state and adjust the disentangling protocol accordingly. Once trained, it provides circuits from different initial states without further optimization. We demonstrate the agent's ability to identify and exploit the entanglement structure of multiqubit states. For 4-, 5-, and 6-qubit Haar-random states, the agent learns to construct disentangling circuits that exhibit strong correlations both between consecutive gates and among the qubits involved. Through extensive benchmarking, we show the efficacy of the RL approach to find disentangling protocols with minimal gate resources. We explore the resilience of our trained agents to noise, highlighting their potential for real-world quantum computing applications. Analyzing optimal disentangling protocols, we report a general circuit to prepare an arbitrary 4-qubit state using at most 5 two-qubit (10 CNOT) gates.
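As an illustration of the setting, the sketch below samples a Haar-random pure state and evaluates the von Neumann entropy of each single-qubit reduced state. A pure state is fully disentangled exactly when all of these entropies vanish, so they can serve as a simple progress measure. Using them as the figure of merit is an assumption made here for illustration, and the function names `haar_random_state` and `single_qubit_entropies` are hypothetical.

```python
import numpy as np

def haar_random_state(n, rng):
    """Sample an n-qubit pure state from the Haar measure by normalizing
    a vector of independent complex Gaussian amplitudes."""
    v = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
    return v / np.linalg.norm(v)

def single_qubit_entropies(psi, n):
    """Von Neumann entropy (natural log) of each single-qubit reduced state."""
    ents = []
    for q in range(n):
        # Bring qubit q to the front and flatten the remaining qubits.
        t = np.moveaxis(psi.reshape([2] * n), q, 0).reshape(2, -1)
        rho = t @ t.conj().T
        p = np.linalg.eigvalsh(rho)
        p = p[p > 1e-12]                  # drop numerically zero eigenvalues
        ents.append(float(-np.sum(p * np.log(p))))
    return np.array(ents)

rng = np.random.default_rng(seed=0)
n = 4
psi = haar_random_state(n, rng)
entropies = single_qubit_entropies(psi, n)  # each value lies in [0, log 2]
```

Each entropy is bounded above by log 2, the value reached when a qubit is maximally entangled with the rest of the register, and a disentangling circuit succeeds when it drives every entry to zero.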
