Optimizing ZX-Diagrams with Deep Reinforcement Learning
Maximilian Nägele, Florian Marquardt
TL;DR
The paper addresses the challenge of optimizing ZX-diagrams by using reinforcement learning to find sequences of local ZX-calculus transformations. It introduces a graph neural network policy, trained with PPO, that operates directly on ZX-diagrams; its actions apply to nodes and edges, alongside a global Stop action, and its reward is based on reducing the diagram size while preserving the represented quantum process up to a scalar. The RL agent outperforms greedy, simulated-annealing, and handcrafted ZX-diagram optimizers and generalizes to diagrams much larger than those seen in training. The work suggests broad applicability to tasks such as gate-count reduction, faster tensor-network simulation, and circuit equivalence checking, and outlines future directions: incorporating gflow- or Pauli-flow-preserving rules and circuit-extraction-aware rewards.
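To make the idea of a local, edge-level action with a size-based reward concrete, here is a minimal sketch of one well-known ZX-calculus rule, spider fusion, on a toy diagram. The `ZXDiagram` class, the helper names, and the choice of "nodes removed" as the reward are all assumptions made for illustration; they are not the paper's implementation. The sketch also ignores Hadamard edges and refuses to fuse spiders that share a neighbour.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ZXDiagram:
    """Toy diagram (hypothetical structure, not the paper's).

    nodes: id -> (colour, phase), colour in {"Z", "X", "B"} ("B" = boundary).
    edges: set of frozenset({a, b}) describing plain wires; no self-loops.
    """
    nodes: dict = field(default_factory=dict)
    edges: set = field(default_factory=set)

    def neighbours(self, v):
        # Every edge containing v has exactly one other endpoint.
        return {next(iter(e - {v})) for e in self.edges if v in e}


def can_fuse(d, u, v):
    """Spider fusion needs two connected spiders of the same colour."""
    if frozenset((u, v)) not in d.edges:
        return False
    colour_u, colour_v = d.nodes[u][0], d.nodes[v][0]
    if colour_u != colour_v or colour_u not in ("Z", "X"):
        return False
    # Toy restriction (assumption): skip pairs with a shared neighbour,
    # so that fusing never creates parallel edges.
    return not (d.neighbours(u) & d.neighbours(v))


def fuse(d, u, v):
    """Merge spider v into spider u; phases add modulo 2*pi.

    Returns the reward, taken here to be the number of nodes removed.
    """
    if not can_fuse(d, u, v):
        raise ValueError("spiders cannot be fused")
    size_before = len(d.nodes)
    colour, phase_u = d.nodes[u]
    phase_v = d.nodes[v][1]
    d.nodes[u] = (colour, (phase_u + phase_v) % (2 * math.pi))
    # Reattach v's other neighbours to u, then drop v and its edges.
    for w in d.neighbours(v) - {u}:
        d.edges.add(frozenset((u, w)))
    d.edges = {e for e in d.edges if v not in e}
    del d.nodes[v]
    return size_before - len(d.nodes)


# Usage: input - Z(pi/4) - Z(pi/4) - output
diagram = ZXDiagram(
    nodes={0: ("B", 0.0), 1: ("Z", math.pi / 4),
           2: ("Z", math.pi / 4), 3: ("B", 0.0)},
    edges={frozenset((0, 1)), frozenset((1, 2)), frozenset((2, 3))},
)
reward = fuse(diagram, 1, 2)
```

In this toy setting, an agent's edge action corresponds to choosing a pair `(u, v)` for which `can_fuse` holds, and a Stop action would simply end the episode without modifying the diagram.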
Abstract
ZX-diagrams are a powerful graphical language for describing quantum processes, with applications in fundamental quantum mechanics, quantum circuit optimization, tensor network simulation, and more. Their utility relies on a set of local transformation rules that can be applied without changing the underlying quantum process. These rules can be exploited to optimize the structure of ZX-diagrams for a range of applications. However, finding an optimal sequence of transformation rules is generally an open problem. In this work, we combine ZX-diagrams with reinforcement learning, a machine learning technique designed to discover an optimal sequence of actions in a decision-making problem, and show that a trained reinforcement learning agent can significantly outperform other optimization techniques such as a greedy strategy, simulated annealing, and state-of-the-art hand-crafted algorithms. Using graph neural networks to encode the agent's policy enables generalization to diagrams much larger than those seen during training.
