Quantum entanglement provides a competitive advantage in adversarial games

Peiyong Wang, Kieran Hymas, James Quach

TL;DR

A controlled study isolating the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game, establishes entanglement as a functional resource for representation learning in competitive reinforcement learning.

Abstract

Whether uniquely quantum resources confer advantages in fully classical, competitive environments remains an open question. Competitive zero-sum reinforcement learning is particularly challenging, as success requires modelling dynamic interactions between opposing agents rather than static state-action mappings. Here, we conduct a controlled study isolating the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game. An 8-qubit parameterised quantum circuit serves as a feature extractor within a proximal policy optimisation framework, allowing direct comparison between separable circuits and architectures incorporating fixed (CZ) or trainable (IsingZZ) entangling gates. Entangled circuits consistently outperform separable counterparts with comparable parameter counts and, in low-capacity regimes, match or exceed classical multilayer perceptron baselines. Representation similarity analysis further shows that entangled circuits learn structurally distinct features, consistent with improved modelling of interacting state variables. These findings establish entanglement as a functional resource for representation learning in competitive reinforcement learning.
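The contrast between separable, CZ-entangled and IsingZZ-entangled feature extractors can be illustrated with a small NumPy statevector simulation. This is only a sketch under stated assumptions, not the authors' implementation: the RY angle encoding, the single layer, the ring of nearest-neighbour entanglers, and every function name below are hypothetical choices made for illustration. The 8 qubits, the CZ gate, the IsingZZ gate $\exp{(-i \frac{\theta}{2} Z_i \otimes Z_j)}$ and the Pauli X readout are taken from the paper's description.

```python
import numpy as np

def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q (qubit 0 is the most significant bit)."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # new axis lands at position 0
    return np.moveaxis(psi, 0, q).reshape(-1)

def bit(q, n):
    """Value of qubit q in every computational basis state."""
    return (np.arange(2 ** n) >> (n - 1 - q)) & 1

def apply_cz(state, i, j, n):
    """Fixed entangler: phase -1 on basis states where both qubits are 1."""
    return state * (1 - 2 * (bit(i, n) & bit(j, n)))

def apply_ising_zz(state, theta, i, j, n):
    """Trainable entangler exp(-i * theta / 2 * Z_i Z_j), diagonal in this basis."""
    zz = 1 - 2 * (bit(i, n) ^ bit(j, n))  # eigenvalue of Z_i Z_j: +1 or -1
    return state * np.exp(-0.5j * theta * zz)

def ry(angle):
    """Single-qubit RY rotation (assumed encoding gate, not from the paper)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def expval_x(state, q, n):
    """Expectation value of Pauli X on qubit q."""
    pauli_x = np.array([[0, 1], [1, 0]], dtype=complex)
    return float(np.real(np.vdot(state, apply_1q(state, pauli_x, q, n))))

def features(obs, thetas=None, entangler="none"):
    """One-layer sketch: encode obs with RY, entangle on a ring, read out <X>."""
    n = len(obs)
    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0
    for q, x in enumerate(obs):
        state = apply_1q(state, ry(x), q, n)
    if entangler != "none":
        for q in range(n):
            nxt = (q + 1) % n  # ring topology is an assumption
            if entangler == "cz":
                state = apply_cz(state, q, nxt, n)
            else:  # "ising_zz"
                state = apply_ising_zz(state, thetas[q], q, nxt, n)
    return np.array([expval_x(state, q, n) for q in range(n)])

obs = np.linspace(0.1, 0.8, 8)               # stand-in for a scaled observation
separable = features(obs)                     # each output depends on one input only
cz = features(obs, entangler="cz")            # outputs mix neighbouring inputs
zz = features(obs, np.full(8, 0.3), "ising_zz")
```

In this toy setting the separable output for qubit $q$ reduces to $\sin(x_q)$, so each feature sees a single observation component, whereas the entangled variants make each feature depend on neighbouring components as well. That is one way to picture how entanglement could help represent interacting state variables.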

Paper Structure (15 sections, 6 equations, 5 figures, 2 tables)

Figures (5)

  • Figure 1: The overall architecture of our quantum-classical hybrid agent. The observation of the environment is an $8$-element vector $[p_l, p_r, b_x, b_y, v_{b,x}, v_{b, y}, s_l, s_r]$, where $p_l$ and $p_r$ denote the positions of the left and right paddles, respectively; $b_x$ and $b_y$ are the coordinates of the ball; $v_{b,x}$ and $v_{b, y}$ are the velocity components of the ball in the $x$- and $y$-directions, respectively; and $s_l$ and $s_r$ are the scores for the left and right paddles. A backbone network, whether classical or quantum, takes the observation vector as input and produces an 8-dimensional feature vector. A classical actor network and a classical critic network share this feature vector: the actor network proposes an action, and the critic network provides a scalar evaluation of the current state. The four kinds of backbone network structure studied in this paper are shown in (a) to (d): (a) classical multi-layer perceptron (MLP); (b) $8$-qubit separable parameterised quantum circuit; (c) $8$-qubit parameterised quantum circuit with fixed entanglement gates (controlled-Z gates); and (d) $8$-qubit parameterised quantum circuit with trainable entanglement gates (the IsingZZ gate $\exp{(-i \frac{\theta}{2} Z_i \otimes Z_j)}$). At the end of the parameterised quantum circuits, all qubits are measured with the Pauli X observable.
  • Figure 2: The return of different quantum backbone configurations as a function of the total training steps. The curves for the separable, CZ-entangled, and IsingZZ-entangled backbones are averaged over 10 runs with different random initialisations. The averaged return curve and the min-max shaded band are computed from the raw data after smoothing with a weighted exponential moving average. Although entangled quantum backbones outperform separable ones on average, the current results do not show whether trainable entanglement gates (IsingZZ) achieve better performance than fixed entanglement gates (CZ): on average, the backbone with IsingZZ gates outperforms that with CZ gates (as in (a)), but the best performance of quantum circuits with CZ gates is better than that of backbones with IsingZZ gates.
  • Figure 3: Comparing the (averaged) episodic performance of the three different quantum backbones (separable, CZ-entangled and IsingZZ-entangled) with that of a classical MLP with $64$ parameters. (a): The performance of the IsingZZ-entangled backbone, compared with the classical MLP backbone with $64$ parameters. Each layer of the IsingZZ-entangled backbone has $56$ parameters. In this configuration, only the $1$-layer backbone outperforms the classical baseline, and it does so with fewer parameters ($56$ vs $64$). Increasing the number of layers (and hence the number of parameters) does not guarantee improved performance. (b): The performance of the CZ-entangled backbone, compared with the classical MLP backbone with $64$ parameters. Among the different layer configurations, those with $2$ and $3$ layers outperform the classical baseline after averaging results across $10$ random initialisations. Since CZ-entangled quantum circuits have $48$ parameters per layer, the CZ-entangled backbones that outperform the classical baseline have $96$ and $144$ parameters, respectively. However, when the number of layers is increased beyond $3$, the performance of the quantum-classical hybrid agent drops by half on average at the final iteration. (c): The performance of the separable backbone, compared with the classical MLP backbone. There is little difference in performance across different layer configurations of the separable backbone, and they all have worse average returns than the $64$-parameter MLP backbone.
  • Figure 4: The similarities of representations generated by different backbone configurations, calculated with centred kernel alignment (CKA) (Kornblith et al., 2019). The heatmap is divided into approximately sixteen blocks, representing intra-group (diagonal blocks) and inter-group (off-diagonal blocks) similarities. For intra-group similarities, we see that the classical backbone generally produces similar representations, even though the hidden dimension differs. The CZ-entangled backbones produce representations with the most variability between different initialisations (intra-group block) and different backbone topologies (inter-group blocks). Generally, for classical neural networks, CKA yields very high similarity scores for representations produced by networks with the same structure, trained on the same dataset, but with different random initialisations, as shown in the bottom-right corner of the heatmap. However, such intra-group similarity is much harder to find among the representations produced by the quantum backbones. Separable and IsingZZ-entangled PQC backbones have an overall slightly higher intra-group similarity score than the CZ-entangled ones. However, all three types of PQC-based backbone networks produce representations that share little similarity with those from the classical backbone networks.
  • Figure 5: Comparing the (averaged) episodic performance of the three different quantum backbones (separable, CZ-entangled and IsingZZ-entangled) w.r.t. classical MLP with $128$, $256$, $336$ and $4096$ parameters.
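
The representation similarity scores in Figure 4 are computed with CKA. As a rough illustration of what such a score measures, the sketch below implements the linear variant of CKA from Kornblith et al. (2019) in NumPy. Whether the paper uses the linear or a kernel (e.g. RBF) variant is not stated here, so the linear form is an assumption, and the function name and the random stand-in data are hypothetical.

```python
import numpy as np

def linear_cka(x, y):
    """Linear CKA between two representation matrices.

    x, y: arrays of shape (num_samples, num_features), computed on the
    same inputs in the same order. Feature counts may differ.
    """
    # Centre every feature column so the score ignores constant offsets.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))

rng = np.random.default_rng(0)
feats_a = rng.normal(size=(200, 8))    # stand-in for one backbone's 8-dim features
rotation, _ = np.linalg.qr(rng.normal(size=(8, 8)))
feats_b = 3.0 * feats_a @ rotation     # same information, rotated and rescaled
feats_c = rng.normal(size=(200, 8))    # unrelated features

same = linear_cka(feats_a, feats_b)       # close to 1: invariant to rotation and scale
different = linear_cka(feats_a, feats_c)  # much lower for unrelated features
```

Evaluating such a score for every pair of trained backbones on a shared batch of observations would give a matrix of the kind shown in the heatmap, with runs of the same backbone type forming the diagonal blocks.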