No-Regret Learning and Equilibrium Computation in Quantum Games

Wayne Lin; Georgios Piliouras; Ryann Sim; Antonios Varvitsiotis

TL;DR

The paper develops a theory of learning in quantum games where agents use no-regret dynamics. It introduces quantum Φ-equilibria (QΦE) and quantum coarse correlated equilibria (QCCE), showing that time-averaged play converges to separable QCCEs in general quantum games and to separable quantum Nash equilibria (QNE) in two-player and polymatrix quantum zero-sum games. It also proves the existence of entangled QCCEs that cannot be learned by current no-regret methods and provides an SDP-based spectrahedral characterization of QCCEs. The results connect online optimization with quantum information, yielding practical convergence guarantees via Matrix Multiplicative Weights Update (MMWU) and illustrating rich equilibrium phenomena including entangled equilibria. This work lays groundwork for distributed quantum systems by clarifying what equilibria are learnable under no-regret dynamics and what requires alternative, possibly correlated mechanisms.

Abstract

As quantum processors advance, the emergence of large-scale decentralized systems involving interacting quantum-enabled agents is on the horizon. Recent research efforts have explored quantum versions of Nash and correlated equilibria as solution concepts for strategic quantum interactions, but these approaches did not directly connect to decentralized adaptive setups where agents possess limited information. This paper studies the dynamics of quantum-enabled agents in decentralized systems that use no-regret algorithms to update their behaviors over time. Specifically, we investigate two-player quantum zero-sum games and polymatrix quantum zero-sum games, showing that no-regret algorithms converge to separable quantum Nash equilibria in time-average. In the case of general multi-player quantum games, our work leads to a novel solution concept, that of separable quantum coarse correlated equilibria (QCCE), as the convergent outcome of the time-averaged behavior of no-regret algorithms, offering a natural solution concept for decentralized quantum systems. Finally, we show that computing QCCEs can be formulated as a semidefinite program and establish the existence of entangled (i.e., non-separable) QCCEs, which are unlearnable via the current paradigm of no-regret learning.
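To make the time-average claim for two-player quantum zero-sum games concrete, the following is a minimal sketch of Matrix Multiplicative Weights Update (MMWU) in self-play. It is an illustration under stated assumptions, not the paper's implementation: the payoff is assumed to take the form u(ρ, σ) = Tr[R(ρ ⊗ σ)] for a Hermitian matrix R, the update is written in the lazy (cumulative-gradient) form, and the function names, the step size `eta`, the horizon `steps`, and the random test game are all hypothetical choices.

```python
import numpy as np

def expm_hermitian(H):
    """Matrix exponential of a Hermitian matrix via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    w = w - w.max()  # shift for stability; the shift cancels after normalisation
    return (V * np.exp(w)) @ V.conj().T

def payoff_gradients(R, rho, sigma, dA, dB):
    """Gradients of the assumed payoff u(rho, sigma) = Tr[R (rho ⊗ sigma)]."""
    R4 = R.reshape(dA, dB, dA, dB)             # indices (a, b, a', b')
    G_A = np.einsum('abcd,db->ac', R4, sigma)  # Tr_B[R (I ⊗ sigma)]
    G_B = np.einsum('abcd,ca->bd', R4, rho)    # Tr_A[R (rho ⊗ I)]
    return G_A, G_B

def mmwu_zero_sum(R, dA, dB, steps=2000, eta=0.05):
    """Self-play: player A maximises u, player B minimises u.
    Returns the time-averaged density matrices of both players."""
    rho = np.eye(dA, dtype=complex) / dA       # start at maximally mixed states
    sigma = np.eye(dB, dtype=complex) / dB
    S_A = np.zeros((dA, dA), dtype=complex)    # cumulative gradients
    S_B = np.zeros((dB, dB), dtype=complex)
    rho_avg = np.zeros_like(rho)
    sigma_avg = np.zeros_like(sigma)
    for _ in range(steps):
        rho_avg += rho / steps
        sigma_avg += sigma / steps
        G_A, G_B = payoff_gradients(R, rho, sigma, dA, dB)
        S_A += G_A
        S_B += G_B
        E_A = expm_hermitian(eta * S_A)        # maximiser follows +gradient
        E_B = expm_hermitian(-eta * S_B)       # minimiser follows -gradient
        rho = E_A / np.trace(E_A).real
        sigma = E_B / np.trace(E_B).real
    return rho_avg, sigma_avg

def duality_gap(R, rho, sigma, dA, dB):
    """Best-response value of A against sigma minus that of B against rho."""
    G_A, G_B = payoff_gradients(R, rho, sigma, dA, dB)
    return np.linalg.eigvalsh(G_A)[-1] - np.linalg.eigvalsh(G_B)[0]

# Hypothetical random game on C^2 ⊗ C^2, scaled to spectral norm 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
R = (X + X.conj().T) / 2
R /= np.linalg.norm(R, 2)

rho_avg, sigma_avg = mmwu_zero_sum(R, 2, 2)
gap = duality_gap(R, rho_avg, sigma_avg, 2, 2)
print("duality gap of time-averaged play:", gap)
```

The duality gap is nonnegative by construction and vanishes exactly at a Nash equilibrium of the assumed game, so a small value for the time-averaged pair is the behaviour the abstract describes. The individual iterates `rho` and `sigma` need not settle down even when their averages do.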

Paper Structure (20 sections, 10 theorems, 74 equations, 3 figures, 1 algorithm)

Introduction
Model, approach, and contributions.
Related work.
Quantum Games, Equilibria and Online Optimization
Quantum games
Various notions of quantum equilibria
Spectrahedral characterization of QCCEs.
$\mathbf{\Phi}$-equilibria in classical games.
No-$\mathbf{\Phi}$-regret learning in quantum games
No $\mathbf{\Phi}$-regret in classical games.
No-Regret Learning in General Quantum Games
No-Regret Learning in Two-Player Quantum Zero-Sum Games
No-Regret Learning in Polymatrix Quantum Zero-Sum Games
MMWU Experiments
Discussion and Future Work
...and 5 more sections

Key Result

Theorem 3.1

For any quantum game we have the following:

Figures (3)

Figure 1: Maximum individual exploitability of time-averaged strategies of players using MMWU in 20 randomly generated $\mathbb{C}^2\otimes\mathbb{C}^2$ quantum games. The black dotted line denotes the theoretical upper-bound on the exploitability.
Figure 2: Example of oscillatory behaviour of MMWU in two-player quantum zero-sum games. Time is represented using a gradient from green to blue on the Bloch sphere.
Figure 3: Example of MMWU converging to the boundary (i.e., pure states) in two-player quantum zero-sum games. Time is represented using a gradient from green to blue on the Bloch sphere.

Theorems & Definitions (29)

Definition 2.1
Definition 2.2
Definition 2.3
Definition 2.4
Theorem 3.1: Main Theorem
proof : Proof of Theorem 3.1(a)
proof : Proof of Theorem 3.1(b)
Remark 3.1
Theorem 4.1: Quantum Minimax Theorem
proof
...and 19 more
