Teaching an Old Dynamics New Tricks: Regularization-free Last-iterate Convergence in Zero-sum Games via BNN Dynamics

Tuo Zhang; Leonardo Stella

TL;DR

This work addresses the challenge of achieving last-iterate convergence in zero-sum multi-agent learning under noisy feedback without relying on regularization. It builds on the Brown-von Neumann-Nash (BNN) dynamics, applies them to normal-form games, extends them to extensive-form games via counterfactual weighting, and implements the resulting framework as a neural actor-critic algorithm (BNNAC). The authors provide stochastic approximation analyses showing almost-sure stability within an $O(\sigma)$ neighborhood of Nash equilibria, a transient decay rate of $O(t^{-2/3})$, and an $O(\sigma^2)$ centroid shift due to bias. They also demonstrate a scalable neural implementation with empirical advantages over baselines in both stationary and nonstationary settings. Together, these results yield regularization-free last-iterate convergence with robust adaptation to changing environments, enabling safer and more responsive multi-agent learning in large-scale settings.
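To make the noisy stochastic-approximation setting concrete, the sketch below runs a discrete BNN update for both players of rock-paper-scissors with Gaussian payoff noise and a decreasing step size, then inspects the last iterate rather than an average. It is a minimal illustration under our own assumptions, not the authors' BNNAC algorithm: the noise model (additive Gaussian noise on each player's payoff vector), the step-size schedule `alpha_t = 0.1 / (1 + t/1000)^0.75`, the starting strategies, and the helper names `bnn_step` and `run_noisy_bnn` are all hypothetical choices.

```python
import numpy as np

# Rock-paper-scissors: the row player receives x^T A y, the column player -x^T A y.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])


def bnn_step(p: np.ndarray, u: np.ndarray, alpha: float) -> np.ndarray:
    """One discrete BNN update of mixed strategy p given an estimated payoff vector u."""
    excess = np.maximum(u - p @ u, 0.0)              # rectified excess payoffs
    # The update direction sums to zero, so p stays on the simplex for small alpha.
    return p + alpha * (excess - p * excess.sum())


def run_noisy_bnn(sigma=0.1, n_steps=50_000, seed=0):
    """Simultaneous noisy BNN updates for both players; returns the last iterate."""
    rng = np.random.default_rng(seed)
    x = np.array([0.6, 0.3, 0.1])   # hypothetical starting strategies
    y = np.array([0.2, 0.2, 0.6])
    for t in range(n_steps):
        alpha = 0.1 / (1.0 + t / 1000.0) ** 0.75   # hypothetical decreasing step size
        # Noisy payoff vectors, both computed from the current (pre-update) strategies.
        u_x = A @ y + sigma * rng.standard_normal(3)
        u_y = -A.T @ x + sigma * rng.standard_normal(3)
        x, y = bnn_step(x, u_x, alpha), bnn_step(y, u_y, alpha)
    return x, y


x_last, y_last = run_noisy_bnn()
print(np.round(x_last, 3), np.round(y_last, 3))
```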

Abstract

Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet regularization requires additional hyperparameters that must be carefully tuned, a challenging task when the payoff structure is known and a considerably harder one when the structure is unknown or subject to change. Motivated by this problem, we repurpose a classical model in evolutionary game theory, the Brown-von Neumann-Nash (BNN) dynamics, leveraging their intrinsic convergence in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). To make this approach more broadly applicable, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics into extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
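The intrinsic convergence of the BNN dynamics can be seen on classic rock-paper-scissors, a zero-sum game in which the replicator dynamics cycle around the equilibrium instead of converging. In their usual form, the BNN dynamics read $\dot x_i = \hat f_i(x) - x_i \sum_j \hat f_j(x)$ with rectified excess payoffs $\hat f_i(x) = [f_i(x) - x^\top f(x)]_+$. The sketch below integrates them with a forward-Euler scheme and checks the last iterate. It rests on our own assumptions (a single symmetric population, a fixed step size of 0.002, and the helper names `bnn_field` and `simulate_bnn`) and is not the authors' implementation.

```python
import numpy as np

# Rock-paper-scissors payoffs: A[i, j] is the payoff of action i against action j.
A = np.array([[0.0, -1.0, 1.0],
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])


def bnn_field(x: np.ndarray, A: np.ndarray) -> np.ndarray:
    """BNN vector field: x_dot_i = excess_i - x_i * sum_j excess_j."""
    payoffs = A @ x                              # payoff f_i(x) of each pure action
    avg = x @ payoffs                            # population-average payoff x^T f(x)
    excess = np.maximum(payoffs - avg, 0.0)      # rectified excess payoffs
    return excess - x * excess.sum()


def simulate_bnn(x0, A, step=0.002, n_steps=100_000):
    """Forward-Euler integration of the BNN dynamics; returns the last iterate."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * bnn_field(x, A)
    return x


x_last = simulate_bnn([0.6, 0.3, 0.1], A)
print(np.round(x_last, 3))  # close to the uniform equilibrium (1/3, 1/3, 1/3)
```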


Paper Structure

This paper contains 19 sections, 11 theorems, 41 equations, 4 figures, and 1 algorithm.

Introduction
Related Work
Preliminaries
Normal-form Games
Extensive-form Games
Reach probabilities.
Counterfactual values.
BNN Dynamics
Noisy Feedback
BNN Dynamics under Noisy Feedback
Discrete Update and SA Form
Convergence Results
Bias and Centroid Shift
BNN Dynamics in Extensive-Form Games
Local Regret Structure and Reach Weights
...and 4 more sections

Key Result

Lemma 1

For a player with action set $A_i$, the structural bias term $\beta(\pi)$ is uniformly bounded over the simplex by a quantity that depends on $\sigma^2$, the variance bound in the noisy-feedback assumption.

Figures (4)

Figure 1: Comparison between the regularized replicator dynamics (RD) and the BNN dynamics across different settings. Left: NashConv in the nonstationary rock-paper-scissors (RPS) environment with continuously changing payoffs. Right: representative trajectories in the simplex of the biased RPS game under stationary biased payoffs.
Figure 2: BRPS and BRPS-W.
Figure 3: Nonstationary RPS.
Figure 4: Stationary and nonstationary poker games.
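Figure 1 reports NashConv, the sum over players of how much each could gain by unilaterally deviating to a best response; it is zero exactly at a Nash equilibrium. For a two-player zero-sum matrix game it has a closed form, $\max_i (Ay)_i - \min_j (x^\top A)_j$. The sketch below assumes the row player receives $x^\top A y$ and the column player its negative; the function name `nash_conv` is our own.

```python
import numpy as np


def nash_conv(A, x, y) -> float:
    """NashConv of (x, y) in a zero-sum matrix game with row payoff x^T A y."""
    A = np.asarray(A, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    value = x @ A @ y
    row_gain = np.max(A @ y) - value   # row player's best-response improvement
    col_gain = value - np.min(x @ A)   # column player's best-response improvement
    return float(row_gain + col_gain)


rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.full(3, 1 / 3)
rock = np.array([1.0, 0.0, 0.0])
print(nash_conv(rps, uniform, uniform))  # 0.0 at the Nash equilibrium
print(nash_conv(rps, rock, rock))        # 2.0: each player gains 1 by switching to paper
```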

Theorems & Definitions (11)

Lemma 1
Lemma 2
Theorem 1
Theorem 2
Theorem 3: Centroid shift
Lemma 3
Lemma 4: Bias and noise bounds in EFG
Lemma 5: One-step expected descent in EFG
Theorem 4: Asymptotic stability in EFG
Theorem 5: Convergence rate in EFG
...and 1 more
