Teaching an Old Dynamics New Tricks: Regularization-free Last-iterate Convergence in Zero-sum Games via BNN Dynamics
Tuo Zhang, Leonardo Stella
TL;DR
This work addresses last-iterate convergence in zero-sum multi-agent learning under noisy feedback without relying on regularization. It repurposes the Brown-von Neumann-Nash (BNN) dynamics and extends them from normal-form to extensive-form games via counterfactual weighting, implemented through a neural actor-critic framework (BNNAC). The authors provide stochastic approximation analyses showing almost-sure stability within an $O(\sigma)$ neighborhood of Nash equilibria, a transient decay rate of $O(t^{-2/3})$, and an $O(\sigma^2)$ centroid shift due to bias. They also demonstrate a scalable neural implementation and empirical advantages over baselines in stationary and nonstationary settings. Together, these results yield regularization-free last-iterate convergence with robust adaptation to changing environments, enabling safer and more responsive multi-agent learning in large-scale settings.
Abstract
Zero-sum games are a fundamental setting for adversarial training and decision-making in multi-agent learning (MAL). Existing methods often ensure convergence to (approximate) Nash equilibria by introducing a form of regularization. Yet regularization requires additional hyperparameters that must be carefully tuned, a task that is challenging even when the payoff structure is known and considerably harder when the structure is unknown or subject to change. Motivated by this problem, we repurpose a classical model from evolutionary game theory, the Brown-von Neumann-Nash (BNN) dynamics, leveraging its intrinsic convergence in zero-sum games without regularization, and provide last-iterate convergence guarantees in noisy normal-form games (NFGs). To make this approach more widely applicable, we develop a novel framework with theoretical guarantees that integrates the BNN dynamics into extensive-form games (EFGs) through counterfactual weighting. Furthermore, we implement an algorithm that instantiates our framework with neural function approximation, enabling scalable learning in both NFGs and EFGs. Empirical results show that our method quickly adapts to nonstationarities, outperforming the state-of-the-art regularization-based approach.
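For readers unfamiliar with the BNN dynamics, the following is a minimal sketch of their textbook form in a two-player zero-sum matrix game, not the authors' BNNAC algorithm. Each player shifts probability toward actions whose payoff exceeds the current average: $\dot{x}_i = k_i - x_i \sum_j k_j$, where $k_i$ is the positive part of the excess payoff of action $i$. The forward-Euler discretization, the step size `dt`, the rock-paper-scissors matrix, and all function names are assumptions made for illustration; the sketch uses exact (noise-free) payoffs and no regularization term.

```python
# Illustrative sketch only: classical BNN dynamics, Euler-discretized (assumed setup).

RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]  # row player's payoff matrix for rock-paper-scissors


def action_payoffs(A, x, y):
    """Per-action payoffs: the row player earns x^T A y, the column player -x^T A y."""
    n, m = len(A), len(A[0])
    row = [sum(A[i][j] * y[j] for j in range(m)) for i in range(n)]
    col = [-sum(A[i][j] * x[i] for i in range(n)) for j in range(m)]
    return row, col


def excess_payoffs(payoffs, strategy):
    """Positive part of each action's payoff minus the strategy's average payoff."""
    average = sum(p * u for p, u in zip(strategy, payoffs))
    return [max(0.0, u - average) for u in payoffs]


def bnn_step(A, x, y, dt):
    """One forward-Euler step of the BNN dynamics for both players."""
    row, col = action_payoffs(A, x, y)
    kx, ky = excess_payoffs(row, x), excess_payoffs(col, y)
    sx, sy = sum(kx), sum(ky)
    # x_i <- x_i + dt * (k_i - x_i * sum_j k_j); this keeps sum(x) equal to 1.
    new_x = [xi + dt * (ki - xi * sx) for xi, ki in zip(x, kx)]
    new_y = [yj + dt * (kj - yj * sy) for yj, kj in zip(y, ky)]
    return new_x, new_y


def nash_gap(A, x, y):
    """Sum of both players' best-response gains; zero exactly at a Nash equilibrium."""
    row, col = action_payoffs(A, x, y)
    return max(row) + max(col)


def simulate(A, x, y, dt=0.01, steps=20000):
    """Iterate bnn_step and return the last iterate (no averaging)."""
    for _ in range(steps):
        x, y = bnn_step(A, x, y, dt)
    return x, y
```

Calling `simulate(RPS, [0.8, 0.1, 0.1], [0.1, 0.8, 0.1])` and comparing `nash_gap` before and after gives a quick check that the last iterate itself, rather than a time average, moves toward equilibrium in this toy game. The step size must be small enough that `dt * sum(k)` stays below 1, otherwise the update can leave the probability simplex.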
