Multi-agent learning under uncertainty: Recurrence vs. concentration

Kyriakos Lotidis, Panayotis Mertikopoulos, Nicholas Bambos, Jose Blanchet

TL;DR

The paper analyzes how uncertainty affects multi-agent regularized learning in continuous and discrete time. It establishes a sharp dichotomy: in null-monotone games, the dynamics drift persistently away from equilibrium and admit no invariant distribution, whereas in strongly monotone games they admit a unique invariant measure whose mass concentrates near the equilibrium. These results are obtained via continuous-time SDE analysis, Dynkin's formula, and a discrete-time reduction to restricted spaces with regeneration arguments. The work provides explicit hitting-time bounds and concentration estimates, highlighting fundamental limits of regularized learning under persistent randomness and pointing to a finer characterization of the invariant measure as an open direction. Taken together, the results clarify the long-run behavior and distributional properties of FTRL-type learning in games, with implications for robustness and performance in data-driven, uncertain environments.

Abstract

In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- one in continuous and one in discrete time -- with the aim of characterizing the long-run behavior of the induced sequence of play. In stark contrast to deterministic, full-information models of learning (or models with a vanishing learning rate), we show that the resulting dynamics do not converge in general. In lieu of this, we ask instead which actions are played more often in the long run, and by how much. We show that, in strongly monotone games, the dynamics of regularized learning may wander away from equilibrium infinitely often, but they always return to its vicinity in finite time (which we estimate), and their long-run distribution is sharply concentrated around a neighborhood thereof. We quantify the degree of this concentration, and we show that these favorable properties may all break down if the underlying game is not strongly monotone -- underscoring in this way the limits of regularized learning in the presence of persistent randomness and uncertainty.
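
To make the "no convergence, but concentration" picture concrete, here is a minimal sketch of constant-step stochastic gradient play in a strongly monotone game. Everything specific in it is an assumption chosen for illustration and is not taken from the paper: the unconstrained quadratic min-max loss $f(x_{1}, x_{2}) = \tfrac{a}{2}x_{1}^{2} + x_{1}x_{2} - \tfrac{a}{2}x_{2}^{2}$ with equilibrium at the origin, Euclidean regularization (so the update reduces to stochastic gradient descent-ascent), i.i.d. standard Gaussian noise, and the names `simulate_sgda` and `stationary_mean_sq`. For this linear model the stationary mean squared distance to equilibrium has the closed form $2\gamma\sigma^{2} / (2a - \gamma(a^{2}+1))$, which the simulation can be compared against.

```python
import random

def simulate_sgda(a=1.0, gamma=0.1, sigma=1.0, steps=50_000, burn_in=1_000, seed=0):
    """Constant-step stochastic gradient descent-ascent on the assumed loss
    f(x1, x2) = (a/2) x1^2 + x1 x2 - (a/2) x2^2 (player 1 minimizes, player 2 maximizes).
    Returns the time-averaged and the largest squared distance to the equilibrium (0, 0)."""
    rng = random.Random(seed)
    x1, x2 = 1.0, 1.0
    total, largest = 0.0, 0.0
    for n in range(steps):
        # Individual payoff gradients: -df/dx1 for player 1, +df/dx2 for player 2.
        v1 = -(a * x1 + x2)
        v2 = x1 - a * x2
        # Noisy gradient step with a constant (non-vanishing) step-size.
        x1 += gamma * (v1 + sigma * rng.gauss(0.0, 1.0))
        x2 += gamma * (v2 + sigma * rng.gauss(0.0, 1.0))
        if n >= burn_in:
            sq = x1 * x1 + x2 * x2
            total += sq
            largest = max(largest, sq)
    return total / (steps - burn_in), largest

def stationary_mean_sq(a, gamma, sigma):
    """Fixed point of E|x_{n+1}|^2 = ((1 - gamma a)^2 + gamma^2) E|x_n|^2 + 2 gamma^2 sigma^2."""
    return 2 * gamma * sigma**2 / (2 * a - gamma * (a**2 + 1))

if __name__ == "__main__":
    mean_sq, largest = simulate_sgda()
    print("time-averaged squared distance:", mean_sq)
    print("closed-form stationary value:  ", stationary_mean_sq(1.0, 0.1, 1.0))
    print("largest squared distance seen: ", largest)
```

In this toy model the iterates keep fluctuating (the largest excursion is several times the average), yet the time average settles near the closed-form value, which shrinks as $\gamma$ or $\sigma$ decreases.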

Paper Structure

This paper contains 46 sections, 25 theorems, 156 equations, 5 figures.

Key Result

Proposition 1

Suppose that \ref{eq:GDA-stoch} is run on the game \ref{eq:bilinear} with initial condition $x_{0}\in\mathbb{R}^{2}$. Then:
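
As a hedged illustration of the kind of behavior at stake in a bilinear game (this is not a restatement of the proposition), consider the following worked calculation. The specific forms are assumptions made here: the loss $f(x_{1}, x_{2}) = x_{1}x_{2}$ on $\mathbb{R}^{2}$, a constant step-size $\gamma$, and zero-mean noise $U_{n}$ that is independent of the past with $\mathbb{E}\|U_{n}\|^{2} = \sigma^{2}$.

```latex
% Assumed update (illustrative): stochastic gradient descent-ascent on f(x_1, x_2) = x_1 x_2
\[
  x_{n+1} = x_{n} + \gamma \left( J x_{n} + U_{n} \right),
  \qquad
  J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.
\]
% Since x^T J x = 0 and \|J x\| = \|x\|, expanding the square and taking expectations gives
\[
  \mathbb{E}\|x_{n+1}\|^{2}
  = (1 + \gamma^{2})\, \mathbb{E}\|x_{n}\|^{2} + \gamma^{2}\sigma^{2},
\]
% and unrolling the recursion yields
\[
  \mathbb{E}\|x_{n}\|^{2}
  = (1 + \gamma^{2})^{n}\, \|x_{0}\|^{2} + \sigma^{2}\left[ (1 + \gamma^{2})^{n} - 1 \right]
  \;\longrightarrow\; \infty
  \quad \text{as } n \to \infty .
\]
```

Under these assumptions the mean squared distance to the equilibrium at the origin grows geometrically for any $\gamma > 0$, which is consistent with the drift away from equilibrium described for null-monotone games.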

Figures (5)

  • Figure 1: Trajectories and statistics of play under \ref{eq:FTRL} with entropic regularization in two min-max games over $\mathcal{X} = [0,1]^{2}$, a bilinear and a quadratic one (top vs. bottom half respectively). Deterministic orbits are plotted in red and stochastic trajectories in shades of blue, with darker hues indicating later points in time; the density plots depict the resulting visitation frequency in $\mathcal{X}$. In tune with \ref{thm:null-disc} and \ref{thm:strong-disc}, we see that learning in null-monotone games drifts toward the extremes of $\mathcal{X}$; by contrast, in strongly monotone games, learning orbits drift toward equilibrium, but continue to fluctuate around it. More details are provided in \ref{app:numerics}.
  • Figure 2: Visualization of the long-run occupancy measure for the min-max game with loss-gain function $f(x_{1}, x_{2})$. Each plot shows the empirical density of the final iterates of $10^5$ runs of \ref{eq:FTRL} for $10^2$ steps, starting from uniformly random initial conditions. The surface plot encodes density via both height and color. Each row corresponds to a different step-size $\gamma \in \{0.1, 0.5\}$, while the columns vary the noise level $\sigma \in \{0.5, 1\}$.
  • Figure 3: Average final distance from equilibrium for different values of the step-size $\gamma$ and the noise level $\sigma$. Each point represents the mean over $100$ independent runs of length $10{,}000$, with shaded regions indicating one standard deviation.
  • Figure 4: Average hitting time (in iterations) to a neighborhood of the equilibrium $x^{\ast}$ with radius $r \in \{0.005,\, 0.01,\, 0.05,\, 0.1\}$, computed over $100$ runs for each $(\gamma, \sigma)$ pair.
  • Figure 5: Visualization of the long-run occupancy measure for the bilinear game with entropic regularization. Each plot shows the empirical density of the final iterates of $10^5$ runs of \ref{eq:FTRL} for $10^2$ steps, starting from uniformly random initial conditions. The surface plot encodes density via both height and color. Each row corresponds to a different step-size $\gamma \in \{0.1, 0.2\}$, while the columns vary the noise level $\sigma \in \{1, 2\}$.
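
The qualitative contrast described in the captions above can be reproduced with a small simulation. The sketch below is only an approximation of that setup under stated assumptions: it uses the hypothetical loss $f(x_{1}, x_{2}) = (x_{1} - \tfrac12)(x_{2} - \tfrac12) + \tfrac{a}{2}(x_{1} - \tfrac12)^{2} - \tfrac{a}{2}(x_{2} - \tfrac12)^{2}$ on $[0,1]^{2}$ (bilinear for $a = 0$, strongly monotone for $a > 0$), implements entropic regularization through a logistic mirror map applied to a score vector $y$, and adds i.i.d. standard Gaussian noise. The function names, the parameter values, and the "distance to the boundary" statistic are illustrative choices, not the paper's.

```python
import math
import random

def logistic(y):
    """Numerically stable logistic map; plays the role of the entropic mirror map onto [0, 1]."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)

def payoff_field(x1, x2, a):
    """Individual payoff gradients for the assumed loss
    f = (x1 - 1/2)(x2 - 1/2) + (a/2)(x1 - 1/2)^2 - (a/2)(x2 - 1/2)^2,
    where player 1 minimizes f and player 2 maximizes it."""
    u1, u2 = x1 - 0.5, x2 - 0.5
    return -(u2 + a * u1), u1 - a * u2

def boundary_fraction(a, gamma=0.2, sigma=1.0, steps=20_000, margin=0.05, seed=0):
    """Run noisy score updates y <- y + gamma * (v(x) + sigma * noise) with x = logistic(y),
    and return the fraction of the second half of the run spent within `margin`
    of the boundary of [0, 1]^2."""
    rng = random.Random(seed)
    y1 = y2 = 0.0  # scores; y = 0 corresponds to the equilibrium x = (1/2, 1/2)
    near = counted = 0
    for n in range(steps):
        x1, x2 = logistic(y1), logistic(y2)
        if n >= steps // 2:
            counted += 1
            if min(x1, 1.0 - x1, x2, 1.0 - x2) < margin:
                near += 1
        v1, v2 = payoff_field(x1, x2, a)
        y1 += gamma * (v1 + sigma * rng.gauss(0.0, 1.0))
        y2 += gamma * (v2 + sigma * rng.gauss(0.0, 1.0))
    return near / counted

if __name__ == "__main__":
    print("bilinear (a = 0):         ", boundary_fraction(a=0.0))
    print("strongly monotone (a = 1):", boundary_fraction(a=1.0))
```

In this toy setting the bilinear run is expected to spend most of its time near the edges of the square, while the strongly monotone run stays close to the interior equilibrium and keeps fluctuating around it.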

Theorems & Definitions (55)

  • Definition 1
  • Remark 1
  • Remark 2
  • Example 2.1: Euclidean regularization
  • Example 2.2: Entropic regularization
  • Proposition 1
  • Proposition 2
  • Remark 3
  • Remark 4
  • Theorem 1: Null-monotone games
  • ...and 45 more