Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part B: Stochastic Stability in Weakly Acyclic Games

Georgios C. Chasparis

TL;DR

This work advances reinforcement-based learning for distributed multi-agent optimization by introducing Aspiration-based Perturbed Learning Automata (APLA), which augments action reinforcement with an aspiration-based satisfaction level to cope with noisy payoff observations. The author develops a finite-state Markov-chain framework and uses Freidlin–Wentzell graph machinery to characterize stochastic stability, deriving conditions under which learning converges (in a weak sense) to the set of pure Nash equilibria and, in certain weakly acyclic games, to payoff-dominant equilibria. The analysis hinges on approximating one-step transition probabilities via resistance concepts and connecting improvement paths to W-graphs, yielding explicit criteria for stochastically stable states. Simulations on the Stag-Hunt game illustrate that APLA can reliably select efficient equilibria in noisy environments, in contrast to PLA, which may favor risk-dominant outcomes. Overall, the approach broadens convergence guarantees for reinforcement-based learning beyond potential and coordination games and demonstrates practical equilibrium selection in decentralized settings.

Abstract

Reinforcement-based learning dynamics may exhibit several limitations when applied in a distributed setup. In (repeatedly-played) multi-player/action strategic-form games, and when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has focused only on a small class of games, namely potential and coordination games. Furthermore, strong convergence guarantees (i.e., almost sure convergence or weak convergence) are mostly restricted to two-player games. To address this main limitation of reinforcement-based learning in repeatedly-played strategic-form games, this paper introduces a novel payoff-based learning scheme for distributed optimization in multi-player/action strategic-form games. We present an extension of perturbed learning automata (PLA), namely aspiration-based perturbed learning automata (APLA), in which each player's probability distribution for selecting actions is reinforced both by repeated selection and by an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games in the presence of noisy observations. This paper is the second part of the study; it analyzes stochastic stability in multi-player/action weakly-acyclic games in the presence of noisy observations. We provide conditions under which convergence is attained (in a weak sense) to the set of pure Nash equilibria and payoff-dominant equilibria. To the best of our knowledge, this is the first reinforcement-based learning scheme that addresses convergence in weakly-acyclic games. Lastly, we provide a specialization of the results to the classical Stag-Hunt game, supported by a simulation study.

Paper Structure

This paper contains 23 sections, 14 theorems, 163 equations, 6 figures, 1 table, 1 algorithm.

Key Result

Theorem 5.1

Let us consider sufficiently small $\epsilon>0$, $h>0$ and $\overline{\upsilon}>0$ such that $0<\epsilon\tilde{u}_{i}(\alpha)<1$ and $0<h<\tilde{u}_{i}(\alpha)$ almost surely ... (Footnote: "Almost surely" (a.s.) excludes paths of probability zero of the unperturbed process ${\mathbb{P}}_{z'}[\cdot]$ (due to the no...)

Figures (6)

  • Figure 1: Graphical sketch of the theoretical contributions of both Part A and Part B of this study. It demonstrates the approximations performed on the transition probability function and the invariant probability measure of the induced Markov chain of the dynamics. (1) Theorem \ref{Th:StochasticStability:PureStrategyStates}, (2) Lemma \ref{Lm:StationaryDistributionApproximation}, (3) Theorem \ref{Th:StochasticallyStableStatesMinimumResistance}, (4) Corollary \ref{Cor:StationaryDistributionInWeaklyAcyclicGames}, (5) Corollary \ref{Cor:PayoffDominance}.
  • Figure 2: Examples of $s$-graphs in case ${\mathcal{S}}$ contains four states.
  • Figure 3: $\mathcal{W}$-graphs in the case (a) $\{s_1\}$-graph, where $s_1\in{\mathcal{S}}\backslash{\mathcal{S}}_{\rm NE}$ corresponds to a non-Nash equilibrium. (b) $\{s_1^*\}$-graph, where $s_1^*\in{\mathcal{S}}_{\rm NE}$ corresponds to a pure Nash equilibrium. Solid lines correspond to better replies and dashed lines otherwise.
  • Figure 4: Mean frequency of occurrence of the (A,A) action profile of Table \ref{Tb:CoordinationGame} over the whole simulation time and over 10 simulation runs. The following configuration parameters have been used: $\epsilon=\nu=0.06$, $h=\lambda=0.04$, $c=30$.
  • Figure 5: Mean frequency of occurrence of the (A,A) action profile of Table \ref{Tb:CoordinationGame} at the end of the simulation time and over 10 simulation runs. The following configuration parameters have been used: $\epsilon=\nu=0.06$, $h=\lambda=0.04$, $c=30$.
  • ...and 1 more figure

Theorems & Definitions (21)

  • Definition 4.1: Pure Nash Equilibrium
  • Definition 4.2: Better Reply
  • Definition 4.3: Improvement Path
  • Definition 4.4: Weakly acyclic game
  • Definition 5.1: Pure Strategy State
  • Theorem 5.1: Stochastic Stability
  • Definition 6.1
  • Lemma 6.1: Lemma 6.3.1 in Freidlin and Wentzell (1984)
  • Lemma 6.2: ${\mathcal{S}}_{\rm NE}$-graphs in weakly acyclic games
  • Lemma 6.3: One-step transition probability approximation
  • ...and 11 more
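
The game-theoretic notions named in Definitions 4.1 to 4.4 (pure Nash equilibrium, better reply, improvement path, weakly acyclic game) can be illustrated on a small Stag-Hunt game. The following is a minimal sketch using the standard textbook forms of these definitions, which may differ in detail from the paper's statements. The payoff values and the names `PAYOFFS`, `better_replies`, `pure_nash_equilibria` and `is_weakly_acyclic` are assumptions made for illustration and are not taken from the paper. It does not implement APLA itself.

```python
from itertools import product

# Hypothetical Stag-Hunt payoffs (illustrative, not from the paper).
# Action 0 = A (stag), action 1 = B (hare); values are (player 0, player 1).
PAYOFFS = {
    (0, 0): (4, 4),
    (0, 1): (1, 3),
    (1, 0): (3, 1),
    (1, 1): (3, 3),
}
NUM_ACTIONS = (2, 2)


def all_profiles():
    """Enumerate every joint action profile."""
    return product(*(range(n) for n in NUM_ACTIONS))


def better_replies(profile):
    """Profiles reached when exactly one player switches to a strictly better action."""
    result = []
    for i, n in enumerate(NUM_ACTIONS):
        for a in range(n):
            if a == profile[i]:
                continue
            candidate = profile[:i] + (a,) + profile[i + 1:]
            if PAYOFFS[candidate][i] > PAYOFFS[profile][i]:
                result.append(candidate)
    return result


def pure_nash_equilibria():
    """A profile is a pure Nash equilibrium if no player has a better reply."""
    return [p for p in all_profiles() if not better_replies(p)]


def is_weakly_acyclic():
    """Check that every profile has some improvement path ending at a pure Nash equilibrium."""
    equilibria = set(pure_nash_equilibria())
    for start in all_profiles():
        seen, stack, reached = {start}, [start], False
        while stack:
            p = stack.pop()
            if p in equilibria:
                reached = True
                break
            for q in better_replies(p):
                if q not in seen:
                    seen.add(q)
                    stack.append(q)
        if not reached:
            return False
    return True


if __name__ == "__main__":
    print("Pure Nash equilibria:", pure_nash_equilibria())
    print("Weakly acyclic:", is_weakly_acyclic())
```

With these assumed payoffs, both (A,A) and (B,B) are pure Nash equilibria and (A,A) is the payoff-dominant one, which is the kind of equilibrium-selection question the stochastic stability analysis addresses.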