Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part A: Stochastic Stability in Non-zero-Sum Games

Georgios C. Chasparis

TL;DR

This work addresses distributed multi-agent optimization with noisy payoff measurements in general non-zero-sum games by proposing Aspiration-based Perturbed Learning Automata (APLA), which augments standard perturbed learning with an evolving aspiration level and an aspiration-driven reinforcement term. The main contribution is a stochastic-stability analysis showing that the infinite-dimensional Markov process induced by APLA converges, in the small-noise limit, to a finite-state Markov chain on pure-strategy states, enabling tractable characterization of long-run behavior even under uniformly bounded observation noise. The results extend prior PLA analyses beyond coordination or potential games to general positive-utility non-zero-sum games and provide robustness to noise, with a simulation illustrating a bias toward payoff-dominant equilibria in a Stag-Hunt example. This framework paves the way for Part B’s specialization to weakly-acyclic games, offering practical convergence guarantees for large distributed systems where synchronized exploration is undesirable.

Abstract

Reinforcement-based learning has attracted considerable attention both in modeling human behavior and in engineering, where it is used to design measurement- or payoff-based optimization schemes. Such learning schemes exhibit several advantages, especially in filtering out noisy observations. However, they may exhibit several limitations when applied in a distributed setup. In multi-player weakly-acyclic games, when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has focused only on a small class of games, namely potential and coordination games. To address this limitation, this paper introduces a novel payoff-based learning scheme for distributed optimization, namely aspiration-based perturbed learning automata (APLA). In this class of dynamics, and contrary to standard reinforcement-based learning schemes, each player's probability distribution for selecting actions is reinforced both by repeated selection and by an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games in the presence of noisy observations. This first part of the paper characterizes stochastic stability in generic non-zero-sum games by establishing equivalence of the induced infinite-dimensional Markov chain with a finite-dimensional one. In the second part, stochastic stability is further specialized to weakly acyclic games.
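
The abstract describes the ingredients of the dynamics only in words, so the following is a minimal sketch of a generic perturbed reward-inaction automaton with an aspiration level, not the paper's APLA recursion. Every name and parameter here (`select_action`, `reinforce`, `update_aspiration`, the step sizes `eps` and `nu`, the perturbation `lam`) is an assumption introduced for illustration, and the aspiration-driven reinforcement term of APLA itself is not reproduced because its form is not given in this summary.

```python
import random


def select_action(x, lam, rng):
    """Perturbed selection (assumed form): with probability lam pick an
    action uniformly at random, otherwise sample from the strategy x."""
    n = len(x)
    if rng.random() < lam:
        return rng.randrange(n)
    return rng.choices(range(n), weights=x, k=1)[0]


def reinforce(x, action, reward, eps):
    """Reward-inaction step (assumed form): move x toward the vertex of the
    played action by a fraction eps * reward. Requires a positive reward and
    eps * reward < 1 so that x stays on the probability simplex."""
    step = eps * reward
    if not 0.0 < step < 1.0:
        raise ValueError("eps * reward must lie in (0, 1)")
    return [xi + step * ((1.0 if k == action else 0.0) - xi)
            for k, xi in enumerate(x)]


def update_aspiration(rho, reward, nu):
    """Aspiration level as a running average of observed rewards
    (hypothetical choice; the paper's exact recursion is not shown here)."""
    return rho + nu * (reward - rho)


# Illustration: action 0 is played repeatedly with a fixed positive reward.
x, rho = [0.5, 0.5], 3.0
for _ in range(100):
    x = reinforce(x, action=0, reward=5.0, eps=0.01)
    rho = update_aspiration(rho, reward=5.0, nu=0.1)
```

In this sketch the positivity of the reward is what keeps each update a convex combination of the current strategy and a pure strategy, which is one way to read the restriction to positive-utility games.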

Paper Structure

This paper contains 19 sections, 7 theorems, 50 equations, 4 figures, 1 table, 1 algorithm.

Key Result

Proposition 5.1

Both the unperturbed process $P$ ($\lambda=0$) and the perturbed process $P_{\lambda}$ ($\lambda>0$) satisfy the strong-Feller property.

Figures (4)

  • Figure 1: Aspiration factor.
  • Figure 2: A sample evolution of strategy $x_{ij}$ of player $i$ when starting from $x_{ij}(0)=0$ and $\rho_i(0)=u_i(\alpha)$ and action profile $\alpha^+\equiv\alpha'$ is played repeatedly, such that $\alpha_i'=j$. We demonstrate two cases: (a) $u_i(\alpha')<\rho_i(0)\equiv u_i(\alpha)$, i.e., agent $i$ experiences an unsatisfactory reward, and (b) $u_i(\alpha')>\rho_i(0)$, i.e., agent $i$ experiences a satisfactory reward.
  • Figure 3: Response of standard perturbed learning automata (PLA) in the coordination game of Table 1(a) with $a=5$, $b=1$, $c=4$, $d=3$.
  • Figure 4: Response of aspiration-based perturbed learning automata (APLA) in the coordination game of Table 1(a) with $a=5$, $b=1$, $c=4$, $d=3$.
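
The payoff parameters quoted in the captions of Figures 3 and 4 can be checked against the Stag-Hunt structure mentioned in the TL;DR. The sketch below assumes the usual symmetric 2x2 layout, which is not reproduced in this summary: both players receive $a$ at (A, A) and $d$ at (B, B), while a player choosing A against B receives $b$ and a player choosing B against A receives $c$. The function names are hypothetical.

```python
from itertools import product

a, b, c, d = 5, 1, 4, 3  # parameters from the captions of Figures 3 and 4

# ASSUMED layout: U1[i][j] is the row player's payoff when the row player
# plays i and the column player plays j (0 = action A, 1 = action B).
U1 = [[a, b],
      [c, d]]


def payoff(player, profile):
    """Payoff of `player` (0 or 1) in a symmetric game built from U1."""
    i, j = profile
    return U1[i][j] if player == 0 else U1[j][i]


def is_pure_nash(profile):
    """True if no player gains by a unilateral deviation."""
    for player in (0, 1):
        for deviation in (0, 1):
            alt = list(profile)
            alt[player] = deviation
            if payoff(player, tuple(alt)) > payoff(player, profile):
                return False
    return True


def payoff_dominant(equilibria):
    """Equilibria giving every player strictly more than any other one."""
    return [p for p in equilibria
            if all(payoff(k, p) > payoff(k, q)
                   for q in equilibria if q != p
                   for k in (0, 1))]


equilibria = [p for p in product((0, 1), repeat=2) if is_pure_nash(p)]
dominant = payoff_dominant(equilibria)
print(equilibria, dominant)
```

Under this assumed layout the game has two pure Nash equilibria, (A, A) and (B, B), and (A, A) is the payoff-dominant one, which is the equilibrium the TL;DR says the simulation is biased toward.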

Theorems & Definitions (8)

  • Proposition 5.1
  • Definition 5.1: Pure Strategy State
  • Theorem 5.1: Stochastic Stability
  • Proposition 6.1: Constant action selection
  • Proposition 6.2: Convergence to p.s.s.
  • Proposition 6.3: Limiting t.p.f. of unperturbed process
  • Proposition 6.4: i.p.m. of perturbed process
  • Proposition 6.5: Unique i.p.m. of $Q\Pi$