Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part B: Stochastic Stability in Weakly Acyclic Games
Georgios C. Chasparis
TL;DR
This work advances reinforcement-based learning for distributed multi-agent optimization by introducing Aspiration-based Perturbed Learning Automata (APLA), which augments action reinforcement with an aspiration-based satisfaction level to cope with noisy payoff observations. The paper develops a finite-state Markov-chain framework and uses Freidlin–Wentzell graph machinery to characterize stochastic stability, deriving conditions under which learning converges (in a weak sense) to the set of pure Nash equilibria and, in certain weakly acyclic games, to payoff-dominant equilibria. The analysis hinges on approximating one-step transition probabilities via resistance concepts and connecting improvement paths to W-graphs, yielding explicit criteria for stochastically stable states. Simulations on the Stag-Hunt game illustrate that APLA can reliably select efficient equilibria in noisy environments, in contrast to PLA, which may favor risk-dominant outcomes. Overall, the approach broadens convergence guarantees for reinforcement-based learning beyond potential/coordination games and demonstrates practical equilibrium selection in large, decentralized settings.
Abstract
Reinforcement-based learning dynamics may exhibit several limitations when applied in a distributed setup. In (repeatedly-played) multi-player/action strategic-form games, when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has focused only on a small class of games, namely potential and coordination games. Furthermore, strong convergence guarantees (i.e., almost sure convergence or weak convergence) are mostly restricted to two-player games. To address this main limitation of reinforcement-based learning in repeatedly-played strategic-form games, this paper introduces a novel payoff-based learning scheme for distributed optimization in multi-player/action strategic-form games. We present an extension of perturbed learning automata (PLA), namely aspiration-based perturbed learning automata (APLA), in which each player's probability distribution for selecting actions is reinforced both by repeated selection and by an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games in the presence of noisy observations. This paper is the second part of the study and analyzes stochastic stability in multi-player/action weakly acyclic games in the presence of noisy observations. We provide conditions under which convergence is attained (in a weak sense) to the set of pure Nash equilibria and to payoff-dominant equilibria. To the best of our knowledge, this is the first reinforcement-based learning scheme that addresses convergence in weakly acyclic games. Lastly, we provide a specialization of the results to the classical Stag-Hunt game, supported by a simulation study.
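
To make the equilibrium notions above concrete, the following minimal Python sketch enumerates the pure Nash equilibria of a two-player Stag-Hunt game, checks weak acyclicity (every action profile has a better-reply path ending at a pure Nash equilibrium), and identifies the payoff-dominant and risk-dominant equilibria. The payoff values, the action encoding, and all function names are illustrative assumptions and are not taken from the paper; the sketch does not implement the APLA update rule itself.

```python
from itertools import product

# Hypothetical Stag-Hunt payoffs (illustrative values, NOT from the paper).
# Actions: 0 = Stag, 1 = Hare. PAYOFFS[(a1, a2)] = (u1, u2), all strictly positive.
PAYOFFS = {
    (0, 0): (4, 4),
    (0, 1): (1, 3),
    (1, 0): (3, 1),
    (1, 1): (3, 3),
}
ACTIONS = (0, 1)
PLAYERS = (0, 1)
PROFILES = list(product(ACTIONS, repeat=2))


def deviate(profile, player, action):
    """Return the profile obtained when `player` unilaterally switches to `action`."""
    changed = list(profile)
    changed[player] = action
    return tuple(changed)


def better_replies(profile):
    """Profiles reachable by one player's strictly improving unilateral deviation."""
    reachable = []
    for i in PLAYERS:
        for a in ACTIONS:
            candidate = deviate(profile, i, a)
            if PAYOFFS[candidate][i] > PAYOFFS[profile][i]:
                reachable.append(candidate)
    return reachable


def pure_nash():
    """A profile is a pure Nash equilibrium if no player has a better reply."""
    return [p for p in PROFILES if not better_replies(p)]


def is_weakly_acyclic():
    """True if every profile has a better-reply path ending at a pure Nash equilibrium."""
    nash = set(pure_nash())

    def reaches(profile, seen):
        if profile in nash:
            return True
        return any(
            q not in seen and reaches(q, seen | {q})
            for q in better_replies(profile)
        )

    return all(reaches(p, {p}) for p in PROFILES)


def payoff_dominant(nash):
    """Equilibria giving every player at least as much as any other equilibrium."""
    return [
        p for p in nash
        if all(PAYOFFS[p][i] >= PAYOFFS[q][i] for q in nash for i in PLAYERS)
    ]


def risk_dominant(nash):
    """Equilibrium with the largest product of unilateral deviation losses
    (the Harsanyi-Selten criterion for 2x2 games)."""
    def loss_product(p):
        prod = 1
        for i in PLAYERS:
            other = deviate(p, i, 1 - p[i])  # the only alternative action
            prod *= PAYOFFS[p][i] - PAYOFFS[other][i]
        return prod

    return max(nash, key=loss_product)


if __name__ == "__main__":
    nash = pure_nash()
    print("pure Nash equilibria:", nash)
    print("weakly acyclic:", is_weakly_acyclic())
    print("payoff-dominant:", payoff_dominant(nash))
    print("risk-dominant:", risk_dominant(nash))
```

With these assumed payoffs, (Stag, Stag) is payoff-dominant while (Hare, Hare) is risk-dominant, which is the tension between efficient and risk-dominant outcomes that the Stag-Hunt specialization addresses.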
