Aspiration-based Perturbed Learning Automata in Games with Noisy Utility Measurements. Part A: Stochastic Stability in Non-zero-Sum Games
Georgios C. Chasparis
TL;DR
This work addresses distributed multi-agent optimization with noisy payoff measurements in general non-zero-sum games. It proposes Aspiration-based Perturbed Learning Automata (APLA), which augment standard perturbed learning automata with an evolving aspiration level and an aspiration-driven reinforcement term. The main contribution is a stochastic-stability analysis. It shows that, in the limit of vanishing perturbations, the long-run behavior of the infinite-dimensional Markov process induced by APLA can be characterized by a finite-state Markov chain defined over pure-strategy states. This makes the long-run behavior tractable even under uniformly bounded observation noise. The results extend prior analyses of perturbed learning automata beyond coordination and potential games to general positive-utility non-zero-sum games. A simulation of a Stag-Hunt game illustrates a bias toward the payoff-dominant equilibrium. The framework lays the groundwork for Part B, which specializes the analysis to weakly-acyclic games and targets convergence guarantees for large distributed systems in which synchronized exploration is undesirable.
Abstract
Reinforcement-based learning has attracted considerable attention both in modeling human behavior and in engineering, where it is used to design measurement- or payoff-based optimization schemes. Such learning schemes exhibit several advantages, especially in filtering out noisy observations. However, they may exhibit several limitations when applied in a distributed setup. In multi-player weakly-acyclic games, when each player applies an independent copy of the learning dynamics, convergence to (usually desirable) pure Nash equilibria cannot be guaranteed. Prior work has focused only on a small class of games, namely potential and coordination games. To address this limitation, this paper introduces a novel payoff-based learning scheme for distributed optimization, namely aspiration-based perturbed learning automata (APLA). In this class of dynamics, and contrary to standard reinforcement-based learning schemes, each player's probability distribution for selecting actions is reinforced both by repeated selection and by an aspiration factor that captures the player's satisfaction level. We provide a stochastic stability analysis of APLA in multi-player positive-utility games in the presence of noisy observations. This first part of the paper characterizes stochastic stability in generic non-zero-sum games by establishing the equivalence of the induced infinite-dimensional Markov chain with a finite-dimensional one. In the second part, stochastic stability is further specialized to weakly-acyclic games.
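The idea of reinforcing action probabilities both through repeated selection and through an aspiration factor can be illustrated with a small simulation. This sketch is hypothetical and does not reproduce the paper's exact APLA recursion. The names `perturbed_choice`, `apla_step`, `simulate`, and `STAG_HUNT` are illustrative assumptions, as are the parameters `eps` (step size), `nu` (aspiration rate), `lam` (exploration probability), and `noise` (bound on the uniform measurement noise). The same applies to the Stag-Hunt payoff values. Two parts are assumed forms chosen only to show how an aspiration level can enter a perturbed learning-automata update. The first is the dissatisfied branch, which pulls the strategy toward the uniform distribution in proportion to the payoff shortfall. The second is the running-average rule for the aspiration level.

```python
import numpy as np

# Hypothetical symmetric Stag-Hunt payoffs (row player's utility), all positive.
# Action 0 = Stag, action 1 = Hare.
STAG_HUNT = np.array([[4.0, 1.0],
                      [3.0, 3.0]])


def perturbed_choice(x, lam, rng):
    """With probability lam pick an action uniformly (exploration); otherwise sample from x."""
    n = len(x)
    if rng.random() < lam:
        return int(rng.integers(n))
    return int(rng.choice(n, p=x))


def apla_step(x, rho, action, payoff, eps, nu):
    """One hypothetical aspiration-based update for a single player.

    x: strategy (probability vector), rho: aspiration level,
    payoff: noisy utility measured for the chosen action.
    """
    if payoff >= rho:
        # Satisfied: reinforce the chosen action in proportion to the payoff,
        # in the spirit of standard learning-automata reinforcement.
        target = np.zeros_like(x)
        target[action] = 1.0
        step = np.clip(eps * payoff, 0.0, 1.0)
    else:
        # Dissatisfied (ASSUMED form): drift toward the uniform strategy
        # in proportion to the shortfall below the aspiration level.
        target = np.full_like(x, 1.0 / len(x))
        step = np.clip(eps * (rho - payoff), 0.0, 1.0)
    # A step in [0, 1] gives a convex combination, so x stays in the simplex;
    # clip and renormalize only to guard against floating-point drift.
    x_new = x + step * (target - x)
    x_new = np.clip(x_new, 0.0, None)
    x_new /= x_new.sum()
    # Aspiration level (ASSUMED form): running average of measured payoffs.
    rho_new = rho + eps * nu * (payoff - rho)
    return x_new, rho_new


def simulate(steps=5000, eps=0.01, nu=0.5, lam=0.01, noise=0.1, seed=0):
    """Two players independently run the hypothetical update on STAG_HUNT."""
    rng = np.random.default_rng(seed)
    x = [np.full(2, 0.5), np.full(2, 0.5)]
    rho = [1.0, 1.0]
    for _ in range(steps):
        actions = [perturbed_choice(x[i], lam, rng) for i in range(2)]
        for i in range(2):
            # Uniformly bounded measurement noise on the realized payoff.
            u = STAG_HUNT[actions[i], actions[1 - i]] + rng.uniform(-noise, noise)
            x[i], rho[i] = apla_step(x[i], rho[i], actions[i], u, eps, nu)
    return x, rho
```

Calling `simulate()` returns each player's final strategy vector and aspiration level. No particular long-run outcome is claimed for this sketch. Clipping the step to [0, 1] keeps every strategy a valid probability vector as long as measured utilities stay positive, which the noise bound of 0.1 guarantees for these payoffs.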
