
On Passivity, Reinforcement Learning and Higher-Order Learning in Multi-Agent Finite Games

Bolin Gao, Lacra Pavel

TL;DR

It is shown that convergence to a Nash distribution can be attained in a broader class of games than previously considered in the literature, namely games characterized by the monotonicity property of their (negative) payoff vectors.
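The monotonicity condition above can be probed numerically. The sketch below assumes a two-player symmetric Rock-Paper-Scissors game whose matrix pays 1 for a win and −l for a loss. This parameterization, and the helper names `rps_matrix` and `max_monotonicity_gap`, are our own illustrative assumptions and are not taken from the paper. The code samples pairs of mixed-strategy profiles x, y and reports the largest value of ⟨x − y, U(x) − U(y)⟩. A game whose negative payoff vector is monotone keeps this quantity non-positive, so a positive sample disproves monotonicity, while non-positive samples only suggest it.

```python
import random


def rps_matrix(loss):
    """Assumed RPS payoff matrix (hypothetical parameterization): win 1, loss -loss, tie 0."""
    return [[0.0, -loss, 1.0],
            [1.0, 0.0, -loss],
            [-loss, 1.0, 0.0]]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def random_simplex_point(n, rng):
    # Normalized i.i.d. exponentials are uniformly distributed on the simplex.
    w = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(w)
    return [wi / total for wi in w]


def payoff_vectors(A, profile):
    # Assumed symmetric two-player game: each player's payoff vector is A times the opponent's strategy.
    x1, x2 = profile
    return matvec(A, x2), matvec(A, x1)


def max_monotonicity_gap(A, samples=2000, seed=0):
    """Largest sampled <x - y, U(x) - U(y)> over pairs of mixed-strategy profiles.

    -U is monotone iff this quantity is <= 0 for every pair, so a positive
    value disproves monotonicity; non-positive samples are only evidence.
    """
    rng = random.Random(seed)
    worst = float("-inf")
    for _ in range(samples):
        x = (random_simplex_point(3, rng), random_simplex_point(3, rng))
        y = (random_simplex_point(3, rng), random_simplex_point(3, rng))
        ux, uy = payoff_vectors(A, x), payoff_vectors(A, y)
        gap = sum((xp[i] - yp[i]) * (uxp[i] - uyp[i])
                  for xp, yp, uxp, uyp in zip(x, y, ux, uy)
                  for i in range(3))
        worst = max(worst, gap)
    return worst


print(max_monotonicity_gap(rps_matrix(1.0)))   # standard RPS
print(max_monotonicity_gap(rps_matrix(2.5)))   # loss larger than win
```

With `loss = 1` the matrix is skew-symmetric, so the game is zero-sum and the gap vanishes up to rounding. With `loss = 2.5` the gap equals 1.5 times the inner product of the two players' strategy differences, which some samples make positive. Under this assumed formulation, that game is therefore not monotone.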

Abstract

In this paper, we propose a passivity-based methodology for the analysis and design of reinforcement learning in multi-agent finite games. Starting from a known exponentially-discounted reinforcement learning scheme, we show that convergence to a Nash distribution can be attained in the class of games characterized by the monotonicity property of their (negative) payoff vectors. We further exploit passivity to propose a class of higher-order schemes that preserve these convergence properties, can improve the speed of convergence, and can even converge in cases where their first-order counterparts fail to converge. We demonstrate these properties through numerical simulations of several representative games.
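The first-order exponentially-discounted scheme can be illustrated with a short simulation. This is a minimal sketch, not the paper's equation. We assume each player p integrates dz^p/dt = −z^p + U^p(x) and plays x^p = softmax(z^p/ε). Under this form, rest points satisfy x^p = softmax(U^p(x)/ε), which is a Nash distribution. The game is standard Rock-Paper-Scissors played as a two-player zero-sum matrix game, which is monotone. The function name `simulate_exp_d_rl`, the initial scores, and the step size are hypothetical choices.

```python
import math


def softmax(z, eps):
    """Logit map exp(z / eps) / sum(exp(z / eps)), shifted by max(z) for numerical stability."""
    m = max(z)
    w = [math.exp((zi - m) / eps) for zi in z]
    total = sum(w)
    return [wi / total for wi in w]


def simulate_exp_d_rl(A, z1, z2, eps=1.0, dt=0.05, steps=4000):
    """Forward-Euler integration of an ASSUMED first-order discounted scheme

        dz^p/dt = -z^p + U^p(x),   x^p = softmax(z^p / eps),

    for a two-player zero-sum matrix game in which player 1 receives x1^T A x2
    and player 2 receives -x1^T A x2.
    """
    n, m = len(A), len(A[0])
    for _ in range(steps):
        x1, x2 = softmax(z1, eps), softmax(z2, eps)
        u1 = [sum(A[i][j] * x2[j] for j in range(m)) for i in range(n)]    # A x2
        u2 = [-sum(A[i][j] * x1[i] for i in range(n)) for j in range(m)]   # -A^T x1
        z1 = [zi + dt * (-zi + ui) for zi, ui in zip(z1, u1)]
        z2 = [zi + dt * (-zi + ui) for zi, ui in zip(z2, u2)]
    return softmax(z1, eps), softmax(z2, eps)


A = [[0.0, -1.0, 1.0],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]        # standard Rock-Paper-Scissors
x1, x2 = simulate_exp_d_rl(A, [2.0, -1.0, 0.5], [-0.5, 1.5, 0.0])
print([round(v, 4) for v in x1], [round(v, 4) for v in x2])
```

Both mixed strategies settle at the uniform profile (1/3, 1/3, 1/3). In this zero-sum game that profile is the unique Nash distribution for any ε > 0. The −z^p discount term is what damps the cycling that undiscounted score dynamics exhibit in this game.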

Paper Structure

This paper contains 14 sections, 5 theorems, 69 equations, 19 figures, 1 table.

Key Result

Proposition 1

Any $\overline{x}^\star = \bm{\sigma}(\overline{z}^\star)$, where $\overline{z}^\star$ is a rest point of the overall EXP-D-RL dynamics, is a Nash equilibrium of game $\mathcal{G}$ in which the payoff of each player $p$ is perturbed by the entropy term $-\epsilon\,(x^p)^\top \mathbf{log}^p(x^p)$, where $\mathbf{log}^p(x^p)$ denotes the componentwise logarithm of $x^p$ and $\epsilon > 0$.
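Proposition 1 rests on a standard fact: the logit map is the exact best response to an entropy-perturbed payoff. For a payoff vector u, softmax(u/ε) maximizes xᵀu − ε xᵀlog x over the simplex, and the maximum value is ε log Σᵢ exp(uᵢ/ε). The sketch below checks this by sampling for a hypothetical payoff vector `u`. It relies on the same assumption as above, that a rest point satisfies x^p = softmax(U^p(x)/ε). Under that assumption, no player can raise its perturbed payoff by deviating unilaterally. The helper names are illustrative.

```python
import math
import random


def softmax(u, eps):
    m = max(u)
    w = [math.exp((ui - m) / eps) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]


def perturbed_payoff(x, u, eps):
    """x^T u - eps * x^T log(x), with the convention 0 * log 0 = 0."""
    expected = sum(xi * ui for xi, ui in zip(x, u))
    neg_entropy = sum(xi * math.log(xi) for xi in x if xi > 0.0)
    return expected - eps * neg_entropy


def logit_is_perturbed_best_response(u, eps, trials=5000, seed=1):
    """Sampled check (not a proof) that softmax(u / eps) is not beaten by
    random mixed strategies under the entropy-perturbed payoff."""
    rng = random.Random(seed)
    best = perturbed_payoff(softmax(u, eps), u, eps)
    for _ in range(trials):
        w = [rng.expovariate(1.0) for _ in u]
        total = sum(w)
        y = [wi / total for wi in w]
        if perturbed_payoff(y, u, eps) > best + 1e-12:
            return False
    return True


u = [1.0, -0.5, 0.3]          # hypothetical payoff vector of one player
print(logit_is_perturbed_best_response(u, eps=1.0))
print(logit_is_perturbed_best_response(u, eps=0.2))
```

Strict concavity of the entropy term makes this maximizer unique, which is why each rest point corresponds to a single perturbed best response per player.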

Figures (19)

  • Figure 1: EXP-D-RL as a feedback interconnection
  • Figure 2: H-EXP-D-RL as a feedback interconnection
  • Figure 3: Standard RPS game, $l = 1$, $\epsilon=1$
  • Figure 4: Unstable RPS game, $l = 2.5$, $\epsilon=1$
  • Figure 5: Unstable RPS game, $l = 5$, $\epsilon=1$
  • ...and 14 more figures
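The RPS figures vary the parameter l with ε = 1. Assume l is the payoff lost in a defeat, with a win paying 1, in a symmetric two-player game. This reading is our assumption, not confirmed by the listing. Each row of the matrix then sums to 1 − l, so the payoff vector at the uniform profile is constant, and the uniform profile is a Nash distribution for every l and every ε > 0. The sketch below checks this fixed-point condition. It says nothing about whether a given learning scheme actually reaches that point. The helper names are hypothetical.

```python
import math


def rps_matrix(loss):
    """Assumed RPS payoff matrix (hypothetical parameterization): win 1, loss -loss, tie 0."""
    return [[0.0, -loss, 1.0],
            [1.0, 0.0, -loss],
            [-loss, 1.0, 0.0]]


def softmax(u, eps):
    m = max(u)
    w = [math.exp((ui - m) / eps) for ui in u]
    total = sum(w)
    return [wi / total for wi in w]


def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def is_nash_distribution(A, x1, x2, eps, tol=1e-12):
    """True if x^p = softmax(U^p(x) / eps) for both players of the assumed
    symmetric game with U^1(x) = A x2 and U^2(x) = A x1."""
    fixed1 = softmax(matvec(A, x2), eps)
    fixed2 = softmax(matvec(A, x1), eps)
    return all(abs(a - b) < tol for a, b in zip(fixed1 + fixed2, x1 + x2))


uniform = [1.0 / 3.0] * 3
for loss in (1.0, 2.5, 5.0):
    print(loss, is_nash_distribution(rps_matrix(loss), uniform, uniform, eps=1.0))
```

Because all three games share this interior rest point under the stated assumption, the cases shown in the figures differ in the dynamics around the equilibrium rather than in its location.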

Theorems & Definitions (31)

  • Definition 1
  • Definition 2
  • Remark 1
  • Remark 2
  • Proposition 1
  • Proof
  • Remark 3
  • Proposition 2
  • Proof
  • Remark 4
  • ...and 21 more