Convergence Analysis of Gradient-Based Learning with Non-Uniform Learning Rates in Non-Cooperative Multi-Agent Settings

Benjamin Chasnov, Lillian J. Ratliff, Eric Mazumdar, Samuel A. Burden

TL;DR

Much like preconditioning in optimization, non-uniform learning rates distort the vector field, which can in turn change the rate of convergence and the shape of the region of attraction.

Abstract

Considering a class of gradient-based multi-agent learning algorithms in non-cooperative settings, we provide local convergence guarantees to a neighborhood of a stable local Nash equilibrium. In particular, we consider continuous games where agents learn in (i) deterministic settings with oracle access to their gradient and (ii) stochastic settings with an unbiased estimator of their gradient. Utilizing the minimum and maximum singular values of the game Jacobian, we provide finite-time convergence guarantees in the deterministic case. In the stochastic case, we provide concentration bounds guaranteeing that, with high probability, agents will converge to a neighborhood of a stable local Nash equilibrium in finite time. Unlike other works in this vein, we also study the effects of non-uniform learning rates on the learning dynamics and convergence rates. We find that, much like preconditioning in optimization, non-uniform learning rates cause a distortion in the vector field which can, in turn, change the rate of convergence and the shape of the region of attraction. The analysis is supported by numerical examples that illustrate different aspects of the theory. We conclude with a discussion of the results and open questions.
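As a rough illustration of the deterministic setting described in the abstract, the sketch below runs simultaneous gradient play in which each agent uses its own learning rate. The two-player quadratic game, the function names, and the chosen rates are assumptions made for illustration; they are not taken from the paper.

```python
# Sketch of gradient play with per-agent learning rates on a hypothetical
# two-player quadratic game (an assumption, not one of the paper's examples):
#   f1(x1, x2) = 0.5 * x1**2 + 0.5 * x1 * x2   (player 1 controls x1)
#   f2(x1, x2) = 0.5 * x2**2 - 0.5 * x1 * x2   (player 2 controls x2)
# The only point where both individual gradients vanish is the origin.
import math


def omega(x):
    """Stack each player's gradient with respect to its own variable."""
    x1, x2 = x
    return (x1 + 0.5 * x2, x2 - 0.5 * x1)


def gradient_play(x0, rates, steps):
    """Apply x_i <- x_i - rates[i] * D_i f_i(x) to all players simultaneously."""
    x = tuple(x0)
    for _ in range(steps):
        g = omega(x)
        x = tuple(xi - ri * gi for xi, ri, gi in zip(x, rates, g))
    return x


def distance_to_nash(x):
    """Euclidean distance to the equilibrium at the origin."""
    return math.hypot(*x)


if __name__ == "__main__":
    x0 = (1.0, 1.0)
    for rates in [(0.1, 0.1), (0.1, 0.02)]:
        x = gradient_play(x0, rates, steps=200)
        print(rates, distance_to_nash(x))
```

In this toy game both rate choices converge, but slowing down the second agent leaves the joint iterate noticeably farther from the equilibrium after the same number of steps, which is one simple way the learning rates affect the convergence rate.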

Paper Structure

This paper contains 24 sections, 17 theorems, 64 equations, 4 figures.

Key Result

Proposition 1

If $x$ is a local Nash equilibrium of the game $(f_1, \ldots, f_n)$, then $\omega(x)=0$ and $D_{i}^2f_i(x)\geq 0$ for each $i$. On the other hand, if $\omega(x)=0$ and $D_{i}^2f_i(x)>0$ for each $i$, then $x\in X$ is a local Nash equilibrium.
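The sufficient conditions in Proposition 1 can be checked numerically. The sketch below does so with finite differences, assuming each player controls a single scalar (so $D_i^2 f_i(x) > 0$ is a scalar inequality) and assuming $\omega(x)$ stacks the individual gradients $D_i f_i(x)$. The helper names, tolerances, and example costs are hypothetical.

```python
# Finite-difference check of the sufficient conditions in Proposition 1.
# Assumptions: one scalar action per player; omega(x) = (D_1 f_1(x), ..., D_n f_n(x)).


def _shifted(x, i, h):
    """Return a copy of x with coordinate i moved by h."""
    y = list(x)
    y[i] += h
    return y


def d_i(f, x, i, h=1e-5):
    """Central-difference estimate of the first partial derivative in x[i]."""
    return (f(_shifted(x, i, h)) - f(_shifted(x, i, -h))) / (2 * h)


def d2_i(f, x, i, h=1e-4):
    """Central-difference estimate of the second partial derivative in x[i]."""
    return (f(_shifted(x, i, h)) - 2 * f(list(x)) + f(_shifted(x, i, -h))) / (h * h)


def omega(costs, x):
    """Each player's derivative of its own cost in its own variable."""
    return [d_i(f, x, i) for i, f in enumerate(costs)]


def meets_sufficient_conditions(costs, x, tol=1e-6):
    """True if omega(x) is (numerically) zero and every D_i^2 f_i(x) is positive."""
    stationary = all(abs(g) <= tol for g in omega(costs, x))
    curved = all(d2_i(f, x, i) > tol for i, f in enumerate(costs))
    return stationary and curved


# Hypothetical two-player quadratic game used only to exercise the check.
def f1(x):
    return 0.5 * x[0] ** 2 + 0.5 * x[0] * x[1]


def f2(x):
    return 0.5 * x[1] ** 2 - 0.5 * x[0] * x[1]


if __name__ == "__main__":
    print(meets_sufficient_conditions([f1, f2], [0.0, 0.0]))
    print(meets_sufficient_conditions([f1, f2], [1.0, 1.0]))
```

A finite-difference test of this kind can only suggest that a point satisfies the conditions; for vector-valued actions, the second-order check would instead require each block $D_i^2 f_i(x)$ to be positive definite.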

Figures (4)

  • Figure 1: Convergence of policy gradient in LQ dynamic games to the Nash policy. (a) Each player's linear feedback gain matrix $K_i$ converges to the unique Nash policies (dotted lines). (b) The black dashed line shows the upper bound on the number of iterations required to converge within $\varepsilon$ distance from Nash (2-norm). The actual convergence for this random initialization is shown as the solid line.
  • Figure 2: Gradient dynamics of the matching pennies game where the learning agents have different learning rates. The vector field of the gradient dynamics is stretched along the faster agent's coordinate.
  • Figure 3: The effects of non-uniform learning rates on the path of convergence to the equilibria. The zero lines for each player ($D_1f_1=0$ or $D_2f_2=0$) are plotted as the diagonal and curved lines, and the two stable Nash equilibria as circles (where $D^2_1 f_1 > 0$ and $D^2_2 f_2>0$). (a) In the deterministic setting, the region of attraction for each equilibrium can be computed numerically. Four scenarios are shown, with a combination of fast and slow agents. The regions of attraction for each Nash equilibrium are warped under different learning rates. (b) In the stochastic setting, the samples (in black) approximate the singularly perturbed differential equation (in red). Two initializations and learning rate configurations are plotted.
  • Figure 4: Minimum-fuel particle avoidance control example. (a) Each particle seeks to reach the opposite side of the circle using minimum fuel while avoiding each other. The circles represent the approximate boundaries around each particle at time $t=5$. (b) The joint strategy $x=({\bf u}_1, \cdots, {\bf u}_4)$ is initialized to the minimum fuel solution ignoring interaction between particles. (c) Equilibrium solution achieved by setting the blue agent to have a slower learning rate. (d) Another equilibrium, where the red agent has the slower learning rate.
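The stretching described in the Figure 2 caption can be reproduced in a minimal sketch. It assumes a bilinear stand-in for matching pennies, $f_1(x, y) = xy$ and $f_2(x, y) = -xy$, with continuous-time gradient dynamics $\dot{x} = -\gamma_1 y$, $\dot{y} = \gamma_2 x$; the paper's exact parameterization is not shown in this summary, so the game, the rates, and the function names below are assumptions. Under these dynamics $\gamma_2 x^2 + \gamma_1 y^2$ is conserved, so the orbits are ellipses elongated along the coordinate of the agent with the larger rate.

```python
# Sketch: orbits of bilinear "matching pennies" gradient dynamics with
# per-agent learning rates (assumed model, not the paper's exact setup).


def field(state, rates):
    """Scaled gradient field: player 1 descends x*y in x, player 2 descends -x*y in y."""
    x, y = state
    g1, g2 = rates
    return (-g1 * y, g2 * x)


def rk4_step(state, rates, dt):
    """One classical fourth-order Runge-Kutta step of the scaled dynamics."""
    def add(a, b, s):
        return tuple(ai + s * bi for ai, bi in zip(a, b))

    k1 = field(state, rates)
    k2 = field(add(state, k1, dt / 2), rates)
    k3 = field(add(state, k2, dt / 2), rates)
    k4 = field(add(state, k3, dt), rates)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def orbit_extent(rates, start=(1.0, 0.0), dt=1e-3, steps=4000):
    """Largest |x| and |y| reached along the simulated trajectory."""
    state = start
    max_x, max_y = abs(state[0]), abs(state[1])
    for _ in range(steps):
        state = rk4_step(state, rates, dt)
        max_x = max(max_x, abs(state[0]))
        max_y = max(max_y, abs(state[1]))
    return max_x, max_y


if __name__ == "__main__":
    for rates in [(1.0, 1.0), (4.0, 1.0), (1.0, 4.0)]:
        print(rates, orbit_extent(rates))
```

With equal rates the orbit through $(1, 0)$ is a circle; making one agent four times faster yields an ellipse whose extent along that agent's coordinate is twice its extent along the other, consistent with the ratio $\sqrt{\gamma_1 / \gamma_2}$ implied by the conserved quantity.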

Theorems & Definitions (25)

  • Definition 1
  • Definition 2
  • Proposition 1: ratliff:2016aa
  • Definition 3: ratliff:2016aa
  • Proposition 2
  • Theorem 1
  • Proposition 3
  • Proposition 4
  • Remark 1
  • Theorem 2
  • ...and 15 more