Linear Convergence in Games with Delayed Feedback via Extra Prediction

Yuma Fujimoto, Kenshi Abe, Kaito Ariu

TL;DR

This paper validates that extra optimism is a promising countermeasure against performance degradation caused by feedback delays, and derives the linear convergence rate of Weighted Optimistic Gradient Descent-Ascent (WOGDA), which predicts future rewards with extra optimism, in unconstrained bilinear games.

Abstract

Feedback delays are inevitable in real-world multi-agent learning. They are known to severely degrade performance, and the convergence rate under delayed feedback is still unclear, even for bilinear games. This paper derives the rate of linear convergence of Weighted Optimistic Gradient Descent-Ascent (WOGDA), which predicts future rewards with extra optimism, in unconstrained bilinear games. To analyze the algorithm, we interpret it as an approximation of the Extra Proximal Point (EPP), which is updated based on farther future rewards than the classical Proximal Point (PP). Our theorems show that standard optimism (predicting the next-step reward) achieves linear convergence to the equilibrium at a rate $\exp(-\Theta(t/m^{5}))$ after $t$ iterations for delay $m$. Moreover, employing extra optimism (predicting farther future rewards) tolerates a larger step size and significantly accelerates the rate to $\exp(-\Theta(t/(m^{2}\log m)))$. Our experiments also show accelerated convergence driven by the extra optimism and are qualitatively consistent with our theorems. In summary, this paper validates that extra optimism is a promising countermeasure against performance degradation caused by feedback delays.
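The idea of predicting future rewards from delayed feedback can be illustrated with a small simulation. The following is only a sketch under stated assumptions: this summary does not give the exact WOGDA update, so the code assumes each player linearly extrapolates the gradient observed $m$ steps ago with weight $m+n$, where $n=1$ targets the next step and $n>1$ looks farther ahead. The function name `delayed_extrapolated_gda`, the padding of the pre-initial history with the initial state, and the toy $1\times 1$ game are all hypothetical choices made for illustration.

```python
import numpy as np


def delayed_extrapolated_gda(A, x0, y0, m, n, eta, steps):
    """Simulate min_x max_y x^T A y when gradients arrive m steps late.

    ASSUMED update (illustrative, not taken from the paper): each player
    extrapolates the delayed gradient linearly,
        g_hat = g[t-m] + (m + n) * (g[t-m] - g[t-m-1]),
    so n = 1 aims at the next-step gradient and n > 1 aims farther ahead.
    Returns the Euclidean distance of the final state from the origin,
    which is the equilibrium of an unconstrained bilinear game.
    """
    A = np.asarray(A, dtype=float)
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    # States at times t-m-1, ..., t; times before 0 are padded with the
    # initial state (an assumption about how learning starts).
    hist = [(x.copy(), y.copy())] * (m + 2)
    w = m + n  # extrapolation weight
    for _ in range(steps):
        (x_old, y_old), (x_new, y_new) = hist[0], hist[1]  # t-m-1 and t-m
        gx_old, gy_old = A @ y_old, A.T @ x_old
        gx_new, gy_new = A @ y_new, A.T @ x_new
        gx_hat = gx_new + w * (gx_new - gx_old)  # predicted gradient for x
        gy_hat = gy_new + w * (gy_new - gy_old)  # predicted gradient for y
        x = x - eta * gx_hat  # x descends
        y = y + eta * gy_hat  # y ascends
        hist = hist[1:] + [(x.copy(), y.copy())]
    return float(np.sqrt(x @ x + y @ y))


if __name__ == "__main__":
    A = [[1.0]]
    for n in (1, 2):
        dist = delayed_extrapolated_gda(A, [1.0], [0.0], m=2, n=n, eta=0.05, steps=5000)
        print(f"n={n}: distance to equilibrium = {dist:.3e}")
```

With $m=0$ and $n=1$ this assumed update reduces to the familiar optimistic step that uses $2g_{t}-g_{t-1}$. In the toy game above, the larger $n$ should shrink the distance to the equilibrium faster at the same step size, which is in the spirit of the abstract's claim but is not a reproduction of the paper's experiments.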

Paper Structure

This paper contains 38 sections, 7 theorems, 51 equations, and 4 figures.

Key Result

Theorem 4.1

Suppose $n=1$ and $\eta=1/(56(m+1)^{2}\kappa^{2}\lambda_{\max})$, then it holds with some positive constant $c(>0)$. Here, $\|\tilde{\boldsymbol{z}}_{0}\|$ is defined as

Figures (4)

  • Figure 1: Convergence in Matching Pennies with delayed feedback. We set the delay to $m=10$ and the initial state to $(\boldsymbol{x}_{0},\boldsymbol{y}_{0})=(\mathbf{c}/2,\boldsymbol{0})$. In both panels, the solid lines are the trajectories on the plane of $\langle\boldsymbol{x}_{t},\mathbf{c}\rangle$ and $\langle\boldsymbol{y}_{t},\mathbf{c}\rangle$. The black star markers are the Nash equilibria, which satisfy $\langle\boldsymbol{x}_{*},\mathbf{c}\rangle=\langle\boldsymbol{y}_{*},\mathbf{c}\rangle=0$. In A, the extra-optimistic weight is set to the minimum necessary value, $n=1$. In B, it is set to $n=m/2+1=6$. The step size is fine-tuned to $\eta=10^{-1.95}$ in A and $\eta=10^{-1.66}$ in B. The black cross marks indicate the first $20$ steps of learning and show that learning proceeds more quickly in B than in A. The trajectory also requires fewer cycles to converge in B than in A.
  • Figure 1: Flowchart of the lemmas and theorems. Lemma 5.1 captures the dynamics of EPP, and Lemma 5.2 analyzes them. Lemma 5.3 captures the dynamics of the error between WOGDA and EPP, and Lemma 5.4 evaluates that error. Theorem 5.5 analyzes WOGDA by combining the convergence term from EPP with the error term. This leads to Theorem 4.1, which gives the rate of linear convergence under next-step prediction, and Theorem 4.2, which gives the rate under extra prediction.
  • Figure 2: Optimal step size (A) and optimal linear convergence rate (B) in Matching Pennies. The circle and triangle markers indicate $n=1$ (next-step prediction) and $n=m/2+1$ (extra prediction), respectively. We consider delays from $m=2$ (red) to $m=80$ (purple). In A, the horizontal and vertical axes indicate the logarithmic delay $\log_{10}(m+1)$ and the logarithmic optimal step size $\log_{10}\eta^{*}$. The gray broken lines fit the circle markers with slope $-3/2$ and the triangle markers with slope $-1$, respectively. This means that the extra prediction admits a larger step size than the next-step prediction. In B, the vertical axis indicates the logarithmic linear convergence rate $\log_{10}(1-\mathrm{eLCR}^{*})$. The gray broken lines fit the circle markers with slope $-3$ and the triangle markers with slope $-1$, respectively. This means that the extra prediction converges faster than the next-step prediction.
  • Figure 3: Optimal step size (A) and optimal linear convergence rate (B) in a $5\times 5$ random matrix game. The panels are read in the same way as in Figure 2. The initial state is $(\boldsymbol{x}_{0},\boldsymbol{y}_{0})=(\boldsymbol{1},\boldsymbol{0})$.

Theorems & Definitions (11)

  • Theorem 4.1: Linear Convergence by Next-Step Prediction
  • Theorem 4.2: Linear Convergence by Extra Prediction
  • Lemma 5.1: Dynamics of EPP
  • Lemma 5.2: Analysis of EPP
  • proof
  • Lemma 5.3: Dynamics of Error Term
  • proof
  • Lemma 5.4: Evaluation of Error Term
  • Theorem 5.5: Analysis of WOGDA
  • proof
  • ...and 1 more