Policy Optimization finds Nash Equilibrium in Regularized General-Sum LQ Games

Muhammad Aneeq uz Zaman; Shubham Aggarwal; Melih Bastopcu; Tamer Başar

TL;DR

This work studies relative entropy-regularized general-sum N-agent LQ games, proving that Nash equilibria lie in the linear Gaussian policy class and deriving coupled Riccati equations that determine the NE. It shows that, under a sufficient entropy parameter $\tau$, the NE is unique and can be computed via a policy optimization algorithm with a receding-horizon structure that converges linearly to the NE. When the entropy condition is not met, a $\delta$-augmentation yields an $\epsilon$-NE, with the augmentation affecting the cost and Riccati recursions in a controlled, $\mathcal{O}(\delta)$-smooth manner. The results provide a theoretically grounded framework for provable convergence of multi-agent RL methods in general-sum LQ settings and offer practical mechanisms to obtain approximate equilibria when regularization is insufficient.

Abstract

In this paper, we investigate the impact of introducing relative entropy regularization on the Nash Equilibria (NE) of General-Sum $N$-agent games, revealing that the NE of such games conform to linear Gaussian policies. Moreover, we delineate sufficient conditions, contingent upon the adequacy of entropy regularization, for the uniqueness of the NE within the game. As Policy Optimization serves as a foundational approach for Reinforcement Learning (RL) techniques aimed at finding the NE, in this work we prove the linear convergence of a policy optimization algorithm which (subject to the adequacy of entropy regularization) is capable of provably attaining the NE. Furthermore, in scenarios where the entropy regularization proves insufficient, we present a $\delta$-augmentation technique, which facilitates the achievement of an $\epsilon$-NE within the game.

Paper Structure (6 sections, 6 theorems, 57 equations, 1 algorithm)

Introduction
Problem Formulation
Nash Equilibrium Characterization
Policy Optimization (PO) & Non-Asymptotic Analysis
$\delta$-augmented Entropy-Regularized Game
Conclusion & Future Work

Key Result

Lemma III.1

Suppose that $M \in \mathbb{R}^{p \times p}$ is a positive semi-definite symmetric matrix, $\tau > 0$, $b \in \mathbb{R}^p$ and prior policy $\mu(u) = \mathcal{N}(0,I)$. Then, the probability distribution $\pi(u) \in \mathcal{P}(\mathbb{R}^p)$ which minimizes the following expression, is a multivariate Gaussian distribution, in particular, $\pi(u) = \mathcal{N}(-((\tau/2)I + M)^{-1} b/2, (I + 2M/
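The lemma statement above is cut off in this extract, so the objective it refers to is not shown. As a hedged illustration only, the sketch below assumes the objective is $\mathbb{E}_{\pi}[u^\top M u + b^\top u] + \tau\,\mathrm{KL}(\pi \,\|\, \mu)$, which is consistent with the mean stated in the lemma. Under that assumption the minimizer is proportional to $\mu(u)\exp(-(u^\top M u + b^\top u)/\tau)$, a Gaussian with precision $I + 2M/\tau$. The function names `gibbs_gaussian` and `objective` are hypothetical and not taken from the paper.

```python
import numpy as np


def gibbs_gaussian(M, b, tau):
    """Mean and covariance of pi(u) ∝ N(u; 0, I) * exp(-(u'Mu + b'u) / tau).

    Assumed objective (not shown in the source): E[u'Mu + b'u] + tau * KL(pi || N(0, I)).
    """
    p = M.shape[0]
    precision = np.eye(p) + 2.0 * M / tau  # prior precision I plus cost curvature 2M/tau
    cov = np.linalg.inv(precision)
    mean = -cov @ b / tau                  # equals -((tau/2) I + M)^{-1} b / 2
    return mean, cov


def objective(mean, cov, M, b, tau):
    """Assumed objective evaluated in closed form at a Gaussian N(mean, cov)."""
    p = M.shape[0]
    expected_cost = np.trace(M @ cov) + mean @ M @ mean + b @ mean
    _, logdet = np.linalg.slogdet(cov)
    # KL divergence between N(mean, cov) and the standard normal prior N(0, I)
    kl = 0.5 * (np.trace(cov) + mean @ mean - p - logdet)
    return expected_cost + tau * kl


if __name__ == "__main__":
    # Illustrative values only; they do not come from the paper.
    M = np.array([[2.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    tau = 0.7
    mean, cov = gibbs_gaussian(M, b, tau)
    print("mean:", mean)
    print("objective at minimizer:", objective(mean, cov, M, b, tau))
```

Restricted to Gaussians, the assumed objective is strictly convex in both the mean and the covariance, so perturbing either away from the returned values should only increase it. This gives a quick numerical sanity check on the closed form.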

Theorems & Definitions (13)

Definition II.1
Lemma III.1
proof
Theorem III.2
proof
Lemma III.4
proof
Lemma IV.1
proof
Theorem IV.2
...and 3 more
