Reinforcement Learning for Discrete-time LQG Mean Field Social Control Problems with Unknown Dynamics

Hanfang Zhang, Bing-Chang Wang, Shuo Chen

Abstract

This paper studies the discrete-time linear-quadratic-Gaussian (LQG) mean field (MF) social control problem over an infinite horizon, where the dynamics of all agents are unknown. The objective is to design a reinforcement learning (RL) algorithm that approximates the decentralized, asymptotically optimal social control characterized by two algebraic Riccati equations (AREs). In this problem, a coupling term is introduced into the system dynamics to capture the interactions among agents. This coupling invalidates the equivalence between model-based and model-free methods, which makes it difficult to apply traditional model-free algorithms directly. First, under the assumptions of system stabilizability and detectability, a model-based policy iteration algorithm is proposed to approximate the stabilizing solutions of the AREs. The algorithm is proven to converge for both positive semidefinite and indefinite weight matrices. Subsequently, by means of a system transformation, a model-free RL algorithm is designed to solve for the asymptotically optimal social control. During the iteration process, the updates use data collected from any two agents and the MF state. Finally, a numerical example verifies the effectiveness of the proposed algorithm.
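The two ingredients named in the abstract, model-based policy iteration on a Riccati equation and a model-free variant that learns only from data, can be illustrated on a single standard discrete-time LQ system. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm: the matrices `A`, `B`, `Q`, `R`, the initial gain `K0`, and the helpers `evaluate_policy`, `model_based_pi`, `model_free_pi`, and `step` are all hypothetical. It treats one uncoupled agent with Q ⪰ 0 and R ≻ 0, so it does not reproduce the MF coupling term, the paper's two AREs, the indefinite-weight case, or the system transformation that lets the paper learn from two agents and the MF state. The model-based loop is a Hewer-type policy iteration; the model-free loop swaps in a standard least-squares Q-function evaluation from transition samples.

```python
import numpy as np

# Hypothetical single-agent LQ example; these matrices are NOT from the paper.
A = np.array([[1.1, 0.2], [0.0, 0.9]])  # open-loop unstable (eigenvalue 1.1)
B = np.array([[0.0], [1.0]])
Q = np.eye(2)                # assumed positive semidefinite state weight
R = np.array([[1.0]])        # assumed positive definite control weight
K0 = np.array([[1.8, 1.0]])  # stabilizing initial gain: eig(A - B K0) = {0.5, 0.5}


def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A - BK)' P (A - BK)."""
    n = A.shape[0]
    Acl = A - B @ K
    # vec(Acl' P Acl) = kron(Acl', Acl') vec(P) with column-major vec
    M = np.eye(n * n) - np.kron(Acl.T, Acl.T)
    rhs = (Q + K.T @ R @ K).flatten(order="F")
    return np.linalg.solve(M, rhs).reshape((n, n), order="F")


def model_based_pi(A, B, Q, R, K, iters=50, tol=1e-12):
    """Hewer-type policy iteration for the standard discrete-time ARE."""
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        # Policy improvement: K <- (R + B'PB)^{-1} B'PA
        K_new = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        converged = np.linalg.norm(K_new - K) < tol
        K = K_new
        if converged:
            break
    return P, K


def model_free_pi(step, K, Q, R, n, m, iters=20, samples=50, seed=0):
    """Q-function policy iteration using only (x, u, x_next) samples from step()."""
    rng = np.random.default_rng(seed)
    d = n + m
    for _ in range(iters):
        rows, costs = [], []
        for _ in range(samples):
            x = rng.standard_normal(n)
            u = rng.standard_normal(m)   # exploratory input
            x_next = step(x, u)          # black-box transition; A, B never read here
            z = np.concatenate([x, u])
            z_next = np.concatenate([x_next, -K @ x_next])
            # Bellman equation: z'Hz - z_next'H z_next = x'Qx + u'Ru
            rows.append(np.kron(z, z) - np.kron(z_next, z_next))
            costs.append(x @ Q @ x + u @ R @ u)
        h, *_ = np.linalg.lstsq(np.array(rows), np.array(costs), rcond=None)
        H = h.reshape(d, d)
        H = 0.5 * (H + H.T)
        # Greedy gain from the learned Q-function: K = H_uu^{-1} H_ux
        K = np.linalg.solve(H[n:, n:], H[n:, :n])
    return K


def step(x, u):
    """Simulator standing in for the unknown plant (process noise omitted)."""
    return A @ x + B @ u


P_mb, K_mb = model_based_pi(A, B, Q, R, K0)
K_mf = model_free_pi(step, K0, Q, R, n=2, m=1)
residual = A.T @ P_mb @ A - P_mb + Q - A.T @ P_mb @ B @ np.linalg.solve(
    R + B.T @ P_mb @ B, B.T @ P_mb @ A
)
print("model-based K:", K_mb)
print("model-free  K:", K_mf)
print("ARE residual norm:", np.linalg.norm(residual))
```

Starting from the same stabilizing gain, both loops should reach the same gain in this uncoupled setting, because with noise-free data the least-squares Q-function step reproduces the Lyapunov-based evaluation exactly. The abstract's point is that the MF coupling term breaks this equivalence, which is why the paper introduces a system transformation before learning from data.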

Paper Structure

This paper contains 10 sections, 93 equations, 5 figures, 2 tables, and 1 algorithm.

Figures (5)

  • Figure 1: Algorithm logic diagram
  • Figure 2: Real-time data collected from agent 1 and 2
  • Figure 3: MF state trajectory comparison
  • Figure 4: $\{\hat{P}, \hat{K}, \hat{\Lambda}^{1}\}$ and $\{\hat{\Pi}, \hat{\bar{K}}, \hat{\Lambda}^{2}\}$ of Algorithm 1
  • Figure 5: Relative estimation errors versus $\gamma$

Theorems & Definitions (10)

  • proof
  • proof
  • proof
  • proof
  • proof
  • proof
  • Proof of Lemma 2
  • Proof of Theorem 1
  • Proof of Theorem 2
  • Proof of Theorem 3