Reinforcement Learning Method for Zero-Sum Linear-Quadratic Stochastic Differential Games in Infinite Horizons

Yiyuan Wang

TL;DR

The paper addresses learning solutions to zero-sum linear-quadratic stochastic differential games over an infinite horizon when the system dynamics are unknown. It develops a reinforcement learning (RL) framework that combines dynamic programming with the game-theoretic algebraic Riccati equations (GTAREs) of the problem, comprising a nested-iteration baseline and three RL variants: on-policy semi-model-based, off-policy semi-model-based, and model-free. Under rank conditions on the sampled data that guarantee identifiability, the authors prove convergence to the stabilizing GTARE solution $P^*$ and establish convergence of the inner Lyapunov variables and policy gains. A numerical example confirms feasibility and shows that all four methods (the baseline and the three RL variants) yield consistent stabilizing solutions, which suggests the approach is suitable for game-theoretic control in stochastic environments without complete knowledge of the system parameters.

Abstract

In this work, we propose, for the first time, a reinforcement learning framework specifically designed for zero-sum linear-quadratic stochastic differential games. This approach offers a generalized solution for scenarios in which accurate system parameters are difficult to obtain, thereby overcoming a key limitation of traditional iterative methods that rely on complete system information. In correspondence with the game-theoretic algebraic Riccati equations associated with the problem, we develop both semi-model-based and model-free reinforcement learning algorithms by combining an iterative solution scheme with dynamic programming principles. Notably, under appropriate rank conditions on data sampling, the convergence of the proposed algorithms is rigorously established through theoretical analysis. Finally, numerical simulations are conducted to verify the effectiveness and feasibility of the proposed method.
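
To make the idea of an iterative solution scheme for a game-theoretic algebraic Riccati equation (GTARE) concrete, the following is a minimal scalar sketch of a nested policy iteration. It is not the paper's algorithm. Everything in it is an assumption made for illustration: the scalar dynamics $dx = (ax + b_1 u + b_2 v)\,dt + c\,x\,dW$, the running cost $q x^2 + r u^2 - \gamma^2 v^2$ (minimized by $u$, maximized by $v$), the choice of which player is updated in the inner loop, the function names, and the numerical coefficients. The sketch also uses the model parameters directly, so it corresponds to a model-based baseline rather than to any of the data-driven variants.

```python
import math


def lyapunov_scalar(a, b1, b2, c, q, r, gamma, k_u, k_v):
    """Policy evaluation for fixed gains u = k_u * x, v = k_v * x.

    Solves the scalar Lyapunov equation
        (2*(a + b1*k_u + b2*k_v) + c**2) * p + q + r*k_u**2 - gamma**2*k_v**2 = 0.
    """
    rate = 2.0 * (a + b1 * k_u + b2 * k_v) + c * c
    if rate >= 0.0:
        # E[x^2] would not decay, so the cost of these gains is not finite.
        raise ValueError("closed loop is not mean-square stable")
    return (q + r * k_u ** 2 - gamma ** 2 * k_v ** 2) / (-rate)


def nested_iteration(a, b1, b2, c, q, r, gamma, outer=20, inner=50):
    """Hypothetical nested iteration: inner loop improves u, outer loop improves v.

    Starts from zero gains, which assumes the open loop is mean-square
    stable, i.e. 2*a + c**2 < 0.
    """
    k_u, k_v, p = 0.0, 0.0, 0.0
    for _ in range(outer):
        for _ in range(inner):
            p = lyapunov_scalar(a, b1, b2, c, q, r, gamma, k_u, k_v)
            k_u = -b1 * p / r          # policy improvement for the minimizer
        k_v = b2 * p / gamma ** 2      # policy improvement for the maximizer
    return p, k_u, k_v


def gtare_residual(p, a, b1, b2, c, q, r, gamma):
    """Residual of the assumed scalar GTARE at p."""
    return (2.0 * a + c * c) * p + q - (b1 ** 2 / r - b2 ** 2 / gamma ** 2) * p ** 2


if __name__ == "__main__":
    params = dict(a=-1.0, b1=1.0, b2=1.0, c=0.5, q=1.0, r=1.0, gamma=2.0)
    p, k_u, k_v = nested_iteration(**params)
    print("p =", p, "k_u =", k_u, "k_v =", k_v)
    print("GTARE residual =", gtare_residual(p, **params))
```

With the illustrative coefficients above, the returned `p` can be checked against the positive root of the quadratic that the assumed scalar GTARE reduces to. In the matrix case each policy-evaluation step would instead require solving a generalized Lyapunov equation.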

Paper Structure

This paper contains 11 sections, 6 theorems, 77 equations, 4 algorithms.

Key Result

Lemma 3.2

The iterates $Z^{(k,j+1)}$ and $K_2^{(k,j)}$ ($j=0,1,\cdots$) generated from alg1:PolicyEvaluation and alg1:PolicyImprovement in Algorithm alg1:Nested_Iteration satisfy

Theorems & Definitions (20)

  • Definition 2.1
  • Definition 2.2: Sun2020_book
  • Definition 2.3: Dragan2013book
  • Remark 3.1
  • Lemma 3.2
  • proof
  • Remark 3.3
  • Lemma 3.4
  • proof
  • Remark 3.5
  • ...and 10 more