Reinforcement Learning Method for Zero-Sum Linear-Quadratic Stochastic Differential Games in Infinite Horizons

Yiyuan Wang

TL;DR

The paper tackles the problem of learning solutions to zero-sum linear-quadratic stochastic differential games over an infinite horizon when the system dynamics are unknown. It develops a reinforcement learning framework that combines dynamic programming with the game-theoretic algebraic Riccati equations (GTAREs) of the problem. The framework consists of a nested-iteration baseline and three RL variants: on-policy semi-model-based, off-policy semi-model-based, and model-free RL. Under rank conditions on the sampled data that guarantee the unknowns are identifiable, the authors prove convergence to the stabilizing GTARE solution $P^*$. They also establish convergence of the inner Lyapunov variables and of the policy gains. A numerical example confirms feasibility and shows that the four methods yield consistent stabilizing solutions. This underscores the approach's potential for robust game-theoretic control in stochastic environments without exact knowledge of the system parameters.
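The nested-iteration baseline can be illustrated with a small model-based sketch. It assumes a simplified setting that may differ from the paper's:

- noise enters through the state only, $dX = (AX + B_1u + B_2v)\,dt + CX\,dW$;
- the minimizer $u$ and maximizer $v$ share the cost $\mathbb{E}\int_0^\infty (X^\top QX + u^\top Ru - \gamma^2 v^\top v)\,dt$.

Under these assumptions the GTARE reads $A^\top P + PA + C^\top PC + Q - PB_1R^{-1}B_1^\top P + \gamma^{-2}PB_2B_2^\top P = 0$.

The two loops work as follows:

- The outer loop fixes the gain $K_1$ of $u = -K_1X$.
- The inner loop alternates two steps. Policy evaluation solves a generalized Lyapunov equation for $Z^{(k,j+1)}$ under $v = K_2^{(k,j)}X$. Policy improvement then sets $K_2^{(k,j+1)} = \gamma^{-2}B_2^\top Z^{(k,j+1)}$.

The sign convention for $K_2$, the function names, and the example matrices are all hypothetical. The paper's RL variants instead work from sampled data, fully or partly without the matrices used here.

```python
import numpy as np


def solve_gen_lyapunov(Ac, C, M):
    """Solve Ac^T Z + Z Ac + C^T Z C + M = 0 for symmetric Z.

    Vectorizes with column-major vec and vec(X Y W) = (W^T kron X) vec(Y).
    """
    n = Ac.shape[0]
    I = np.eye(n)
    L = np.kron(I, Ac.T) + np.kron(Ac.T, I) + np.kron(C.T, C.T)
    z = np.linalg.solve(L, -M.reshape(-1, order="F"))
    Z = z.reshape((n, n), order="F")
    return 0.5 * (Z + Z.T)  # remove round-off asymmetry


def nested_iteration(A, B1, B2, C, Q, R, gamma, K1, tol=1e-10,
                     max_outer=50, max_inner=100):
    """Hypothetical model-based nested iteration (outer: u, inner: v)."""
    n = A.shape[0]
    R_inv = np.linalg.inv(R)
    P = np.zeros((n, n))
    for _ in range(max_outer):
        # Inner loop for the maximizer, restarted from v = 0 each outer step.
        K2 = np.zeros((B2.shape[1], n))
        Z = np.zeros((n, n))
        for _ in range(max_inner):
            # Policy evaluation under u = -K1 x, v = K2 x.
            Ac = A - B1 @ K1 + B2 @ K2
            M = Q + K1.T @ R @ K1 - gamma**2 * K2.T @ K2
            Z_next = solve_gen_lyapunov(Ac, C, M)
            # Policy improvement for the maximizer.
            K2 = B2.T @ Z_next / gamma**2
            inner_done = np.linalg.norm(Z_next - Z) < tol
            Z = Z_next
            if inner_done:
                break
        # Outer policy improvement for the minimizer.
        K1 = R_inv @ B1.T @ Z
        outer_done = np.linalg.norm(Z - P) < tol
        P = Z
        if outer_done:
            break
    return P, K1, B2.T @ P / gamma**2


def gtare_residual(P, A, B1, B2, C, Q, R, gamma):
    """Residual of the simplified GTARE assumed in this sketch."""
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - P @ B1 @ np.linalg.solve(R, B1.T) @ P
            + P @ B2 @ B2.T @ P / gamma**2)


# Hypothetical 2-state example; A is Hurwitz, so K1 = 0 is a valid start.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
C = 0.2 * np.eye(2)
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 5.0

P, K1, K2 = nested_iteration(A, B1, B2, C, Q, R, gamma, K1=np.zeros((1, 2)))
print("P =", P)
print("max |GTARE residual| =",
      np.abs(gtare_residual(P, A, B1, B2, C, Q, R, gamma)).max())
```

The Kronecker formulation keeps the sketch dependency-free beyond NumPy. It costs $O(n^6)$ per solve, so it only suits small illustrative systems.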

Abstract

In this work, we propose, for the first time, a reinforcement learning framework specifically designed for zero-sum linear-quadratic stochastic differential games. The approach offers a general solution for scenarios in which accurate system parameters are difficult to obtain. It thereby overcomes a key limitation of traditional iterative methods, which rely on complete system information. Building on the game-theoretic algebraic Riccati equations associated with the problem, we develop both semi-model-based and model-free reinforcement learning algorithms by combining an iterative solution scheme with dynamic programming principles. Under appropriate rank conditions on the sampled data, the convergence of the proposed algorithms is rigorously established. Finally, numerical simulations verify the effectiveness and feasibility of the proposed method.
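The rank conditions on data sampling act as an identifiability requirement. In typical data-driven LQ schemes, each iteration reduces to a least-squares problem for the entries of a symmetric matrix. That problem has a unique solution only if the data matrix has full column rank.

The following code is a minimal, hypothetical sketch of this idea, not the paper's estimator; all names are illustrative. It recovers a symmetric $P$ from noise-free samples $y_k = x_k^\top P x_k$ and rejects data whose samples all lie on a single line.

```python
import numpy as np


def quad_features(x):
    """Features phi(x) such that x^T P x = phi(x) @ theta for symmetric P.

    theta stacks the upper-triangular entries of P row by row; off-diagonal
    products are doubled because P[i, j] = P[j, i].
    """
    n = x.size
    return np.array([x[i] * x[j] * (1.0 if i == j else 2.0)
                     for i in range(n) for j in range(i, n)])


def unpack_sym(theta, n):
    """Rebuild the symmetric matrix from its upper-triangular entries."""
    P = np.zeros((n, n))
    idx = 0
    for i in range(n):
        for j in range(i, n):
            P[i, j] = P[j, i] = theta[idx]
            idx += 1
    return P


def estimate_quadratic_form(X, y):
    """Least-squares estimate of symmetric P from samples y_k = x_k^T P x_k."""
    n = X.shape[1]
    Phi = np.array([quad_features(x) for x in X])
    needed = n * (n + 1) // 2
    rank = np.linalg.matrix_rank(Phi)
    if rank < needed:  # the data do not determine P uniquely
        raise ValueError(f"rank condition fails: rank {rank} < {needed}")
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return unpack_sym(theta, n)


rng = np.random.default_rng(0)
P_true = np.array([[2.0, 0.5], [0.5, 1.0]])

# Sufficiently varied data: random directions, so the rank condition holds.
X = rng.standard_normal((10, 2))
y = np.einsum("ki,ij,kj->k", X, P_true, X)
P_hat = estimate_quadratic_form(X, y)

# Poorly exciting data: every sample lies on the line spanned by (1, 2).
X_bad = np.outer(rng.standard_normal(10), [1.0, 2.0])
y_bad = np.einsum("ki,ij,kj->k", X_bad, P_true, X_bad)
try:
    estimate_quadratic_form(X_bad, y_bad)
    bad_rejected = False
except ValueError:
    bad_rejected = True

print("P_hat =", P_hat)
print("collinear data rejected:", bad_rejected)
```

With $n$ states there are $n(n+1)/2$ unknowns, so at least that many sufficiently varied samples are needed. In many integral-RL schemes the analogous data matrix is built from integrals along trajectories. Enough exploration in the input is what makes that matrix full rank.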

Paper Structure (11 sections, 6 theorems, 77 equations, 4 algorithms)

Introduction
Preliminary
Notation
Zero-Sum Linear-Quadratic Stochastic Differential Games in Infinite Horizons
Reinforcement Learning
Nested Iterative
On-Policy Semi-Model-Based Reinforcement Learning
Off-Policy Semi-Model-Based Reinforcement Learning
Model-Free Reinforcement Learning
Convergence Analysis
Simulation

Key Result

Lemma 3.2

The iterates $Z^{(k,j+1)}$ and $K_2^{(k,j)}$ ($j = 0, 1, \cdots$) generated by the policy-evaluation and policy-improvement steps of the nested iteration algorithm satisfy

Theorems & Definitions (20)

Definition 2.1
Definition 2.2: Sun2020_book
Definition 2.3: Dragan2013book
Remark 3.1
Lemma 3.2
Proof
Remark 3.3
Lemma 3.4
Proof
Remark 3.5
...and 10 more
