Reinforcement Learning Method for Zero-Sum Linear-Quadratic Stochastic Differential Games in Infinite Horizons
Yiyuan Wang
TL;DR
The paper addresses learning solutions to infinite-horizon zero-sum linear-quadratic stochastic differential games when the system dynamics are unknown or only partially known. It develops a reinforcement learning (RL) framework that combines dynamic programming with the game-theoretic algebraic Riccati equation (GTARE), comprising a nested-iteration baseline and three RL variants: on-policy semi-model-based, off-policy semi-model-based, and model-free. Under rank conditions on the sampled data that guarantee identifiability, the authors prove convergence to the stabilizing GTARE solution $P^*$, together with convergence of the inner Lyapunov variables and the policy gains. A numerical example confirms feasibility and shows that the four methods yield consistent stabilizing solutions, indicating the approach's potential for game-theoretic control in stochastic environments without full knowledge of the system parameters.
Abstract
In this work, we propose what is, to our knowledge, the first reinforcement learning framework specifically designed for zero-sum linear-quadratic stochastic differential games. The approach applies to scenarios in which accurate system parameters are difficult to obtain, thereby overcoming a key limitation of traditional iterative methods, which rely on complete system information. Starting from the game-theoretic algebraic Riccati equations associated with the problem, we develop both semi-model-based and model-free reinforcement learning algorithms by combining an iterative solution scheme with dynamic programming principles. Under appropriate rank conditions on the sampled data, we rigorously prove that the proposed algorithms converge. Finally, numerical simulations verify the effectiveness and feasibility of the proposed method.
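To make the "iterative solution scheme" concrete, the following is a minimal model-based sketch of a nested iteration for a game-theoretic algebraic Riccati equation. It is not the paper's algorithm and not one of its RL variants: it assumes full knowledge of the system matrices, and it assumes a simplified model with state-multiplicative noise only, dx = (A x + B1 u + B2 v) dt + C x dW, with running cost x'Qx + u'Ru - gamma^2 v'v. Under these assumptions the equation reads A'P + PA + C'PC + Q - P B1 R^{-1} B1' P + gamma^{-2} P B2 B2' P = 0. All function names, matrices, and the value of gamma are hypothetical illustrations.

```python
import numpy as np


def solve_generalized_lyapunov(Ac, C, M):
    """Solve Ac' P + P Ac + C' P C + M = 0 for symmetric P by vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Each Kronecker term is the matrix of one linear map acting on vec(P).
    op = np.kron(Ac.T, I) + np.kron(I, Ac.T) + np.kron(C.T, C.T)
    P = np.linalg.solve(op, -M.flatten()).reshape(n, n)
    return 0.5 * (P + P.T)  # remove round-off asymmetry


def solve_gtare_nested(A, B1, B2, C, Q, R, gamma, outer_iters=30, inner_iters=30):
    """Nested iteration (assumed scheme): the outer loop updates the maximizer
    gain L (v = L x), the inner loop runs a Kleinman-type policy iteration for
    the minimizer gain K (u = -K x). Assumes K = 0 is mean-square stabilizing."""
    n = A.shape[0]
    K = np.zeros((B1.shape[1], n))
    L = np.zeros((B2.shape[1], n))
    P = np.zeros((n, n))
    for _ in range(outer_iters):
        A_L = A + B2 @ L                      # dynamics with the maximizer fixed
        Q_L = Q - gamma**2 * L.T @ L          # state cost with the maximizer fixed
        for _ in range(inner_iters):
            # Policy evaluation: Lyapunov-type equation for the current K.
            P = solve_generalized_lyapunov(A_L - B1 @ K, C, Q_L + K.T @ R @ K)
            # Policy improvement for the minimizer.
            K = np.linalg.solve(R, B1.T @ P)
        # Policy update for the maximizer.
        L = B2.T @ P / gamma**2
    return P, K, L


def gtare_residual(P, A, B1, B2, C, Q, R, gamma):
    """Left-hand side of the assumed Riccati equation; zero at a solution."""
    return (A.T @ P + P @ A + C.T @ P @ C + Q
            - P @ B1 @ np.linalg.solve(R, B1.T @ P)
            + P @ B2 @ B2.T @ P / gamma**2)


# Hypothetical two-state example (values chosen only for illustration).
A = np.array([[-1.0, 0.5], [0.0, -1.5]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[0.5], [0.0]])
C = 0.3 * np.eye(2)
Q = np.eye(2)
R = np.array([[1.0]])
gamma = 2.0

P, K, L = solve_gtare_nested(A, B1, B2, C, Q, R, gamma)
residual = gtare_residual(P, A, B1, B2, C, Q, R, gamma)
print("P =\n", P)
print("residual norm =", np.linalg.norm(residual))
```

In a sketch like this, every policy-evaluation step needs A, B1, B2, and C explicitly. The learning-based algorithms described in the abstract are motivated by replacing that dependence with information extracted from sampled trajectories, which is where the rank conditions on the data enter.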
