An Online Learning Approach for Two-Player Zero-Sum Linear Quadratic Games

Shanting Wang, Weihao Sun, Andreas A. Malikopoulos

Abstract

In this paper, we present an online learning approach for two-player zero-sum linear quadratic games with unknown dynamics. We develop a framework that combines regularized least-squares model estimation, high-probability confidence sets, and surrogate model selection to maintain a regular model for policy updates. At each episode, we apply a shrinkage step to identify a surrogate model in the region where the generalized algebraic Riccati equation admits a stabilizing saddle-point solution. We then establish a regret analysis of the algorithm's convergence, followed by a numerical example that illustrates the convergence performance and verifies the regret analysis.
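The approach above depends on whether the generalized algebraic Riccati equation (GARE) admits a stabilizing saddle-point solution for a given model. The following is a minimal sketch of that check, not the authors' shrinkage procedure. It assumes discrete-time dynamics $x_{t+1} = A x_t + B u_t + D w_t$ with stage cost $x^\top Q x + u^\top R_u u - w^\top R_w w$, where $u$ minimizes and $w$ maximizes. The matrices $D$, $R_u$, $R_w$ and the function name `solve_game_gare` are our own hypothetical notation. The sketch runs value iteration on $P = Q + A^\top P A - A^\top P G (R + G^\top P G)^{-1} G^\top P A$ with $G = [B\ D]$ and $R = \mathrm{diag}(R_u, -R_w)$. It then tests $R_u + B^\top P B \succ 0$, $R_w - D^\top P D \succ 0$, and Schur stability of $A - BK - DL$. Value iteration from $P = 0$ is not guaranteed to converge for every game, so the function raises an error instead of returning a questionable $P$.

```python
import numpy as np


def solve_game_gare(A, B, D, Q, Ru, Rw, tol=1e-12, max_iter=10_000):
    """Value iteration on an assumed discrete-time zero-sum GARE (sketch).

    Assumed model: x_{t+1} = A x + B u + D w, stage cost x'Qx + u'Ru u - w'Rw w,
    where u minimizes and w maximizes. Returns (P, K, L) for u = -K x, w = -L x.
    Raises ValueError if no stabilizing saddle-point solution is found.
    """
    n, m = B.shape
    q = D.shape[1]
    G = np.hstack([B, D])  # stacked input matrix [B D]
    # Indefinite block weight diag(Ru, -Rw): the maximizer enters with a minus sign.
    R = np.block([[Ru, np.zeros((m, q))], [np.zeros((q, m)), -Rw]])

    P = np.zeros((n, n))
    for _ in range(max_iter):
        M = R + G.T @ P @ G
        P_next = Q + A.T @ P @ A - A.T @ P @ G @ np.linalg.solve(M, G.T @ P @ A)
        P_next = 0.5 * (P_next + P_next.T)  # guard against round-off asymmetry
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    else:
        raise ValueError("value iteration did not converge")

    # Joint gains from the stationarity condition of the Bellman equation.
    Kz = np.linalg.solve(R + G.T @ P @ G, G.T @ P @ A)
    K, L = Kz[:m], Kz[m:]

    # Saddle-point conditions: Ru + B'PB > 0 (min over u), Rw - D'PD > 0 (max over w).
    min_ok = np.all(np.linalg.eigvalsh(Ru + B.T @ P @ B) > 0)
    max_ok = np.all(np.linalg.eigvalsh(Rw - D.T @ P @ D) > 0)
    # Stabilizing: the closed loop A - B K - D L must be Schur stable.
    rho = np.max(np.abs(np.linalg.eigvals(A - G @ Kz)))
    if not (min_ok and max_ok and rho < 1.0):
        raise ValueError("no stabilizing saddle-point solution found")
    return P, K, L


if __name__ == "__main__":
    A, B, D = np.array([[0.9]]), np.array([[1.0]]), np.array([[0.5]])
    P, K, L = solve_game_gare(A, B, D, np.eye(1), np.eye(1), 4.0 * np.eye(1))
    print(P, K, L)
```

Consider the scalar case $A = 0.9$, $B = 1$, $D = 0.5$, $Q = R_u = 1$, $R_w = 4$. Here the iteration converges to $P \approx 1.506$ with a closed-loop pole near $0.373$. With $D = 1$ and $R_w = 0.01$, the condition $R_w - D^\top P D \succ 0$ fails and the function raises, which is the kind of model a surrogate-selection step would need to avoid.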

Paper Structure

This paper contains 16 sections, 5 theorems, 53 equations, 2 figures, and 1 algorithm.

Key Result

Lemma 1

In the presence of the disturbance term in the dynamics (eq:dynamics), with probability at least $1-\delta$ we have $\theta_\star \in \mathcal{C}_k(\delta)$ for all episodes $k \ge 1$, where the radius of the confidence ellipsoid is …
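Lemma 1 concerns a confidence ellipsoid around a regularized least-squares estimate of the unknown parameter. The following minimal sketch shows those two ingredients. It assumes the common construction from the learning-based LQR literature rather than the paper's exact definitions. Stack $z_t = [x_t; u_t; w_t]$ and $\theta = [A\ B\ D]^\top$. Estimate $\widehat{\theta} = V^{-1} Z^\top X_{+}$ with $V = \lambda I + Z^\top Z$. Test membership via $\operatorname{tr}\big((\theta - \widehat{\theta})^\top V (\theta - \widehat{\theta})\big) \le \beta$. The level $\beta$ is an input because the paper's radius expression is not reproduced here. The names `rls_estimate`, `in_confidence_set`, `lam`, and `beta` are hypothetical.

```python
import numpy as np


def rls_estimate(Z, X_next, lam):
    """Ridge (regularized least-squares) estimate of theta = [A B D]^T.

    Z: (T, d) array whose rows are z_t = [x_t, u_t, w_t] (assumed layout).
    X_next: (T, n) array whose rows are x_{t+1}.
    Returns (theta_hat, V) with V = lam * I + Z^T Z, the Gram matrix that
    shapes the confidence ellipsoid.
    """
    d = Z.shape[1]
    V = lam * np.eye(d) + Z.T @ Z
    theta_hat = np.linalg.solve(V, Z.T @ X_next)
    return theta_hat, V


def in_confidence_set(theta, theta_hat, V, beta):
    """Check tr((theta - theta_hat)^T V (theta - theta_hat)) <= beta.

    beta is the ellipsoid level; it is a hypothetical, user-supplied input here.
    """
    E = theta - theta_hat
    return float(np.trace(E.T @ V @ E)) <= beta
```

Because $x_{t+1}^\top = z_t^\top \theta$, a single ridge solve estimates all blocks at once. $\widehat{A}$, $\widehat{B}$, $\widehat{D}$ are recovered by transposing row slices of `theta_hat`: the first $n$ rows give $\widehat{A}^\top$, and so on.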

Figures (2)

  • Figure E1: Theta estimation error (left) and gain error for both players (right).
  • Figure E2: Comparison of $\widetilde{\theta}$ and $\widehat{\theta}$ (left), and regret convergence (right).

Theorems & Definitions (11)

  • Lemma 1
  • proof
  • Lemma 2
  • proof
  • Theorem 1
  • proof
  • Lemma 3
  • Theorem 2
  • proof
  • proof
  • ...and 1 more