Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

Yi Tian; Kaiqing Zhang; Russ Tedrake; Suvrit Sra

TL;DR

Finite-sample guarantees on finding a near-optimal representation function and a near-optimal controller using the learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control are established.

Abstract

We study the problem of state representation learning for control from partial and potentially high-dimensional observations. We approach this problem via cost-driven state representation learning, in which we learn a dynamical model in a latent state space by predicting cumulative costs. In particular, we establish finite-sample guarantees on finding a near-optimal representation function and a near-optimal controller using the learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control. We study two approaches to cost-driven representation learning, which differ in whether the transition function of the latent state is learned explicitly or implicitly. The first approach has also been investigated in Part I of this work, for finite-horizon time-varying LQG control. The second approach closely resembles MuZero, a recent breakthrough in empirical reinforcement learning, in that it learns latent dynamics implicitly by predicting cumulative costs. A key technical contribution of this Part II is to prove persistency of excitation for a new stochastic process that arises from the analysis of quadratic regression in our approach, and may be of independent interest.

Paper Structure (16 sections, 10 theorems, 176 equations, 2 algorithms)

Introduction
Problem setup
Latent model of infinite-horizon time-invariant LQG
Method
Cost-driven representation function learning
Explicit learning of system dynamics
Implicit learning of system dynamics (MuZero-style)
Theoretical guarantees and proofs
Proposition on multi-step cumulative costs
Persistency of excitation
Quadratic regression bound
Perturbed linear regression bound
Stable linear system under small perturbations
Proof of Theorem \ref{thm:main-poly}
Additional discussion on MuZero
...and 1 more sections

Key Result

Proposition 1

Let $z^{\ast}_0$ be the initial state estimate and $(z^{\ast}_t)_{t\ge 1}$ be the state estimates given by the Kalman filter. Then, for $t \ge 0$, where $L^{\ast} i_{t+1}$ is independent of $z^{\ast}_t$ and $u_t$, i.e., the state estimates follow the same linear dynamics with noises $\{L^{\ast} i_{t+1}\}_{t\geq 0}$. The cost at step $t$ can be reformulated as a function of the state estimates by
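For reference, the first claim of Proposition 1 describes the innovation form of the Kalman filter, which can be sketched as follows. This is a sketch in standard LQG notation, not a quotation from the paper: the symbols $A^{\ast}$, $B^{\ast}$, $C^{\ast}$ (system matrices), $y_t$ (observation), and the definition of the innovation $i_{t+1}$ are assumptions that do not appear in the text above. Only $z^{\ast}_t$, $u_t$, and $L^{\ast} i_{t+1}$ come from the proposition itself. The cost reformulation is not reproduced here.

```latex
% Assumed notation (not defined in this excerpt):
%   A^*, B^*, C^*  system matrices of the LQG model
%   L^*            Kalman gain
%   y_t            observation at step t
% Innovation form of the Kalman filter update, for t >= 0:
\begin{align*}
  z^{\ast}_{t+1} &= A^{\ast} z^{\ast}_t + B^{\ast} u_t + L^{\ast} i_{t+1}, \\
  i_{t+1} &:= y_{t+1} - C^{\ast}\bigl(A^{\ast} z^{\ast}_t + B^{\ast} u_t\bigr).
\end{align*}
```

Under these assumptions, the state estimate evolves like the underlying linear system, with the process noise replaced by the scaled innovation $L^{\ast} i_{t+1}$, which matches the proposition's description of "the same linear dynamics".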

Theorems & Definitions (19)

Proposition 1
Proposition 2
proof
Theorem 1
Proposition 3
proof
Definition 1: Block martingale small-ball (BMSB) condition
Lemma 1
proof
Lemma 2
...and 9 more
