Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part II

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

TL;DR

This paper establishes finite-sample guarantees for finding a near-optimal representation function, and a near-optimal controller based on the learned latent model, in infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control.

Abstract

We study the problem of state representation learning for control from partial and potentially high-dimensional observations. We approach this problem via cost-driven state representation learning, in which we learn a dynamical model in a latent state space by predicting cumulative costs. In particular, we establish finite-sample guarantees on finding a near-optimal representation function and a near-optimal controller using the learned latent model for infinite-horizon time-invariant Linear Quadratic Gaussian (LQG) control. We study two approaches to cost-driven representation learning, which differ in whether the transition function of the latent state is learned explicitly or implicitly. The first approach has also been investigated in Part I of this work, for finite-horizon time-varying LQG control. The second approach closely resembles MuZero, a recent breakthrough in empirical reinforcement learning, in that it learns latent dynamics implicitly by predicting cumulative costs. A key technical contribution of this Part II is to prove persistency of excitation for a new stochastic process that arises from the analysis of quadratic regression in our approach, and may be of independent interest.
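To make the idea of learning a representation by predicting costs concrete, here is a minimal sketch of quadratic regression in a toy setting. It is not the authors' algorithm: it assumes the cost is exactly a quadratic form in a hypothetical feature vector `h` (a stand-in for an observation history), fits that form by least squares, and factors it to obtain a linear representation. The names `fit_quadratic_cost` and `representation_from_quadratic` are illustrative only.

```python
import numpy as np


def fit_quadratic_cost(features, costs):
    """Least-squares fit of costs ~ h^T P h + b with P symmetric.

    features: (n, d) array of hypothetical feature vectors h.
    costs:    (n,) array of observed (cumulative) costs.
    """
    n, d = features.shape
    # Row k of `quad` is vec(h_k h_k^T); a constant column models the offset b.
    quad = np.einsum("ni,nj->nij", features, features).reshape(n, d * d)
    design = np.hstack([quad, np.ones((n, 1))])
    coef, *_ = np.linalg.lstsq(design, costs, rcond=None)
    P = coef[:-1].reshape(d, d)
    P = 0.5 * (P + P.T)  # enforce symmetry
    return P, coef[-1]


def representation_from_quadratic(P, latent_dim):
    """Return M of shape (latent_dim, d) such that M^T M approximates P."""
    eigvals, eigvecs = np.linalg.eigh(P)
    top = np.argsort(eigvals)[-latent_dim:]
    # Clip small negative eigenvalues caused by noise before taking roots.
    scales = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return scales[:, None] * eigvecs[:, top].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M_true = rng.normal(size=(2, 5))   # assumed ground-truth representation
    H = rng.normal(size=(500, 5))      # assumed feature vectors
    costs = np.sum((H @ M_true.T) ** 2, axis=1) + 0.01 * rng.normal(size=500)
    P_hat, b_hat = fit_quadratic_cost(H, costs)
    M_hat = representation_from_quadratic(P_hat, latent_dim=2)
    print(np.linalg.norm(M_hat.T @ M_hat - M_true.T @ M_true))
```

The factor `M_hat` is identified only up to an orthogonal transformation, which is why the demo compares `M^T M` rather than `M` itself.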

Paper Structure (16 sections, 10 theorems, 176 equations, 2 algorithms)

Key Result

Proposition 1

Let $z^{\ast}_0$ be the initial state estimate and $(z^{\ast}_t)_{t\ge 1}$ be the state estimates given by the Kalman filter. Then, for $t \ge 0$, $z^{\ast}_{t+1} = A^{\ast} z^{\ast}_t + B^{\ast} u_t + L^{\ast} i_{t+1}$, where $L^{\ast} i_{t+1}$ is independent of $z^{\ast}_t$ and $u_t$, i.e., the state estimates follow the same linear dynamics with noises $\{L^{\ast} i_{t+1}\}_{t\geq 0}$. The cost at step $t$ can be reformulated as a function of the state estimates by
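The innovation form of the Kalman filter that this proposition relies on can be sketched numerically. The code below is a minimal illustration under stated assumptions, not the paper's construction: the matrices `A`, `B`, `C` and noise covariances `W`, `V` are hypothetical names for a generic time-invariant linear Gaussian system, and the steady-state gain is obtained by simply iterating the Riccati recursion.

```python
import numpy as np


def steady_state_kalman_gain(A, C, W, V, iters=500):
    """Iterate the filtering Riccati recursion to an approximate fixed point.

    Returns the gain L applied to the innovation and the steady-state
    prediction-error covariance S. Assumes (A, C) is detectable.
    """
    S = W.copy()
    for _ in range(iters):
        L = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)
        S = A @ (S - L @ C @ S) @ A.T + W
    L = S @ C.T @ np.linalg.inv(C @ S @ C.T + V)
    return L, S


def filter_step(A, B, C, L, z, u, y_next):
    """One filter update written in innovation form.

    z_next = A z + B u + L i_next, where i_next = y_next - C (A z + B u).
    """
    prediction = A @ z + B @ u
    innovation = y_next - C @ prediction
    return prediction + L @ innovation, innovation


if __name__ == "__main__":
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [1.0]])
    C = np.array([[1.0, 0.0]])
    W, V = 0.1 * np.eye(2), np.array([[0.1]])
    L, S = steady_state_kalman_gain(A, C, W, V)
    z_next, innov = filter_step(A, B, C, L, np.zeros(2), np.array([1.0]), np.array([0.2]))
    print(L.ravel(), z_next, innov)
```

Written this way, the estimate evolves under the same `A` and `B` as the underlying system, with the term `L @ innovation` playing the role of the process noise.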

Theorems & Definitions (19)

  • Proposition 1
  • Proposition 2
  • proof
  • Theorem 1
  • Proposition 3
  • proof
  • Definition 1: Block martingale small-ball (BMSB) condition
  • Lemma 1
  • proof
  • Lemma 2
  • ...and 9 more