Cost-Driven Representation Learning for Linear Quadratic Gaussian Control: Part I

Yi Tian, Kaiqing Zhang, Russ Tedrake, Suvrit Sra

TL;DR

This paper establishes finite-sample guarantees for finding a near-optimal state representation function and a near-optimal controller, using the directly learned latent model, in finite-horizon time-varying LQG control problems.

Abstract

We study the task of learning state representations from potentially high-dimensional observations, with the goal of controlling an unknown partially observable system. We pursue a cost-driven approach, where a dynamic model in some latent state space is learned by predicting the costs, without predicting the observations or actions. In particular, we focus on an intuitive cost-driven state representation learning method for solving Linear Quadratic Gaussian (LQG) control, one of the most fundamental partially observable control problems. As our main results, we establish finite-sample guarantees of finding a near-optimal state representation function and a near-optimal controller using the directly learned latent model, for finite-horizon time-varying LQG control problems. To the best of our knowledge, despite various empirical successes, finite-sample guarantees for such a cost-driven approach have remained elusive. Our result underscores the value of predicting multi-step costs, an idea that is key to our theory and is also known to be empirically valuable for learning state representations. A second part of this work, to appear as Part II, addresses the infinite-horizon linear time-invariant setting; it also extends the results to an approach that implicitly learns the latent dynamics, inspired by the recent empirical breakthrough of MuZero in model-based reinforcement learning.

Paper Structure (17 sections, 19 theorems, 248 equations, 3 algorithms)

Key Result

Proposition 1

Let $(z^{\ast}_t)_{t=0}^{T}$ be state estimates given by the Kalman filter. Then, $$z^{\ast}_{t+1} = A^{\ast}_t z^{\ast}_t + B^{\ast}_t u_t + L^{\ast}_{t+1} i_{t+1},$$ where $L^{\ast}_{t+1} i_{t+1}$ is independent of $z^{\ast}_t$ and $u_t$; that is, the state estimates follow the same linear dynamics as the underlying state, with noise $L^{\ast}_{t+1} i_{t+1}$. The cost at step $t$ can then be reformulated as a function of the state estimates: $$c_t = \|z^{\ast}_t\|^2_{Q^{\ast}_t} + \|u_t\|^2_{R^{\ast}_t} + b_t + \gamma_t,$$ where $b_t > 0$ is a problem-dependent ...
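The two claims of Proposition 1 can be checked numerically on a toy problem. The sketch below is not the paper's algorithm: it assumes a scalar time-varying system with known parameters, open-loop controls, and Gaussian noise, and every name in it (`kalman_rollout`, `max_residuals`, `a`, `b`, `c`, `q`, `r`, `bias`, `gamma`) is hypothetical. It runs a standard Kalman filter and verifies that (i) consecutive estimates satisfy the linear recursion driven by the gain times the innovation, and (ii) the true cost equals the cost evaluated at the estimate plus a positive offset `bias` (here taken to be the cost weight times the posterior variance, an assumption about what $b_t$ is) plus a residual `gamma`.

```python
import random


def kalman_rollout(a, b, c, q, r, var_w, var_v, var_x0, controls, seed=0):
    """Simulate a scalar time-varying linear system and filter it.

    Assumed model (illustrative only):
        x_{t+1} = a[t] x_t + b[t] u_t + w_t,   y_t = c[t] x_t + v_t,
        cost_t  = q[t] x_t^2 + r[t] u_t^2.
    """
    rng = random.Random(seed)
    x = rng.gauss(0.0, var_x0 ** 0.5)      # true initial state
    z_pred, p_pred = 0.0, var_x0           # prior mean and variance of x_0
    records = []
    for t, u in enumerate(controls):
        # Measurement update: innovation, gain, posterior mean and variance.
        y = c[t] * x + rng.gauss(0.0, var_v ** 0.5)
        innov = y - c[t] * z_pred
        gain = p_pred * c[t] / (c[t] ** 2 * p_pred + var_v)
        z = z_pred + gain * innov
        p = (1.0 - gain * c[t]) * p_pred

        # Cost pieces: true cost, offset (weight times posterior variance),
        # and the residual that makes the decomposition exact.
        err = x - z
        bias = q[t] * p
        gamma = q[t] * err ** 2 - bias + 2.0 * q[t] * z * err
        records.append({
            "z": z, "u": u, "gain": gain, "innov": innov,
            "cost": q[t] * x ** 2 + r[t] * u ** 2,
            "bias": bias, "gamma": gamma,
        })

        # Time update: propagate the true state and the filter's prediction.
        x = a[t] * x + b[t] * u + rng.gauss(0.0, var_w ** 0.5)
        z_pred = a[t] * z + b[t] * u
        p_pred = a[t] ** 2 * p + var_w
    return records


def max_residuals(records, a, b, q, r):
    """Largest absolute violation of the dynamics and cost identities."""
    dyn_res, cost_res = 0.0, 0.0
    for t, rec in enumerate(records):
        rhs = q[t] * rec["z"] ** 2 + r[t] * rec["u"] ** 2 + rec["bias"] + rec["gamma"]
        cost_res = max(cost_res, abs(rec["cost"] - rhs))
        if t + 1 < len(records):
            nxt = records[t + 1]
            pred = a[t] * rec["z"] + b[t] * rec["u"] + nxt["gain"] * nxt["innov"]
            dyn_res = max(dyn_res, abs(nxt["z"] - pred))
    return dyn_res, cost_res


# Arbitrary illustrative parameters for a horizon of 20 steps.
T = 20
a = [0.9 + 0.005 * t for t in range(T)]
b = [1.0] * T
c = [1.0 + 0.1 * (t % 3) for t in range(T)]
q = [1.0 + 0.05 * t for t in range(T)]
r = [0.5] * T
controls = [0.1 * (-1) ** t for t in range(T)]

records = kalman_rollout(a, b, c, q, r, var_w=0.1, var_v=0.2, var_x0=1.0,
                         controls=controls)
dyn_res, cost_res = max_residuals(records, a, b, q, r)
print("max dynamics residual:", dyn_res)
print("max cost residual:", cost_res)
```

Both residuals should be at the level of floating-point round-off, since the identities are algebraic consequences of the filter update. The sketch does not test the statistical claims (independence of the noise term, or the residual having zero mean); checking those would require averaging over many rollouts.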

Theorems & Definitions (40)

  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • Theorem 1
  • Proposition 3
  • proof
  • Lemma 1
  • proof
  • Lemma 2: Quadratic regression
  • ...and 30 more