Latent Linear Quadratic Regulator for Robotic Control Tasks

Yuan Zhang; Shaohui Yang; Toshiyuki Ohtsuka; Colin Jones; Joschka Boedecker

TL;DR

LaLQR addresses the computational burden of Model Predictive Control for nonlinear robotic systems by learning a latent representation $z_h=\phi(x_h)$ in which the dynamics are linear $z_{h+1}=A z_h + B u_h$ and the cost is quadratic $z_h^T Q z_h + u_h^T R u_h$, with a monotonic mapping $F$ connecting to the original cost. Parameters $(A,B,Q,R,\phi,F)$ are learned by imitation of nonlinear MPC using two losses: a consistency loss $\mathcal{L}_{cons}$ enforcing $\phi(x_{h+1}) \approx A\phi(x_h) + B u_h$ and a cost loss $\mathcal{L}_{cost}$ aligning $c(x_h,u_h)$ with $F( z_h^T Q z_h + u_h^T R u_h)$. An infinite-horizon LQR on the latent space yields a precomputed gain $K$, enabling online control via $u_h = -K z_h$. Experiments on four MuJoCo robots show LaLQR achieves competitive control performance with significantly lower online computation than full nonlinear MPC and demonstrates improved generalization over standard imitation learning, with ablations confirming the importance of latent companion structure and the cost loss.
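The two training losses described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the squared-error form, the equal weighting over a batch, and the helper names `quad`, `consistency_loss`, and `cost_loss` are hypothetical and are not taken from the paper. The encoder `phi` and the monotonic map `F` are stand-in identity functions here rather than learned networks.

```python
import numpy as np

def quad(M, v):
    """Row-wise quadratic form v_i^T M v_i for a batch v of shape (N, d)."""
    return np.einsum("ni,ij,nj->n", v, M, v)

def consistency_loss(phi, A, B, x, u, x_next):
    """Mean squared gap between phi(x_{h+1}) and A phi(x_h) + B u_h (assumed form)."""
    pred = phi(x) @ A.T + u @ B.T
    return float(np.mean(np.sum((phi(x_next) - pred) ** 2, axis=1)))

def cost_loss(phi, F, Q, R, x, u, c):
    """Mean squared gap between c(x_h, u_h) and F(z^T Q z + u^T R u) (assumed form)."""
    latent_cost = quad(Q, phi(x)) + quad(R, u)
    return float(np.mean((c - F(latent_cost)) ** 2))

# Toy check: a system that is already linear-quadratic, so identity maps
# for phi and F (both placeholders) should give zero loss.
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
phi = lambda x: x          # placeholder encoder
F = lambda s: s            # placeholder monotonic map

x = rng.normal(size=(8, 2))
u = rng.normal(size=(8, 1))
x_next = x @ A.T + u @ B.T             # transitions from the true linear model
c = quad(Q, x) + quad(R, u)            # true quadratic stage cost

l_cons = consistency_loss(phi, A, B, x, u, x_next)
l_cost = cost_loss(phi, F, Q, R, x, u, c)
print(l_cons, l_cost)
```

In an actual training setup the transitions and costs would come from nonlinear MPC rollouts, and $(A,B,Q,R,\phi,F)$ would be updated by gradient descent on a combination of the two terms.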

Abstract

Model predictive control (MPC) plays an increasingly crucial role in robotic control tasks, but its high computational requirements are a concern, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space in which the dynamical model is linear and the cost function is quadratic, allowing the efficient application of LQR. We jointly learn this alternative system by imitating the original MPC. Experiments show that LaLQR is more efficient and generalizes better than the baselines it is compared against.
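To make the "efficient application of LQR" concrete, the sketch below computes an infinite-horizon discrete-time LQR gain offline and then applies the feedback law $u_h = -K z_h$ online. It is a minimal illustration, not the paper's implementation: the matrices are a toy double integrator rather than learned parameters, the encoder `phi` is an identity placeholder, and the gain is obtained by plain fixed-point iteration of the Riccati recursion (a solver choice made here for self-containment). The function names `lqr_gain` and `control` are hypothetical.

```python
import numpy as np

def lqr_gain(A, B, Q, R, tol=1e-10, max_iter=10_000):
    """Infinite-horizon discrete LQR gain via fixed-point Riccati iteration."""
    P = Q.copy()
    for _ in range(max_iter):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P <- Q + A^T P A - A^T P B K
        P_next = Q + A.T @ P @ A - A.T @ P @ B @ K
        if np.max(np.abs(P_next - P)) < tol:
            P = P_next
            break
        P = P_next
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

# Toy latent model (assumed values, standing in for learned A, B, Q, R).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)

K = lqr_gain(A, B, Q, R)       # computed once, offline
phi = lambda x: x              # placeholder for the learned encoder

def control(x):
    """Online control: encode the state, then one matrix-vector product."""
    return -K @ phi(x)

# Roll the closed loop forward from an arbitrary initial state.
z = np.array([1.0, 0.0])
for _ in range(200):
    z = A @ z + B @ control(z)

rho = np.max(np.abs(np.linalg.eigvals(A - B @ K)))  # closed-loop spectral radius
print(K, rho, z)
```

The online cost is one encoder evaluation plus one matrix-vector product per step, which is where the savings over solving a nonlinear program at every step would come from.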

Paper Structure (31 sections, 1 theorem, 4 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Background
Model Predictive Control
Linear Predictors for Nonlinear Controlled Systems via Koopman
Latent Linear Model Predictive Control
Latent Linear Quadratic Problem
Linear Quadratic Regulator on Latent Space
Learning System Parameters
Stability Analysis
Experiments
Experimental Setup
Main Results
Generalization
Ablation Study
Related Work
...and 16 more sections

Key Result

Proposition 1

The local linear autonomous system $\delta x_{h+1} = C \delta x_h$ and the latent linear autonomous system $\frac{\partial \phi}{\partial x}|_{x=x_T} \delta x_{h+1} = D \frac{\partial \phi}{\partial x}|_{x=x_T} \delta x_h$ are equivalent at the stable point $x_T$ if and only if for any matrix $C$

Figures (7)

Figure 1: Visualization of two types of dynamical models. $f_L$ means the linear dynamical model.
Figure 2: Training curve of eigen loss on $\textit{cartpole}$ task. The x-axis is the training time steps and the y-axis is the average eigen loss (in log scale) during training.
Figure 3: Visualization of robots used in the experiments, with increased complexity.
Figure 4: Control process of methods learned from imperfect experts. The x-axis is the real testing time, and the y-axis represents the partial state, control and cost in $\textit{cartpole}$ task.
Figure 5: Control process of methods starting from unseen initial states in training. The x-axis is the real testing time, and the y-axis represents the partial state, control and cost in $\textit{cartpole}$ task.
...and 2 more figures

Theorems & Definitions (1)

Proposition 1: Eigenvalues and eigenvectors of equivalent systems
