
Latent Linear Quadratic Regulator for Robotic Control Tasks

Yuan Zhang, Shaohui Yang, Toshiyuki Ohtsuka, Colin Jones, Joschka Boedecker

TL;DR

LaLQR addresses the computational burden of Model Predictive Control for nonlinear robotic systems by learning a latent representation $z_h=\phi(x_h)$ in which the dynamics are linear $z_{h+1}=A z_h + B u_h$ and the cost is quadratic $z_h^T Q z_h + u_h^T R u_h$, with a monotonic mapping $F$ connecting to the original cost. Parameters $(A,B,Q,R,\phi,F)$ are learned by imitation of nonlinear MPC using two losses: a consistency loss $\mathcal{L}_{cons}$ enforcing $\phi(x_{h+1}) \approx A\phi(x_h) + B u_h$ and a cost loss $\mathcal{L}_{cost}$ aligning $c(x_h,u_h)$ with $F( z_h^T Q z_h + u_h^T R u_h)$. An infinite-horizon LQR on the latent space yields a precomputed gain $K$, enabling online control via $u_h = -K z_h$. Experiments on four MuJoCo robots show LaLQR achieves competitive control performance with significantly lower online computation than full nonlinear MPC and demonstrates improved generalization over standard imitation learning, with ablations confirming the importance of latent companion structure and the cost loss.
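The two training losses described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the squared-error form of both losses, the batch layout, and the names `consistency_loss`, `cost_loss`, `phi`, and `F` as plain Python callables are assumptions (in the paper, $\phi$ and $F$ are learned mappings).

```python
import numpy as np

def consistency_loss(phi, A, B, x, u, x_next):
    """Mean squared residual of phi(x_{h+1}) - (A phi(x_h) + B u_h).

    x, x_next: (N, nx) state batches; u: (N, nu) control batch.
    The squared-error form is an assumption made for this sketch.
    """
    z = phi(x)                    # latent states z_h, shape (N, nz)
    z_next = phi(x_next)          # latent states z_{h+1}
    pred = z @ A.T + u @ B.T      # A z_h + B u_h, row-wise
    return float(np.mean(np.sum((z_next - pred) ** 2, axis=1)))

def cost_loss(phi, F, Q, R, x, u, c):
    """Mean squared gap between c(x_h, u_h) and F(z^T Q z + u^T R u).

    c: (N,) original stage costs; F: monotonic scalar mapping applied
    elementwise (assumed here to be an ordinary Python callable).
    """
    z = phi(x)
    quad = np.einsum("ni,ij,nj->n", z, Q, z) + np.einsum("ni,ij,nj->n", u, R, u)
    return float(np.mean((c - F(quad)) ** 2))
```

As a sanity check, when the true system is already linear with a quadratic cost, the identity choices `phi = lambda s: s` and `F = lambda q: q` drive both losses to zero up to floating-point error.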

Abstract

Model predictive control (MPC) plays an increasingly crucial role in robotic control tasks, but its high computational cost is a concern, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space in which the dynamical model is linear and the cost function is quadratic, allowing the efficient application of LQR. We jointly learn this alternative system by imitating the original MPC. Experiments show that LaLQR is more efficient and generalizes better than the baselines.
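Once a latent linear model and quadratic cost are available, the LQR step itself is cheap. The following is a minimal sketch of how an infinite-horizon, discrete-time gain could be precomputed by fixed-point iteration of the Riccati equation and then applied as $u = -Kz$. The function name `lqr_gain`, the iteration budget, the tolerance, and the double-integrator matrices in the example are all assumptions for illustration; the paper's own solver is not specified here.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=10_000, tol=1e-10):
    """Infinite-horizon discrete-time LQR gain via Riccati iteration.

    Iterates P <- Q + A^T P (A - B K) with K = (R + B^T P B)^{-1} B^T P A
    until P stops changing, then returns (K, P).
    """
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    # Recompute the gain from the final P
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Hypothetical latent model (a discretized double integrator)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)

K, P = lqr_gain(A, B, Q, R)   # offline: computed once
z = np.array([1.0, 0.0])      # latent state, e.g. z = phi(x)
u = -K @ z                    # online: a single matrix-vector product
```

The online cost is one encoder evaluation plus one matrix-vector product, which is the source of the efficiency gain over solving a nonlinear program at every step.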

Paper Structure

This paper contains 31 sections, 1 theorem, 4 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Proposition 1

The local linear autonomous system $\delta x_{h+1} = C \delta x_h$ and the latent linear autonomous system $\frac{\partial \phi}{\partial x}|_{x=x_T} \delta x_{h+1} = D \frac{\partial \phi}{\partial x}|_{x=x_T} \delta x_h$ are equivalent at the stable point $x_T$ if and only if for any matrix $C$

Figures (7)

  • Figure 1: Visualization of two types of dynamical models. $f_L$ denotes the linear dynamical model.
  • Figure 2: Training curve of the eigen loss on the $\textit{cartpole}$ task. The x-axis shows the training time steps and the y-axis shows the average eigen loss (log scale) during training.
  • Figure 3: Visualization of the robots used in the experiments, in order of increasing complexity.
  • Figure 4: Control process of methods learned from imperfect experts. The x-axis is the real testing time, and the y-axis shows part of the state, the control, and the cost in the $\textit{cartpole}$ task.
  • Figure 5: Control process of methods starting from initial states unseen during training. The x-axis is the real testing time, and the y-axis shows part of the state, the control, and the cost in the $\textit{cartpole}$ task.
  • ...and 2 more figures

Theorems & Definitions (1)

  • Proposition 1: Eigenvalues and eigenvectors of equivalent systems