Latent Linear Quadratic Regulator for Robotic Control Tasks
Yuan Zhang, Shaohui Yang, Toshiyuki Ohtsuka, Colin Jones, Joschka Boedecker
TL;DR
LaLQR addresses the computational burden of Model Predictive Control (MPC) for nonlinear robotic systems by learning a latent representation $z_h=\phi(x_h)$ in which the dynamics are linear, $z_{h+1}=A z_h + B u_h$, and the cost is quadratic, $z_h^\top Q z_h + u_h^\top R u_h$, with a monotonic mapping $F$ relating this latent quadratic cost to the original cost. The parameters $(A,B,Q,R,\phi,F)$ are learned by imitating nonlinear MPC with two losses: a consistency loss $\mathcal{L}_{cons}$ enforcing $\phi(x_{h+1}) \approx A\phi(x_h) + B u_h$, and a cost loss $\mathcal{L}_{cost}$ aligning $c(x_h,u_h)$ with $F(z_h^\top Q z_h + u_h^\top R u_h)$. Solving an infinite-horizon LQR problem in the latent space yields a precomputed gain $K$, so online control reduces to $u_h = -K z_h$. Experiments on four MuJoCo robots show that LaLQR achieves competitive control performance with significantly lower online computation than full nonlinear MPC and generalizes better than standard imitation learning; ablations confirm the importance of the latent companion structure and the cost loss.
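As a rough illustration of the pipeline summarized above, the following NumPy sketch evaluates the two losses on a batch of transitions and computes an infinite-horizon LQR gain by iterating the discrete-time Riccati recursion. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`consistency_loss`, `cost_loss`, `lqr_gain`), the squared-error form of both losses, the identity encoder, and the toy double-integrator matrices are all hypothetical choices made for illustration. In the paper, $\phi$ and $F$ are learned.

```python
import numpy as np

def consistency_loss(phi, A, B, x, u, x_next):
    """Mean squared residual of phi(x') ~ A phi(x) + B u over a batch (assumed form)."""
    z, z_next = phi(x), phi(x_next)
    pred = z @ A.T + u @ B.T              # row-wise A z + B u
    return np.mean(np.sum((z_next - pred) ** 2, axis=1))

def cost_loss(phi, Q, R, F, x, u, c):
    """Mean squared gap between c(x, u) and F(z^T Q z + u^T R u) (assumed form)."""
    z = phi(x)
    quad = np.einsum("ni,ij,nj->n", z, Q, z) + np.einsum("ni,ij,nj->n", u, R, u)
    return np.mean((c - F(quad)) ** 2)

def lqr_gain(A, B, Q, R, iters=1000, tol=1e-10):
    """Iterate the discrete-time Riccati recursion to a fixed point; return (K, P)."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P

# Toy latent system (hypothetical values, not taken from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[0.1]])

K, P = lqr_gain(A, B, Q, R)               # computed once, offline

phi = lambda x: x                         # identity encoder, an assumption for this demo
z = phi(np.array([1.0, 0.0]))
u = -K @ z                                # online control law u_h = -K z_h
```

The point of the design is that the Riccati iteration runs once offline, so the online cost per step is one encoder evaluation plus a matrix-vector product, in contrast to solving a nonlinear optimization problem at every step.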
Abstract
Model predictive control (MPC) plays an increasingly crucial role in robotic control tasks, but its high computational cost is a concern, especially for nonlinear dynamical models. This paper presents a $\textbf{la}$tent $\textbf{l}$inear $\textbf{q}$uadratic $\textbf{r}$egulator (LaLQR) that maps the state space into a latent space in which the dynamical model is linear and the cost function is quadratic, allowing the efficient application of LQR. We jointly learn this alternative system by imitating the original MPC. Experiments show that LaLQR is more efficient and generalizes better than the baselines.
