Regularization for Covariance Parameterization of Direct Data-Driven LQR Control
Feiran Zhao, Alessandro Chiuso, Florian Dörfler
TL;DR
This paper tackles direct data-driven LQR design using a covariance-based parameterization, addressing the observed trade-off between robust closed-loop stability and optimal performance under data noise. It introduces a regularizer $\Omega(V)=\mathrm{Tr}(V\Sigma V^T\Phi)$ with coefficient $\lambda$ (scaling as $1/\sqrt{t}$) that accounts for uncertainty in both the steady-state covariance $\Sigma$ and the LQR cost, promoting either exploitation ($\lambda>0$) or exploration ($\lambda<0$). The authors reformulate the regularized problem as a convex SDP via a change of variables, yielding an optimization problem whose size is independent of the data length and that can be used for online/adaptive control. Simulations on a benchmark LQR system show that the regularized covariance-parameterized LQR significantly reduces the optimality gap and increases the likelihood of stabilizing the closed loop compared to the certainty-equivalence approach.
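To make the regularizer concrete, here is a minimal NumPy sketch that evaluates $\Omega(V)=\mathrm{Tr}(V\Sigma V^T\Phi)$ and a coefficient of the form $\lambda = c/\sqrt{t}$. The matrix shapes, the toy values, the constant `c`, and the function names are illustrative assumptions, not taken from the paper; the only facts used are the trace formula and the $1/\sqrt{t}$ scaling stated above.

```python
import numpy as np


def regularizer(V, Sigma, Phi):
    """Evaluate Omega(V) = Tr(V Sigma V^T Phi).

    Assumed shapes (hypothetical): V is (p, n), Sigma is (n, n), Phi is (p, p).
    """
    return float(np.trace(V @ Sigma @ V.T @ Phi))


def reg_coefficient(t, c=1.0):
    """Return lambda = c / sqrt(t) for data length t.

    The constant c is a hypothetical tuning knob: c > 0 gives lambda > 0
    (exploitation), c < 0 gives lambda < 0 (exploration).
    """
    return c / np.sqrt(t)


# Toy data, chosen only so the result is easy to check by hand.
V = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])   # shape (3, 2)
Sigma = np.eye(2)            # placeholder for the steady-state covariance
Phi = np.eye(3)              # placeholder for the data covariance

omega = regularizer(V, Sigma, Phi)   # with identity Sigma and Phi this is ||V||_F^2 = 7
lam = reg_coefficient(t=100, c=1.0)  # 1 / sqrt(100) = 0.1
```

With identity placeholders the regularizer reduces to the squared Frobenius norm of `V`, which makes the toy value easy to verify; in the paper's setting $\Sigma$ and $\Phi$ would instead come from the closed-loop system and the data.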
Abstract
As a benchmark for data-driven control methods, the linear quadratic regulator (LQR) problem has gained significant attention. A growing trend is direct LQR design, which finds the optimal LQR gain directly from raw data, bypassing system identification. To achieve this, our previous work developed a direct LQR formulation parameterized by the sample covariance. In this paper, we propose a regularization method for the covariance-parameterized LQR. We show that the regularizer accounts for the uncertainty in both the steady-state covariance matrix, corresponding to closed-loop stability, and the LQR cost function, corresponding to averaged control performance. With a positive or negative coefficient, the regularizer can be interpreted as promoting either exploitation or exploration, a well-known trade-off in reinforcement learning. In simulations, we observe that our covariance-parameterized LQR with regularization can significantly outperform the certainty-equivalence LQR in terms of both the optimality gap and robust closed-loop stability.
