
Regularization for Covariance Parameterization of Direct Data-Driven LQR Control

Feiran Zhao, Alessandro Chiuso, Florian Dörfler

TL;DR

This paper tackles direct data-driven LQR design based on a covariance parameterization. It addresses the observed trade-off between robust closed-loop stability and optimal performance under noisy data. It introduces a regularizer $\Omega(V)=\mathrm{Tr}(V\Sigma V^T\Phi)$ with coefficient $\lambda$ (scaling as $1/\sqrt{t}$) that accounts for uncertainty in both the steady-state covariance $\Sigma$ and the LQR cost. A positive coefficient ($\lambda>0$) promotes exploitation and a negative one ($\lambda<0$) promotes exploration. The authors reformulate the regularized problem as a convex SDP via a change of variables. The resulting optimization problem has a size independent of the data length, which makes it usable for online/adaptive control. Simulations on a benchmark LQR system show that, compared with the certainty-equivalence approach, the regularized covariance-parameterized LQR significantly reduces the optimality gap and more often yields a stabilizing closed-loop controller.
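To make the regularizer concrete, the following minimal Python sketch evaluates $\Omega(V)=\mathrm{Tr}(V\Sigma V^T\Phi)$ and a coefficient $\lambda = c/\sqrt{t}$. The matrix shapes (here $V$ is $(n+m)\times n$, $\Sigma$ is $n\times n$ and $\Phi$ is $(n+m)\times(n+m)$), the constant `c`, the helper names, and the assumption that the penalty enters the objective as $\lambda\,\Omega(V)$ are hypothetical choices for illustration. The paper's exact dimensions and scaling constant are not given in this summary.

```python
import math

def matmul(A, B):
    """Matrix product for matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*A)]

def trace(A):
    """Sum of the diagonal entries of a square matrix."""
    return sum(A[i][i] for i in range(len(A)))

def regularizer(V, Sigma, Phi):
    """Omega(V) = Tr(V Sigma V^T Phi); V Sigma V^T and Phi must be square and conformable."""
    return trace(matmul(matmul(matmul(V, Sigma), transpose(V)), Phi))

def reg_coefficient(c, t):
    """Hypothetical lambda = c / sqrt(t) for data length t.

    c > 0 gives lambda > 0 (exploitation); c < 0 gives lambda < 0 (exploration).
    """
    return c / math.sqrt(t)

# Hypothetical example with n = 1 state and m = 1 input.
V = [[1.0], [0.5]]
Sigma = [[2.0]]
Phi = [[1.0, 0.0], [0.0, 1.0]]
# Assumed to be added to the LQR cost as lambda * Omega(V).
penalty = reg_coefficient(1.0, 100) * regularizer(V, Sigma, Phi)
```

Because $\lambda$ shrinks like $1/\sqrt{t}$, the penalty fades as more data is collected, so the regularized problem approaches the unregularized one for long data records.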

Abstract

As the benchmark of data-driven control methods, the linear quadratic regulator (LQR) problem has gained significant attention. A growing trend is direct LQR design, which finds the optimal LQR gain directly from raw data, bypassing system identification. To this end, our previous work developed a direct LQR formulation parameterized by the sample covariance. In this paper, we propose a regularization method for the covariance-parameterized LQR. We show that the regularizer accounts for the uncertainty in both the steady-state covariance matrix, which corresponds to closed-loop stability, and the LQR cost function, which corresponds to averaged control performance. With a positive or negative coefficient, the regularizer can be interpreted as promoting exploitation or exploration, respectively, a well-known trade-off in reinforcement learning. In simulations, we observe that our covariance-parameterized LQR with regularization can significantly outperform the certainty-equivalence LQR in terms of both the optimality gap and robust closed-loop stability.

Paper Structure

This paper contains 12 sections, 1 theorem, 35 equations, 1 figure, 1 table.

Key Result

Lemma 1

Under the paper's noise assumption, it holds that $\mathbb{E}\left[\overline{W}_0\right] = 0$ and $\mathrm{Var}\left[\mathrm{vec}(\overline{W}_0)\right] = I_n \otimes \Phi_t/t$.
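Lemma 1 can be sanity-checked numerically. The sketch below is a Monte Carlo check under assumptions that this summary does not confirm: $\overline{W}_0 = \frac{1}{t}\sum_k w_k\phi_k^T$ with fixed regressors $\phi_k$, i.i.d. zero-mean unit-variance noise $w_k$, and $\Phi_t = \frac{1}{t}\sum_k \phi_k\phi_k^T$. It uses $n = 1$, so $I_n \otimes \Phi_t/t$ reduces to $\Phi_t/t$ and the vectorization order does not matter. All function names are hypothetical.

```python
import random

def sample_W0_bar(phis, rng):
    """One draw of the assumed W0_bar = (1/t) * sum_k w_k * phi_k^T for n = 1 (a row vector)."""
    t, d = len(phis), len(phis[0])
    acc = [0.0] * d
    for phi in phis:
        w = rng.gauss(0.0, 1.0)  # assumed: i.i.d. zero-mean, unit-variance noise
        for j in range(d):
            acc[j] += w * phi[j]
    return [a / t for a in acc]

def sample_covariance(phis):
    """Phi_t = (1/t) * sum_k phi_k phi_k^T for fixed regressors phi_k."""
    t, d = len(phis), len(phis[0])
    return [[sum(p[i] * p[j] for p in phis) / t for j in range(d)] for i in range(d)]

def empirical_mean_and_cov(phis, trials, seed=0):
    """Empirical mean and covariance of W0_bar over independent noise realizations."""
    rng = random.Random(seed)
    draws = [sample_W0_bar(phis, rng) for _ in range(trials)]
    d = len(draws[0])
    mean = [sum(x[j] for x in draws) / trials for j in range(d)]
    cov = [[sum((x[i] - mean[i]) * (x[j] - mean[j]) for x in draws) / trials
            for j in range(d)] for i in range(d)]
    return mean, cov

# Hypothetical regressors phi_k = [u_k, x_k] for t = 10 samples.
phis = [[1.0 + 0.1 * k, (-1.0) ** k] for k in range(10)]
mean, cov = empirical_mean_and_cov(phis, trials=20000, seed=1)
# Lemma 1 (for n = 1) predicts mean close to 0 and cov close to Phi_t / t.
```

Under these assumptions the lemma is exact: each $w_k$ has zero mean, and the covariance of the sum is $\frac{1}{t^2}\sum_k \phi_k\phi_k^T = \Phi_t/t$. The simulation only confirms this up to sampling error.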

Figures (1)

  • Figure 1: Performance of the regularized covariance-parameterized LQR problem as a function of $\lambda$. The red line shows the percentage of stabilizing solutions over $1000$ independent trials, and the blue line shows the median optimality gap. The case $\lambda = 0$ corresponds to the certainty-equivalence solution of the indirect LQR problem and its equivalent formulation.
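The two metrics in Figure 1 can be illustrated on a scalar system. The sketch below assumes a hypothetical scalar plant $x_{+} = a x + b u + w$ with unit-variance noise and state feedback $u = Kx$. It also assumes that the optimality gap is the relative excess average cost $(J(K)-J^\star)/J^\star$. The paper's benchmark system and its exact gap definition are not reproduced in this summary. A gain counts as stabilizing when $|a+bK|<1$, in which case the gap is finite.

```python
import math

def dare_scalar(a, b, q, r, iters=10_000, tol=1e-13):
    """Solve p = q + a^2 p - (a b p)^2 / (r + b^2 p) by Riccati (value) iteration from p = q."""
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    return p

def lqr_gain(a, b, q, r):
    """Optimal gain K* = -a b p / (r + b^2 p) for the feedback u = K x."""
    p = dare_scalar(a, b, q, r)
    return -a * b * p / (r + b * b * p)

def average_cost(a, b, q, r, gain):
    """Steady-state E[q x^2 + r u^2] under u = gain * x with unit-variance noise; inf if unstable."""
    a_cl = a + b * gain
    if abs(a_cl) >= 1.0:
        return math.inf
    return (q + r * gain * gain) / (1.0 - a_cl * a_cl)

def optimality_gap(a, b, q, r, gain):
    """Assumed relative gap (J(gain) - J*) / J*; inf for a non-stabilizing gain."""
    j_opt = average_cost(a, b, q, r, lqr_gain(a, b, q, r))
    return (average_cost(a, b, q, r, gain) - j_opt) / j_opt

# Hypothetical open-loop unstable plant (a = 1.2) and three candidate gains.
a, b, q, r = 1.2, 1.0, 1.0, 1.0
gaps = [optimality_gap(a, b, q, r, g) for g in (lqr_gain(a, b, q, r), -0.5, 0.0)]
```

In a Figure 1 style experiment, one would compute a gain from each noisy data set, count the trials with a finite gap as stabilizing, and report the median gap over the trials.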

Theorems & Definitions (5)

  • Remark 1
  • Lemma 1
  • Proof
  • Remark 2
  • Remark 3