A Moreau Envelope Approach for LQR Meta-Policy Estimation

Ashwin Aravind, Mohammad Taha Toghani, César A. Uribe

TL;DR

We propose a Moreau envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, that defines a meta-policy efficiently adjustable to new realizations. We also design an algorithm that finds an approximate first-order stationary point of the meta-LQR cost function.

Abstract

We study the problem of policy estimation for the Linear Quadratic Regulator (LQR) in discrete-time linear time-invariant uncertain dynamical systems. We propose a Moreau Envelope-based surrogate LQR cost, built from a finite set of realizations of the uncertain system, to define a meta-policy efficiently adjustable to new realizations. Moreover, we design an algorithm to find an approximate first-order stationary point of the meta-LQR cost function. Numerical results show that the proposed approach outperforms naive averaging of controllers on new realizations of the linear system. We also provide empirical evidence that our method has better sample complexity than Model-Agnostic Meta-Learning (MAML) approaches.
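
To make the surrogate cost concrete, the following is a minimal sketch on a scalar toy problem, not the paper's algorithm. It assumes the textbook Moreau envelope with a $1/(2\lambda)$ proximal weight (the paper may scale $\lambda$ differently), scalar dynamics $x_{t+1} = a x_t + b u_t$ with $u_t = -k x_t$ and $x_0 = 1$, and a plain backtracking gradient descent for the inner minimization. All function names and the example realizations are hypothetical.

```python
import math


def lqr_cost(k, a, b, q=1.0, r=1.0):
    """Infinite-horizon cost of u_t = -k * x_t on x_{t+1} = a * x_t + b * u_t, x_0 = 1."""
    c = a - b * k  # closed-loop pole
    if abs(c) >= 1.0:
        return math.inf  # k does not stabilize this realization
    return (q + r * k * k) / (1.0 - c * c)


def lqr_grad(k, a, b, q=1.0, r=1.0):
    """Derivative of lqr_cost with respect to k (valid only for stabilizing k)."""
    c = a - b * k
    d = 1.0 - c * c
    return (2.0 * r * k * d - (q + r * k * k) * 2.0 * b * c) / (d * d)


def prox_objective(kb, k, a, b, lam):
    """Inner objective: realization cost plus proximal penalty (assumed 1/(2*lam) scaling)."""
    return lqr_cost(kb, a, b) + (kb - k) ** 2 / (2.0 * lam)


def moreau_envelope(k, a, b, lam, steps=200):
    """Approximate min over kb of prox_objective, starting from a stabilizing kb = k.

    Returns (envelope value, adapted policy kb).
    """
    kb = k
    f = prox_objective(kb, k, a, b, lam)
    for _ in range(steps):
        g = lqr_grad(kb, a, b) + (kb - k) / lam
        step = 1.0
        while step > 1e-12:
            cand = kb - step * g
            f_cand = prox_objective(cand, k, a, b, lam)
            # Armijo test; an infinite cost (unstable candidate) is always rejected
            if f_cand <= f - 1e-4 * step * g * g:
                kb, f = cand, f_cand
                break
            step *= 0.5
        else:
            break  # no descent step found: kb is numerically stationary
    return f, kb


def meta_cost(k, systems, lam):
    """Average of the per-realization Moreau envelopes evaluated at the meta-policy k."""
    return sum(moreau_envelope(k, a, b, lam)[0] for a, b in systems) / len(systems)


systems = [(0.9, 1.0), (1.1, 0.8), (0.7, 1.2)]  # hypothetical (a, b) realizations
k = 0.5  # stabilizes all three realizations
print(meta_cost(k, systems, lam=0.2))
```

Because the inner search starts at `kb = k` and only accepts descent steps, each envelope value is at most the plain cost at `k`, and the adapted policy stays stabilizing for its realization.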

Paper Structure

This paper contains 9 sections, 9 theorems, 33 equations, 2 figures, 1 algorithm.

Key Result

Lemma 6

(Gradient dominance of Moreau Envelope cost): Consider a policy $\breve{K}\in\mathbb{R}^{m\times n}$, then

Figures (2)

  • Figure 1: Depiction of properties of Algorithm 1: (a) Convergence of the MEMLQR algorithm: evolution of the Moreau envelope regularized cost $C_\lambda(K^s)$ as the number of outer iterations $(s)$ increases. (b) The evolution of accuracy $\left(1-\frac{|C_{z}(K^N)-C_{z}(K^n)|}{C_{z}(K^N)}\right)$ of the policy generated for an unseen realization $z$ by a model-based policy gradient framework when initialized using a policy obtained by minimizing the total LQR cost $C(\cdot)$ and when initialized using the $K^S$ generated by the MEMLQR algorithm. Note that the cost here is computed over 50 random initial states. (c) State and input trajectories of the unseen realization $z$ for the estimate of the optimal policy $(K^N)$ generated after $N=250$ iterations of the model-based policy gradient algorithm, which was initialized at $K^S$ generated by the MEMLQR algorithm.
  • Figure 2: Convergence to the optimal policy after initialization using the policy generated by MAML-LQR and MEMLQR ($\lambda = 0.02,0.2,2$) for three random system realizations in a model-free setting. The cost incurred by the policy generated by the MEMLQR approach is initially closer to the optimal value, which aids faster convergence. For higher values of $\lambda$, the cost incurred is lower.
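
The accuracy measure in the caption of Figure 1(b) is a relative cost gap. A minimal sketch of it follows, assuming the two costs on the unseen realization $z$ have already been computed; the function and argument names are hypothetical.

```python
def policy_accuracy(cost_final, cost_current):
    """1 - |C_z(K^N) - C_z(K^n)| / C_z(K^N), with cost_final = C_z(K^N) > 0."""
    if cost_final <= 0.0:
        raise ValueError("cost_final must be positive")
    return 1.0 - abs(cost_final - cost_current) / cost_final
```

The value equals 1 when the current policy's cost matches the cost after $N$ iterations, and it can become negative when the current cost is more than twice the final cost.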

Theorems & Definitions (18)

  • Remark 1
  • Remark 2
  • Lemma 6
  • proof
  • Lemma 7
  • proof
  • Lemma 8
  • Lemma 9
  • Proposition 10
  • proof
  • ...and 8 more