Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR

Leonardo F. Toso, Donglin Zhan, James Anderson, Han Wang

TL;DR

This work tackles meta-learning for linear-quadratic regulators across heterogeneous tasks by introducing MAML-LQR, a policy-gradient MAML framework applicable in both model-based and model-free settings. It derives explicit stability and convergence guarantees, including linear convergence in the model-based case, and provides probabilistic guarantees for the model-free zeroth-order approach. The analysis centers on task heterogeneity, capturing its effect via bounds on system and cost differences, and demonstrates that the learned initialization enables rapid adaptation to unseen tasks with controllable bias. Empirical results on a Boeing-system-like setup corroborate the theory, showing improved adaptation speed when starting from the MAML-LQR initialization and revealing the influence of heterogeneity on performance and personalization.
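To make the model-free, zeroth-order idea concrete, the sketch below estimates a gradient from cost evaluations alone. It is an illustration under stated assumptions, not the paper's estimator: the two-point form, the function name `zeroth_order_gradient`, the smoothing radius, and the quadratic toy cost standing in for an LQR rollout cost are all hypothetical choices made here.

```python
import numpy as np

def zeroth_order_gradient(cost, K, radius, num_samples, rng):
    """Two-point zeroth-order estimate of the gradient of `cost` at K.

    Assumption: `cost` is any callable returning a scalar, e.g. a simulated
    rollout cost. Only function values are used, never a system model.
    """
    d = K.size
    estimate = np.zeros_like(K, dtype=float)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform direction on the unit Frobenius sphere
        delta = cost(K + radius * U) - cost(K - radius * U)
        estimate += (d / (2.0 * radius * num_samples)) * delta * U
    return estimate

# Toy check on a quadratic surrogate (hypothetical stand-in for an LQR cost).
K_star = np.array([[0.5, -0.2]])

def toy_cost(K):
    return 0.5 * np.sum((K - K_star) ** 2)

rng = np.random.default_rng(0)
K0 = np.zeros((1, 2))
g_hat = zeroth_order_gradient(toy_cost, K0, radius=0.1, num_samples=5000, rng=rng)
g_true = K0 - K_star  # exact gradient of the toy cost at K0
```

On this toy cost the estimate should land close to `g_true`; with a real rollout cost, the radius and the number of samples trade bias against variance.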

Abstract

We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient-based (PG) model-agnostic meta-learning (MAML) (Finn et al., 2017) approach for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller close to each task-specific optimal controller up to a task-heterogeneity bias in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
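As a rough illustration of the model-based setting, the sketch below runs a policy-gradient meta-update over two toy LQR tasks. It relies on several assumptions that are not taken from the paper: a discrete-time system with $u_t = -K x_t$, the standard closed-form LQR policy gradient (Fazel et al., 2018), hypothetical system matrices and step sizes, and a first-order simplification that drops the Hessian factor of the full MAML gradient.

```python
import numpy as np

def dlyap(F, W):
    """Solve X = F X F^T + W for a Schur-stable F via vectorization."""
    n = F.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(F, F), W.reshape(-1))
    return x.reshape(n, n)

def lqr_cost_and_grad(K, A, B, Q, R, Sigma0):
    """Cost J(K) = tr(P_K Sigma0) and its gradient for u_t = -K x_t."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("K is not stabilizing for this task")
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # cost-to-go matrix
    Sigma = dlyap(Acl, Sigma0)          # accumulated state covariance
    grad = 2.0 * ((R + B.T @ P @ B) @ K - B.T @ P @ A) @ Sigma
    return float(np.trace(P @ Sigma0)), grad

def first_order_maml_step(K, tasks, inner_lr, outer_lr):
    """One meta-step: adapt per task, then average gradients at the adapted gains.

    Simplification: the Hessian factor of the full MAML gradient is dropped.
    """
    meta_grad = np.zeros_like(K)
    for task in tasks:
        _, g = lqr_cost_and_grad(K, *task)
        _, g_adapted = lqr_cost_and_grad(K - inner_lr * g, *task)
        meta_grad += g_adapted / len(tasks)
    return K - outer_lr * meta_grad

def average_cost(K, tasks):
    return float(np.mean([lqr_cost_and_grad(K, *task)[0] for task in tasks]))

# Two hypothetical, mildly heterogeneous tasks that differ only in A.
A1 = np.array([[0.8, 0.1], [0.0, 0.7]])
A2 = A1 + 0.02 * np.eye(2)
B = np.array([[0.0], [1.0]])
Q, R, Sigma0 = np.eye(2), np.eye(1), np.eye(2)
tasks = [(A1, B, Q, R, Sigma0), (A2, B, Q, R, Sigma0)]

K = np.zeros((1, 2))  # stabilizing here because A1 and A2 are themselves stable
cost_before = average_cost(K, tasks)
for _ in range(200):
    K = first_order_maml_step(K, tasks, inner_lr=1e-3, outer_lr=1e-3)
cost_after = average_cost(K, tasks)
```

The explicit stability check mirrors the role that stabilizing controllers play in the guarantees: the cost is finite only on the stabilizing set, so a step that leaves that set is reported as an error rather than silently producing a meaningless cost.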

Paper Structure (18 sections, 13 theorems, 114 equations, 1 figure, 3 algorithms)

Key Result

Lemma 1

(Uniform bounds) Given $\mathcal{T}^{(i)}$ and a stabilizing controller $K \in \mathcal{S}_{\text{ML}}$, the gradient $\nabla J^{(i)}(K)$, the Hessian $\nabla^2 J^{(i)}(K)$, and the controller $K$ are bounded in norm as follows: $\|\nabla J^{(i)}(K)\| \leq h_G(K)$, $\|\nabla^2 J^{(i)}(K)\| \leq h_H(K)$, and $\|K\| \leq h_c(K)$, where $h_G(K)$, $h_H(K)$, and $h_c(K)$ are functions of the problem parameters.

Figures (1)

  • Figure 1: Cost gap between the learned controller and the task-specific optimal controller as a function of the iteration count. (left) Convergence of MAML-LQR. (middle) MAML-LQR, $\bar{\epsilon}_1 = (1.2, 1.1, 1.4, 1.2)\times 10^{-3}$, $\bar{\epsilon}_2 = (1.3, 1.1, 1.4, 1.2)\times 10^{-2}$, $\bar{\epsilon}_3 = (1.7, 1.8, 1.9, 1.7)\times 10^{-2}$. (right) PG-LQR (Fazel et al., 2018).

Theorems & Definitions (23)

  • Definition 1
  • Definition 2
  • Remark 1
  • Remark 2
  • Lemma 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Remark 3
  • Theorem 1
  • ...and 13 more