Meta-Learning Linear Quadratic Regulators: A Policy Gradient MAML Approach for Model-free LQR
Leonardo F. Toso, Donglin Zhan, James Anderson, Han Wang
TL;DR
This work tackles meta-learning for linear-quadratic regulators across heterogeneous tasks by introducing MAML-LQR, a policy-gradient MAML framework applicable in both model-based and model-free settings. It derives explicit stability and convergence guarantees, including linear convergence in the model-based case, and provides probabilistic guarantees for the model-free zeroth-order approach. The analysis centers on task heterogeneity, capturing its effect via bounds on system and cost differences, and demonstrates that the learned initialization enables rapid adaptation to unseen tasks with controllable bias. Empirical results on a Boeing-system-like setup corroborate the theory, showing improved adaptation speed when starting from the MAML-LQR initialization and revealing the influence of heterogeneity on performance and personalization.
Abstract
We investigate the problem of learning linear quadratic regulators (LQR) in a multi-task, heterogeneous, and model-free setting. We characterize the stability and personalization guarantees of a policy gradient (PG)-based model-agnostic meta-learning (MAML) approach (Finn et al., 2017) for the LQR problem under different task-heterogeneity settings. We show that our MAML-LQR algorithm produces a stabilizing controller that is close to each task-specific optimal controller, up to a task-heterogeneity bias, in both model-based and model-free learning scenarios. Moreover, in the model-based setting, we show that such a controller is achieved with a linear convergence rate, which improves upon the sub-linear rates from existing work. Our theoretical guarantees demonstrate that the learned controller can efficiently adapt to unseen LQR tasks.
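To make the model-based setting concrete, the following is a minimal sketch of a policy-gradient meta-update over several LQR tasks. It is not the paper's algorithm: it uses a first-order simplification of MAML that drops the second-order term of the full MAML gradient, and it assumes discrete-time dynamics x_{t+1} = A x_t + B u_t with state feedback u_t = -K x_t and the standard closed-form LQR policy gradient. The function names (`dlyap`, `cost_and_grad`, `fo_maml_lqr`, `scalar_task`), the scalar toy tasks, the step sizes, and the iteration count are all illustrative assumptions.

```python
import numpy as np

def dlyap(a, q):
    """Solve X = a X a^T + q by vectorization (adequate for small systems)."""
    n = a.shape[0]
    x = np.linalg.solve(np.eye(n * n) - np.kron(a, a), q.reshape(-1))
    return x.reshape(n, n)

def cost_and_grad(task, K, sigma0):
    """Exact LQR cost J(K) and its gradient for the policy u_t = -K x_t."""
    A, B, Q, R = task
    Acl = A - B @ K  # closed-loop matrix
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        raise ValueError("K does not stabilize this task")
    P = dlyap(Acl.T, Q + K.T @ R @ K)   # cost-to-go matrix
    Sigma = dlyap(Acl, sigma0)          # state correlation matrix
    E = (R + B.T @ P @ B) @ K - B.T @ P @ A
    return np.trace(P @ sigma0), 2.0 * E @ Sigma

def fo_maml_lqr(tasks, K0, sigma0, inner_lr, outer_lr, iters):
    """First-order MAML-style meta-update (Hessian term omitted)."""
    K = K0.copy()
    for _ in range(iters):
        meta_grad = np.zeros_like(K)
        for task in tasks:
            _, g = cost_and_grad(task, K, sigma0)
            K_adapted = K - inner_lr * g  # one task-specific PG step
            _, g_adapted = cost_and_grad(task, K_adapted, sigma0)
            meta_grad += g_adapted / len(tasks)
        K = K - outer_lr * meta_grad
    return K

def scalar_task(a):
    """Hypothetical toy task: scalar system with B = Q = R = 1."""
    one = np.array([[1.0]])
    return (np.array([[a]]), one, one, one)

# Three heterogeneous tasks that differ only in the open-loop pole.
tasks = [scalar_task(a) for a in (0.7, 0.8, 0.9)]
sigma0 = np.eye(1)
K0 = np.zeros((1, 1))  # stabilizing for all three tasks

def avg_cost(K):
    return np.mean([cost_and_grad(t, K, sigma0)[0] for t in tasks])

K_meta = fo_maml_lqr(tasks, K0, sigma0, inner_lr=0.005, outer_lr=0.02, iters=200)
print("average cost at K0:    ", avg_cost(K0))
print("average cost at K_meta:", avg_cost(K_meta))
```

The functions accept matrices of any compatible dimension; scalar tasks are used only to keep the example easy to check by hand. The initial gain must stabilize every task, and the step sizes must be small enough that each adapted gain remains stabilizing, which the sketch enforces only by raising an error.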
