MPC-Inspired Reinforcement Learning for Verifiable Model-Free Control

Yiwen Lu; Zishuo Li; Yihan Zhou; Na Li; Yilin Mo

TL;DR

The paper proposes controllers that are structured like a QP solver for a linear MPC problem and trained with deep reinforcement learning (DRL). This addresses the limited verifiability and lack of performance guarantees of the Multi-Layer Perceptron (MLP) and other general neural network controllers commonly used in DRL: the learned controllers possess verifiable properties akin to MPC.

Abstract

In this paper, we introduce a new class of parameterized controllers, drawing inspiration from Model Predictive Control (MPC). The controller resembles a Quadratic Programming (QP) solver of a linear MPC problem, with the parameters of the controller being trained via Deep Reinforcement Learning (DRL) rather than derived from system models. This approach addresses the limitations, in terms of verifiability and performance guarantees, of the Multi-Layer Perceptron (MLP) and other general neural network architectures commonly used as controllers in DRL: the learned controllers possess verifiable properties akin to MPC, such as persistent feasibility and asymptotic stability. Numerical examples further illustrate that the proposed controller empirically matches MPC and MLP controllers in control performance and is more robust to modeling uncertainty and noise. Furthermore, the proposed controller is significantly more computationally efficient than MPC and requires fewer parameters to learn than MLP controllers. Real-world experiments on a vehicle drift maneuvering task demonstrate the potential of these controllers for robotics and other demanding control tasks.
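To make the phrase "QP solver of a linear MPC problem" concrete, the following is a minimal sketch of the standard condensing construction that turns an unconstrained linear-quadratic MPC problem into a QP in the stacked inputs. It is an illustration under stated assumptions, not the paper's formulation: the symbols `A`, `B`, `Q`, `R`, `N`, `Phi`, `Gamma`, and `F` and the function name `condense_mpc` are all introduced here, and constraints and reference tracking are left out.

```python
import numpy as np


def condense_mpc(A, B, Q, R, N):
    """Condense a linear-quadratic MPC problem into QP data (P, F).

    Assumed (hypothetical) setup: dynamics x_{k+1} = A x_k + B u_k and cost
    0.5 * sum_{k=1..N} x_k' Q x_k + 0.5 * sum_{k=0..N-1} u_k' R u_k.
    With U = [u_0; ...; u_{N-1}], the cost equals
    0.5 * U' P U + (F x0)' U + const, so the linear term is affine in x0.
    """
    n, m = B.shape
    # Stacked states satisfy X = Phi x0 + Gamma U.
    Phi = np.vstack([np.linalg.matrix_power(A, k) for k in range(1, N + 1)])
    Gamma = np.zeros((N * n, N * m))
    for i in range(N):          # block row i corresponds to state x_{i+1}
        for j in range(i + 1):  # block column j corresponds to input u_j
            Gamma[i * n:(i + 1) * n, j * m:(j + 1) * m] = (
                np.linalg.matrix_power(A, i - j) @ B
            )
    Q_bar = np.kron(np.eye(N), Q)  # block-diagonal state weights
    R_bar = np.kron(np.eye(N), R)  # block-diagonal input weights
    P = Gamma.T @ Q_bar @ Gamma + R_bar
    F = Gamma.T @ Q_bar @ Phi      # linear term is q = F @ x0
    return P, F


# Scalar integrator with unit weights and horizon 2 (toy values).
one = np.array([[1.0]])
P, F = condense_mpc(one, one, one, one, N=2)
```

In this sketch the Hessian `P` does not depend on the initial state, while the linear term `F @ x0` does. The abstract's point is that the analogous quantities are learned by DRL instead of being computed from `A` and `B`.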

Paper Structure (27 sections, 4 theorems, 54 equations, 7 figures, 5 tables)

Introduction
Problem Formulation and Preliminaries
Problem Formulation
Linear MPC and its QP Representation
Algorithm for Solving QPs
Learning Model-Free QP Controllers
Performance Guarantees of Learned QP Controller
Benchmarking Results
Results on Nominal Systems
Validation of Robustness
Application Example on a Real-World System: Vehicle Drift Maneuvering
Conclusion
PDHG iterations and convergence guarantee
Ensuring the Feasibility of the Learned QP Problem
Intuition Behind Learned QP Controller: a Comparison with MPC
...and 12 more sections

Key Result

Theorem 1

If $0 < \alpha < 1$ and the problem \ref{eq:qp} is feasible, then the iterations \ref{eq:pdhg_iter} yield $y^i \to y^*$, where $y^i$ is given in \ref{eq:get_sol} and $y^*$ is the optimal solution of the original problem. Furthermore, the suboptimality gap satisfies: [...] where $p^i, p^*$ are the primal value at iteration $i$ and the optimal primal value respectively, and $r_{prim}^i, r_{dual}^i$ are the primal and dual resi...

Figures (7)

Figure 1: Proposed control policy architecture. The controller solves a QP problem of the form \ref{eq:qp}, whose parameters $P, H$ are shared across all initial states and references $(x_0, r)$, while $q, b$ depend affinely on $(x_0, r)$ with weights $W_q, W_b$ and bias $b_b$ (see \ref{eq:qb_affine}). An approximate solution to the QP problem, $y^{n_{iter}}$, is obtained by running $n_{iter}$ QP solver iterations \ref{eq:pdhg_iter} followed by a transform \ref{eq:get_sol}; its first $m_{sys}$ dimensions are used as the current control input $u_0$.
Figure 2: Performance comparison on the quadruple tank system with process noise and parametric uncertainties.
Figure 3: Result of deploying the learned QP controller on the vehicle drift maneuvering task. Video available at: https://youtu.be/-XYtl2b4OVc.
Figure 4: Comparison of trajectories under different controllers on the double integrator example, starting from the common initial state $x_0 = (-4, 2.1)$. "MPC" stands for the truncated MPC, "MPC-T" stands for MPC with a manually crafted terminal cost, and "LQP" stands for the learned QP controller. Solid dots represent realized trajectories, while asterisks represent predicted trajectories. The green shaded region is the maximal control invariant set \cite{borrelli2017predictive}, i.e., the largest set over which one can expect any controller to work. The thick red lines mark the bounds on the state.
Figure 5: Comparison of boundaries of the Maximal Control Invariant (MCI) set and verified Region Of Attraction (ROA) for the double integrator example.
...and 2 more figures
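
The forward pass described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the QP form (minimize $\frac{1}{2} y^\top P y + q^\top y$ subject to $Hy + b \ge 0$) is assumed because eq:qp is not reproduced here, the function name `lqp_control` is hypothetical, and a projected gradient ascent on the dual is swapped in for the paper's PDHG iterations (eq:pdhg_iter), whose exact update is also not shown on this page.

```python
import numpy as np


def lqp_control(x0, r, P, H, W_q, W_b, b_b, m_sys, n_iter=50):
    """Return u_0 from an approximately solved, parameterized QP.

    Assumed QP form (not confirmed by this page):
        minimize 0.5 * y' P y + q' y   subject to   H y + b >= 0,
    with P positive definite and H nonzero. P and H are shared across
    inputs, while q and b are affine in (x0, r), as in the caption.
    """
    z = np.concatenate([x0, r])
    q = W_q @ z            # linear cost term, affine in (x0, r)
    b = W_b @ z + b_b      # constraint offset, affine in (x0, r)

    # Stand-in solver: projected gradient ascent on the dual variable lam.
    # This is NOT the PDHG scheme of the paper; it only illustrates running
    # a fixed number of solver iterations.
    P_inv = np.linalg.inv(P)
    step = 1.0 / np.linalg.norm(H @ P_inv @ H.T, 2)  # 1 / Lipschitz constant
    lam = np.zeros(H.shape[0])
    for _ in range(n_iter):
        y = P_inv @ (H.T @ lam - q)                       # primal minimizer for lam
        lam = np.maximum(0.0, lam - step * (H @ y + b))   # keep lam >= 0
    y = P_inv @ (H.T @ lam - q)
    return y[:m_sys]       # first m_sys entries are the control input u_0


# Toy instance with made-up parameters (in the paper these are learned by DRL).
P = np.eye(2)
H = np.array([[1.0, 0.0]])
W_q = np.array([[2.0, 0.0], [0.0, 0.0]])
W_b = np.zeros((1, 2))
b_b = np.array([1.0])
u0 = lqp_control(np.array([1.0]), np.array([0.0]), P, H, W_q, W_b, b_b, m_sys=1)
```

In the toy instance the unconstrained minimizer violates the single constraint, so the returned input sits on the constraint boundary. Because the controller is a fixed number of simple matrix operations, its cost per control step is known in advance.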

Theorems & Definitions (5)

Theorem 1
Theorem 2: Certificate for Persistent Feasibility
Theorem 3: Certificate for Asymptotic Stability
Theorem 4
Example 1: Double integrator, variant of \cite{borrelli2017predictive}
