Multi-Objective Learning Model Predictive Control

Siddharth H. Nair; Charlott Vallon; Francesco Borrelli

TL;DR

This paper presents a data-driven Model Predictive Control formulation for repeated tasks that guarantees closed-loop performance improves between successive iterations with respect to each objective. It proves recursive feasibility and performance improvement, and shows that the converged policy is Pareto optimal.

Abstract

Multi-Objective Learning Model Predictive Control is a novel data-driven control scheme which improves a linear system's closed-loop performance with respect to several convex control objectives over iterations of a repeated task. At each task iteration, collected system data is used to construct terminal components of a Model Predictive Controller. The formulation presented in this paper ensures that closed-loop control performance improves between successive iterations with respect to each objective. We provide proofs of recursive feasibility and performance improvement, and show that the converged policy is Pareto optimal. Simulation results demonstrate the applicability of the proposed approach.
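The two properties named in the abstract, improvement in every objective between iterations and Pareto optimality, can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names and all cost values below are hypothetical, and the sketch only checks the standard definitions on a finite list of cost vectors (all objectives are minimized).

```python
from typing import List, Sequence, Tuple

Cost = Tuple[float, ...]

def improves_in_every_objective(prev: Sequence[float], curr: Sequence[float]) -> bool:
    """True if curr is no worse than prev in every objective (costs are minimized)."""
    return all(c <= p for p, c in zip(prev, curr))

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b: no worse in all objectives, strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better

def pareto_front(points: List[Cost]) -> List[Cost]:
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

# Hypothetical closed-loop costs per iteration: (objective 1, objective 2).
iteration_costs = [(10.0, 5.0), (8.0, 4.5), (7.5, 4.5)]
monotone = all(
    improves_in_every_objective(a, b)
    for a, b in zip(iteration_costs, iteration_costs[1:])
)

# Hypothetical candidate cost vectors; (8.0, 5.0) is dominated by (7.5, 4.5).
candidates = [(7.5, 4.5), (9.0, 3.0), (8.0, 5.0)]
front = pareto_front(candidates)
```

Here `monotone` records whether each iteration is at least as good as the previous one in both objectives, and `front` holds the candidates that cannot be improved in one objective without worsening the other.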

Paper Structure

This paper contains 8 sections, 4 theorems, 53 equations, and 3 figures.

Introduction
Problem Formulation
Learning Model Predictive Control (LMPC)
Multi-Objective Optimization
Multi-Objective LMPC
Properties
Numerical Example
Conclusion

Key Result

Theorem (Feasibility and Stability)

Consider system (eq:sysdyn_student) controlled by the MO-LMPC controller (eq:OP_LMPC)-(eq:controlapplication). Let $\mathcal{CS}^j$ be the sampled safe set at iteration $j$ as defined in (eq:CS_def). Let Assumption (assmp:cs0) hold. Then, the MO-LMPC (eq:OP_LMPC)-(eq:controlapplication) is feasible for all ti

Figures (3)

Figure 1: The left plot depicts two possible values $\boldsymbol{\lambda} \in \Lambda^2(\bar{x})$ for a particular $\bar{x}$. Each $\boldsymbol{\lambda}$ corresponds to a particular convex combination of states from previous trajectories (here, segments of $\mathbf{x}^0$ and $\mathbf{x}^1$ are shown). The right plot depicts how $\hat{V}^{2,\star}_1(\bar{x},\boldsymbol{\lambda})$ is interpolated for a particular choice of $\boldsymbol{\lambda} \in \Lambda^2(\bar{x})$.
Figure 2: $X-Y$ and SOC% trajectories of the system. Notice that the agent is stabilized sooner while consuming less SOC% in iteration $j=26$, compared to iteration $j=0$.
Figure 3: Trajectory costs (eq:stabcost_eg) and SOC% consumed (eq:soccost_eg) across iterations, depicting iterative improvement in both objectives, and the effect of $\alpha$ on converged solutions.
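The convex-combination idea described in the Figure 1 caption can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's definition: it assumes that $\bar{x}$ is written as a $\boldsymbol{\lambda}$-weighted sum of stored states and that the interpolated value is the same weighted sum of the stored costs-to-go. The function names, states, and costs are all hypothetical.

```python
from typing import List, Sequence

def convex_combination(states: Sequence[Sequence[float]], lam: Sequence[float]) -> List[float]:
    """Return sum_k lam[k] * states[k], after checking lam is a valid convex weight vector."""
    if any(w < 0 for w in lam) or abs(sum(lam) - 1.0) > 1e-9:
        raise ValueError("lam must be nonnegative and sum to 1")
    dim = len(states[0])
    return [sum(w * s[k] for w, s in zip(lam, states)) for k in range(dim)]

def interpolated_cost(costs_to_go: Sequence[float], lam: Sequence[float]) -> float:
    """Assumed interpolation rule: weight the stored costs-to-go by the same lam."""
    return sum(w * c for w, c in zip(lam, costs_to_go))

# Hypothetical stored states from previous trajectories and their costs-to-go
# for one objective.
stored_states = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
stored_costs = [0.0, 4.0, 6.0]

# One particular choice of weights; other weights could reproduce the same
# state with a different interpolated cost, which is why the caption speaks of
# several possible lambda values for one x_bar.
lam = [0.5, 0.25, 0.25]
x_bar = convex_combination(stored_states, lam)
v_hat = interpolated_cost(stored_costs, lam)
```

With these numbers, `x_bar` is `[0.5, 0.5]` and `v_hat` is `2.5`.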

Theorems & Definitions (11)

Definition
Remark
Remark
Theorem: Feasibility and Stability
Proof
Theorem: Performance Improvement
Proof
Lemma
Proof
Theorem: Pareto Optimality
...and 1 more
