Safe and Stable Closed-Loop Learning for Neural-Network-Supported Model Predictive Control

Sebastian Hirt; Maik Pfefferkorn; Rolf Findeisen

TL;DR

This work tackles safe learning of parametrized model predictive controllers (MPC) when the system model is imperfect. It parametrizes the MPC stage cost with a neural network and uses multi-episode Bayesian optimization (BO) with probabilistic safety guarantees to optimize long-term closed-loop performance, while enforcing stability through a safety constraint $G_1(\theta)$. Gaussian-process surrogates model both the objective $G_0(\theta)$ and the safety constraints $G_i(\theta)$; learning starts from a safe initialization, and the acquisition function explores only parameters that are predicted to be safe. A simulation study on a double pendulum demonstrates that the learned neural cost improves convergence and reduces oscillations, while preserving probabilistic stability during learning. The approach offers a data-efficient, safety-aware path to enhancing MPC performance in the presence of model-plant mismatch, and points to extensions with richer parametrizations and hybrids of reinforcement learning (RL) and BO for broader applicability.
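The safe learning loop summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy functions `G0` and `G1`, the sign conventions (minimize `G0`, safe if `G1 >= 0`), the squared-exponential kernel, the lower-confidence-bound acquisition, and all numerical values are assumptions chosen for illustration. In the paper's setting, each evaluation would instead be one closed-loop episode of the parametrized MPC.

```python
import numpy as np

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_query)
    mean = Ks.T @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))

# Hypothetical stand-ins for closed-loop evaluations (assumptions).
def G0(theta):
    """Toy performance measure, to be minimized."""
    return (theta - 0.7) ** 2

def G1(theta):
    """Toy safety measure; a parameter counts as safe if G1 >= 0."""
    return 0.5 - theta ** 2

def safe_bo(n_iter=15, beta=2.0):
    grid = np.linspace(-1.0, 1.0, 201)   # candidate parameters
    thetas = np.array([0.0])             # safe initial parameter (assumed known)
    y0, y1 = G0(thetas), G1(thetas)
    for _ in range(n_iter):
        m0, s0 = gp_posterior(thetas, y0, grid)
        m1, s1 = gp_posterior(thetas, y1, grid)
        # Safe set: pessimistic (lower) bound on the constraint is non-negative.
        safe = (m1 - beta * s1) >= 0.0
        if not safe.any():
            break
        # Optimistic bound on the objective, restricted to the safe set.
        acq = np.where(safe, m0 - beta * s0, np.inf)
        nxt = grid[np.argmin(acq)]
        thetas = np.append(thetas, nxt)
        y0 = np.append(y0, G0(nxt))
        y1 = np.append(y1, G1(nxt))
    best = float(thetas[np.argmin(y0)])
    return best, thetas
```

Calling `safe_bo()` returns the best evaluated parameter and the sequence of evaluated parameters. A larger `beta` makes the safe set more conservative, which in this sketch slows exploration away from the initial parameter.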

Abstract

Safe learning of control policies remains challenging, both in optimal control and reinforcement learning. In this article, we consider safe learning of parametrized predictive controllers that operate with incomplete information about the underlying process. To this end, we employ Bayesian optimization for learning the best parameters from closed-loop data. Our method focuses on the system's overall long-term closed-loop performance while keeping the system safe and stable. Specifically, we parametrize the stage cost function of an MPC using a feedforward neural network. This allows for a high degree of flexibility, enabling the system to achieve better closed-loop performance with respect to a superordinate measure. However, this flexibility also necessitates safety measures, especially with respect to closed-loop stability. We therefore explicitly incorporate stability information into the Bayesian-optimization-based learning procedure, thereby achieving rigorous probabilistic safety guarantees. The proposed approach is illustrated using a numerical example.
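To make the idea of a neural stage cost concrete, the following sketch evaluates a stage cost whose parameters $\theta$ are the weights of a small feedforward network. The structure used here (a nominal quadratic cost plus an additive network output, one hidden layer with `tanh` activation, and the names `unpack`, `stage_cost`, and `n_params`) is an assumption for illustration; the abstract only states that the stage cost is parametrized by a feedforward neural network. An actual MPC would express the same function in the modeling language of its optimization framework.

```python
import numpy as np

def n_params(n_in, n_hidden):
    """Number of entries in theta for a one-hidden-layer, scalar-output network."""
    return n_hidden * n_in + n_hidden + n_hidden + 1

def unpack(theta, n_in, n_hidden):
    """Split the flat parameter vector into weights and biases."""
    i = 0
    W1 = theta[i:i + n_hidden * n_in].reshape(n_hidden, n_in)
    i += n_hidden * n_in
    b1 = theta[i:i + n_hidden]
    i += n_hidden
    W2 = theta[i:i + n_hidden]
    i += n_hidden
    b2 = theta[i]
    return W1, b1, W2, b2

def stage_cost(x, u, theta, Q, R, n_hidden=4):
    """Nominal quadratic cost plus a neural-network correction (assumed form)."""
    z = np.concatenate([x, u])
    W1, b1, W2, b2 = unpack(theta, z.size, n_hidden)
    nominal = x @ Q @ x + u @ R @ u
    correction = W2 @ np.tanh(W1 @ z + b1) + b2
    return nominal + correction
```

With `theta` set to zeros the correction vanishes and the cost reduces to the nominal quadratic term, which gives a natural starting point if the nominal controller is already known to be safe.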

Paper Structure (14 sections, 3 theorems, 17 equations, 2 figures, 1 algorithm)

Introduction
Fundamentals
Problem Formulation
Parametrized Model Predictive Control
Stability of Dynamical Systems
Gaussian Process Surrogate Models
Safe Bayesian Optimization
Safe Learning of a Neural Cost Function
Neural Cost Function
Probabilistic Stability Guarantees During Learning
Simulation Study
Set-Up
Safe Stability-informed Cost Function Learning
Conclusions

Key Result

Lemma 1

(chowdhury2017kernelized) Let the covariance function $k$ be continuous and positive definite, and assume that $\varphi$ is contained in the reproducing kernel Hilbert space (RKHS) $\mathcal{H}_k$ associated with $k$. Assume further that the RKHS norm of $\varphi$ is bounded from ab

Figures (2)

Figure 1: States $\psi_1$ (top) and $\psi_2$ (bottom) during the learning procedure for $\beta = 2$. We show the learned result (blue), the initial safe closed-loop run without a neural network stage cost (orange), and all intermediate closed-loop samples resulting from the BO procedure (gray).
Figure 2: State norms (blue) for all closed-loop runs and constraint resulting from the stability condition $\max \{ \zeta(\lVert x_0-x_d \rVert, k), \nu \}$ (red). Results are shown for confidence parameters $\beta = 0.5$ (top) and $\beta = 2$ (bottom).
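The comparison shown in Figure 2 can be sketched as a check of the state norm against the bound $\max \{ \zeta(\lVert x_0-x_d \rVert, k), \nu \}$ at every time step. The concrete form of $\zeta$ below (linear in its first argument, geometrically decaying in $k$), the constants `c`, `rho`, and `nu`, and the use of $\lVert x_k - x_d \rVert$ as the state norm are assumptions for illustration; the source does not specify them here.

```python
import numpy as np

def zeta(r, k, c=2.0, rho=0.9):
    """Hypothetical bound: increasing in r, decaying in the time index k."""
    return c * r * rho ** k

def stability_margin(traj, x_d, nu=0.05):
    """Smallest slack between the bound and the state norm over the trajectory.

    A non-negative value means the bound holds at every time step.
    """
    traj = np.asarray(traj, dtype=float)
    norms = np.linalg.norm(traj - x_d, axis=1)
    k = np.arange(len(traj))
    bound = np.maximum(zeta(norms[0], k), nu)
    return float(np.min(bound - norms))
```

A scalar margin of this kind could serve as the value of a safety constraint evaluated after each closed-loop run; whether it matches the exact definition of $G_1(\theta)$ in the paper is an assumption.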

Theorems & Definitions (7)

Definition 1
Definition 2
Definition 3
Lemma 1
Lemma 2
Theorem 1
proof
