Stability-informed Bayesian Optimization for MPC Cost Function Learning

Sebastian Hirt, Maik Pfefferkorn, Ali Mesbah, Rolf Findeisen

TL;DR

This work tackles learning an MPC cost function under model-plant mismatch while ensuring closed-loop stability. It introduces a constrained Bayesian optimization framework that tunes a neural-network-based stage cost $l_\theta(x,u)$ and leverages a Lyapunov candidate $J^*(x_k)$ to impose stability constraints as soft BO penalties. Gaussian process surrogates model both the performance objective $G_0(\theta)$ and the Lyapunov constraints $G_1(\theta)$, $G_2(\theta)$, enabling data-efficient, safe exploration. In simulations on a double pendulum, the Lyapunov-informed approach achieves faster convergence to the reference with reduced oscillations and provides a stability certificate, outperforming unconstrained learning. The work outlines future directions for probabilistic stability guarantees and higher-dimensional parameter spaces.

Abstract

Designing predictive controllers towards optimal closed-loop performance while maintaining safety and stability is challenging. This work explores closed-loop learning for predictive control parameters under imperfect information while considering closed-loop stability. We employ constrained Bayesian optimization to learn a model predictive controller's (MPC) cost function parametrized as a feedforward neural network, optimizing closed-loop behavior as well as minimizing model-plant mismatch. Doing so offers a high degree of freedom and, thus, the opportunity for efficient and global optimization towards the desired and optimal closed-loop behavior. We extend this framework by stability constraints on the learned controller parameters, exploiting the optimal value function of the underlying MPC as a Lyapunov candidate. The effectiveness of the proposed approach is underlined in simulations, highlighting its performance and safety capabilities.
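
To make the idea of using the optimal value function as a Lyapunov candidate more concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a generic decrease condition $J^*(x_{k+1}) - J^*(x_k) \le -\alpha \|x_k\|^2$ checked along one recorded closed-loop trajectory; the exact constraint definitions used in the paper are not given in this summary. The function names, the quadratic bound, `alpha`, and `weight` are all hypothetical choices made for illustration.

```python
from typing import Sequence


def lyapunov_decrease_violation(
    values: Sequence[float],
    state_norms: Sequence[float],
    alpha: float = 1e-3,
) -> float:
    """Worst-case violation of an assumed Lyapunov decrease condition.

    values[k]      : optimal MPC value J*(x_k) recorded at closed-loop step k
    state_norms[k] : norm of the state x_k at the same step
    alpha          : assumed decrease-rate constant (hypothetical)

    Returns the maximum over k of J*(x_{k+1}) - J*(x_k) + alpha * ||x_k||^2.
    A result <= 0 means the assumed condition held at every step.
    """
    if len(values) != len(state_norms):
        raise ValueError("values and state_norms must have equal length")
    if len(values) < 2:
        raise ValueError("need at least two steps to check a decrease")
    return max(
        values[k + 1] - values[k] + alpha * state_norms[k] ** 2
        for k in range(len(values) - 1)
    )


def penalized_objective(
    performance_cost: float, violation: float, weight: float = 100.0
) -> float:
    """Soft-penalty objective: only positive violations are penalized."""
    return performance_cost + weight * max(0.0, violation)
```

In a Bayesian optimization loop, a quantity such as this violation could be evaluated once per queried parameter vector and either added to the objective as a penalty, as above, or modeled by its own Gaussian process surrogate as a constraint.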

Paper Structure

This paper contains 14 sections, 13 equations, and 3 figures.

Figures (3)

  • Figure 1: Nominal closed-loop state trajectory without cost function modification (orange dashed line), learned trajectory (blue solid line) and all other queried trajectories (gray solid lines) without enforcing stability constraints.
  • Figure 2: Nominal trajectory (orange dashed line), learned trajectory (blue solid line) and all other queried trajectories (gray solid lines) while imposing Lyapunov-like stability constraints.
  • Figure 3: Evolution of the optimal value function along the optimal, learned closed-loop trajectory.