End-to-End Reinforcement Learning of Koopman Models for Economic Nonlinear Model Predictive Control
Daniel Mayfrank, Alexander Mitsos, Manuel Dahmen
TL;DR
The paper addresses the trade-off between model accuracy and computational speed in economic nonlinear model predictive control (eNMPC) by learning task-optimized Koopman surrogate models in an end-to-end reinforcement learning (RL) framework. It represents nonlinear dynamics in a lifted linear space with $z_0 = \psi_{\theta}(x_0)$, $z_{t+1} = A_{\theta} z_t + B_{\theta} u_t$, and $\hat{x}_t = C_{\theta} z_t$. This linear structure yields convex optimal control problems (OCPs) that can be solved in real time and differentiated for RL updates via cvxpylayers. In two case studies based on a continuous stirred-tank reactor (CSTR), the authors show that end-to-end learned Koopman models outperform models trained by system identification, and that the resulting eNMPC controllers can adapt to changes in the control setting without retraining, unlike model-free RL policies. The work highlights the practical potential of combining Koopman embeddings, differentiable MPC, and policy-optimization techniques to produce robust, computationally efficient economic controllers for nonlinear processes, and it points to future work on scaling to larger systems and integrating model-based RL components.
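To make the lifted-linear prediction structure above concrete, the following is a minimal sketch of a Koopman rollout in plain Python. It is not the authors' implementation: the `lift` function is a hypothetical hand-written stand-in for the learned encoder $\psi_{\theta}$, and the matrices `A`, `B`, and `C` are arbitrary illustrative values rather than trained parameters. In the paper's setting, these quantities are learned, and the same recursion appears as linear equality constraints inside the convex OCP.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def vec_add(a, b):
    """Element-wise sum of two vectors of equal length."""
    return [x + y for x, y in zip(a, b)]


def lift(x):
    """Hypothetical encoder standing in for psi_theta.

    Assumption: the lifted state is the raw state plus tanh features.
    A learned encoder would typically be a neural network instead.
    """
    return list(x) + [math.tanh(xi) for xi in x]


def rollout(x0, controls, A, B, C):
    """Predict x_hat_1, ..., x_hat_T from x0 and a control sequence.

    Implements z_0 = lift(x0), z_{t+1} = A z_t + B u_t, x_hat_t = C z_t.
    """
    z = lift(x0)
    predictions = []
    for u in controls:
        z = vec_add(matvec(A, z), matvec(B, u))  # linear step in lifted space
        predictions.append(matvec(C, z))         # decode back to state space
    return predictions


# Illustrative values only: 1 state, 2 lifted dimensions, 1 control input.
A = [[0.9, 0.1],
     [0.0, 0.8]]
B = [[0.5],
     [0.0]]
C = [[1.0, 0.0]]

preds = rollout(x0=[0.0], controls=[[1.0], [1.0]], A=A, B=B, C=C)
```

Because the only nonlinearity sits in `lift`, which is evaluated once at the initial state, every later prediction is an affine function of the control sequence. That property is what keeps the downstream OCP convex, provided the objective and remaining constraints are convex as well.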
Abstract
(Economic) nonlinear model predictive control ((e)NMPC) requires dynamic models that are sufficiently accurate and computationally tractable. Data-driven surrogate models for mechanistic models can reduce the computational burden of (e)NMPC; however, such models are typically trained by system identification for maximum prediction accuracy on simulation samples and perform suboptimally in (e)NMPC. We present a method for end-to-end reinforcement learning of Koopman surrogate models for optimal performance as part of (e)NMPC. We apply our method to two applications derived from an established nonlinear continuous stirred-tank reactor model. The controller performance is compared to that of (e)NMPCs utilizing models trained using system identification, and model-free neural network controllers trained using reinforcement learning. We show that the end-to-end trained models outperform those trained using system identification in (e)NMPC, and that, in contrast to the neural network controllers, the (e)NMPC controllers can react to changes in the control setting without retraining.
