Simultaneous System Identification and Model Predictive Control with No Dynamic Regret
Hongyu Zhou, Vasileios Tzoumas
TL;DR
The paper tackles control under unknown nonlinear dynamics and disturbances by proposing a Simultaneous System Identification and Model Predictive Control (SSI-MPC) framework that learns disturbances online in a Reproducing Kernel Hilbert Space and applies Model Predictive Control to the current learned model. It uses Random Fourier Features to obtain a finite-dimensional disturbance model that can be updated in real time via Online Gradient Descent, yielding finite-time near-optimal performance and asymptotic convergence to a clairvoyant non-causal controller. The paper establishes a sublinear dynamic regret guarantee, with an explicit bound of $Regret_T^D = O(T^{3/4})$ under mild Lipschitz and stability assumptions, and extends it to noisy systems. The approach is validated in physics-based simulations (cart-pole and quadrotor) and hardware experiments, showing superior tracking and robustness to unmodeled disturbances compared with Nominal-MPC, NS-MPC, GP-MPC, and L1-MPC, including scenarios with wind, ground effects, and aerodynamic drag. Overall, the work demonstrates a practical, self-supervised online learning approach that integrates disturbance modeling with MPC to achieve reliable, high-performance control in uncertain environments.
Abstract
We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. For example, they can model wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters such as inertia of mechanical systems. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). The model of the unknown dynamics is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to maintain the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
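The learning half of the pipeline described above (random Fourier features plus an online least-squares update) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian kernel, the step size, the projection radius, the toy disturbance `true_disturbance`, and the uniformly sampled inputs `z` (standing in for the state and control input visited by the controller) are all assumptions made for illustration, and the MPC step is omitted entirely.

```python
import numpy as np


def make_rff(input_dim, num_features, lengthscale, rng):
    """Sample random Fourier features that approximate a Gaussian (RBF) kernel."""
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, input_dim))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(z):
        # Feature map: phi(z) @ phi(z') approximates the kernel k(z, z').
        return np.sqrt(2.0 / num_features) * np.cos(W @ z + b)

    return phi


def ogd_step(alpha, phi_z, target, step_size, radius):
    """One projected online gradient descent step on the squared prediction error."""
    residual = alpha @ phi_z - target            # shape: (output_dim,)
    grad = 2.0 * np.outer(residual, phi_z)       # gradient of ||alpha @ phi_z - target||^2
    alpha = alpha - step_size * grad
    norm = np.linalg.norm(alpha)
    if norm > radius:                            # project back onto a norm ball (assumed set)
        alpha = alpha * (radius / norm)
    return alpha


def run_demo(num_steps=2000, num_features=100, seed=0):
    """Learn a toy disturbance online and return the per-step squared errors."""
    rng = np.random.default_rng(seed)
    phi = make_rff(input_dim=2, num_features=num_features, lengthscale=1.0, rng=rng)
    alpha = np.zeros((1, num_features))          # learned parameters of the disturbance model

    def true_disturbance(z):
        # Hypothetical unknown disturbance, used only to generate data for this demo.
        return np.array([np.sin(z[0]) + 0.5 * z[1]])

    errors = []
    for _ in range(num_steps):
        z = rng.uniform(-1.0, 1.0, size=2)       # stand-in for the visited (state, input) pair
        h = true_disturbance(z)                  # disturbance observed after acting
        features = phi(z)
        errors.append(float(np.sum((alpha @ features - h) ** 2)))
        alpha = ogd_step(alpha, features, h, step_size=0.2, radius=50.0)
    return np.array(errors)


if __name__ == "__main__":
    errs = run_demo()
    print("mean squared error, first 50 steps:", errs[:50].mean())
    print("mean squared error, last 500 steps:", errs[-500:].mean())
```

In a closed-loop setting, the prediction `alpha @ phi(z)` would be added to the nominal dynamics inside the MPC optimization at each step, and `z` would come from the trajectory the controller actually follows rather than from random sampling.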
