Simultaneous System Identification and Model Predictive Control with No Dynamic Regret

Hongyu Zhou, Vasileios Tzoumas

TL;DR

The paper addresses control of systems with unknown nonlinear dynamics and disturbances. It proposes a Simultaneous System Identification and Model Predictive Control (SSI-MPC) framework that learns the disturbances online in a Reproducing Kernel Hilbert Space and applies Model Predictive Control to the current learned model. Random Fourier Features give a finite-dimensional disturbance model that is updated in real time via Online Gradient Descent, yielding finite-time near-optimal performance and asymptotic convergence to a clairvoyant non-causal controller. The paper establishes a sublinear dynamic regret bound of $Regret_T^D = O(T^{3/4})$ under mild Lipschitz and stability assumptions, and extends it to noisy systems. The approach is validated in physics-based simulations (cart-pole and quadrotor) and in hardware experiments, where it shows better tracking and robustness to unmodeled disturbances than Nominal-MPC, NS-MPC, GP-MPC, and L1-MPC, including scenarios with wind, ground effects, and aerodynamic drag. Overall, the work demonstrates a practical, self-supervised online learning approach that integrates disturbance modeling with MPC to achieve reliable, high-performance control in uncertain environments.
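
To see why a sublinear regret bound implies asymptotic optimality, it may help to write the argument out. The sketch below uses assumed notation that does not appear in this summary: $c_t$ is a per-step cost, $(x_t, u_t)$ is the algorithm's trajectory, $(x_t^\star, u_t^\star)$ is the clairvoyant controller's trajectory, and $C$ is a constant hidden by the $O(\cdot)$.

$$
Regret_T^D \triangleq \sum_{t=1}^{T} c_t(x_t, u_t) - \sum_{t=1}^{T} c_t(x_t^\star, u_t^\star) \le C\, T^{3/4}
\quad\Longrightarrow\quad
\frac{Regret_T^D}{T} \le C\, T^{-1/4} \xrightarrow[T \to \infty]{} 0 .
$$

Under these assumptions, the time-averaged suboptimality against the clairvoyant controller vanishes as $T$ grows, which is the sense in which the controller is asymptotically optimal.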

Abstract

We provide an algorithm for the simultaneous system identification and model predictive control of nonlinear systems. The algorithm has finite-time near-optimality guarantees and asymptotically converges to the optimal (non-causal) controller. Particularly, the algorithm enjoys sublinear dynamic regret, defined herein as the suboptimality against an optimal clairvoyant controller that knows how the unknown disturbances and system dynamics will adapt to its actions. The algorithm is self-supervised and applies to control-affine systems with unknown dynamics and disturbances that can be expressed in reproducing kernel Hilbert spaces. Such spaces can model external disturbances and modeling errors that can even be adaptive to the system's state and control input. For example, they can model wind and wave disturbances to aerial and marine vehicles, or inaccurate model parameters such as inertia of mechanical systems. The algorithm first generates random Fourier features that are used to approximate the unknown dynamics or disturbances. Then, it employs model predictive control based on the current learned model of the unknown dynamics (or disturbances). The model of the unknown dynamics is updated online using least squares based on the data collected while controlling the system. We validate our algorithm in both hardware experiments and physics-based simulations. The simulations include (i) a cart-pole aiming to maintain the pole upright despite inaccurate model parameters, and (ii) a quadrotor aiming to track reference trajectories despite unmodeled aerodynamic drag effects. The hardware experiments include a quadrotor aiming to track a circular trajectory despite unmodeled aerodynamic drag effects, ground effects, and wind disturbances.
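
The learning step described above (random Fourier features plus an online least-squares update) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the disturbance `true_disturbance`, the Gaussian-kernel feature sampling, the feature count `M`, and the step size `eta` are all assumptions chosen for the example, and the update is plain online gradient descent on a squared prediction loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rff(dim_in, num_features, lengthscale, rng):
    """Sample random Fourier features that approximate a Gaussian kernel."""
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, dim_in))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(z):
        # Feature map with phi(z) @ phi(z') ~= k(z, z') for the Gaussian kernel.
        return np.sqrt(2.0 / num_features) * np.cos(W @ z + b)

    return phi

def true_disturbance(z):
    # Hypothetical unknown disturbance, used only to generate data.
    return np.sin(2.0 * z[0]) + 0.5 * z[1]

M = 100      # number of random features (assumed value)
eta = 0.5    # gradient step size (assumed value)
phi = make_rff(dim_in=2, num_features=M, lengthscale=1.0, rng=rng)
alpha = np.zeros(M)  # model parameters: h_hat(z) = phi(z) @ alpha

errors = []
for t in range(2000):
    z = rng.uniform(-1.0, 1.0, size=2)   # stand-in for a (state, input) sample
    feat = phi(z)
    residual = feat @ alpha - true_disturbance(z)
    errors.append(residual ** 2)
    # Gradient of 0.5 * residual**2 with respect to alpha is residual * feat.
    alpha -= eta * residual * feat

early_error = float(np.mean(errors[:100]))
late_error = float(np.mean(errors[-100:]))
print(f"mean squared error, first 100 steps: {early_error:.4f}")
print(f"mean squared error, last 100 steps:  {late_error:.4f}")
```

Because the model is linear in `alpha`, each update costs only `O(M)` operations, which is what makes a finite-dimensional feature model attractive for real-time use.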

Paper Structure

This paper contains 29 sections, 9 theorems, 47 equations, 12 figures, 2 tables, 1 algorithm.

Key Result

Proposition 1

Assume $h \in \mathcal{F}_{2}\left(B_{h}\right)$. Let $\delta \in(0,1)$ and $\mu=\frac{\delta}{2 M}$. With probability at least $1-\delta$, there exist $\left\{\alpha_{i}\right\}_{i=1}^{M} \in {\cal D}$ such that a uniform approximation-error bound holds (the displayed inequality is not reproduced in this summary), where $B_{{\cal Z}} \triangleq \sup_{z\in{\cal Z}} \|z\|$.

Figures (12)

  • Figure 1: Overview of Simultaneous System Identification and Model Predictive Control Pipeline. The pipeline is composed of two interacting modules: (i) a model predictive control (MPC) module, and (ii) an online system identification module. The MPC module uses the estimated unknown disturbances/dynamics from the system identification module to calculate the next control input. Given the control input and the observed new state, the online system identification module then updates the estimate of the unknown disturbances/dynamics.
  • Figure 2: Simulation Results of the Cart-Pole Stabilization Experiment. (a) and (b) demonstrate that Algorithm 1 achieves stabilization in the least time among all tested algorithms. GP-MPC comes second but incurs a larger deviation from the stabilization goal $(0,0,0,0)$ than Algorithm 1. NS-MPC and Nominal MPC perform similarly, showing that state-of-the-art non-stochastic control methods are insufficient when the unknown disturbance is adaptive. (c) shows that the prediction error decreases as Algorithm 1 collects more data.
  • Figure 3: Simulation Results for the Sensitivity Analysis over the Cart-Pole System. The results suggest that larger $M$ and $\eta$ achieve better performance. However, $M$ cannot be arbitrarily large, because it increases the computational complexity of solving the MPC problem, as shown in the paper's cart-pole sensitivity table.
  • Figure 4: Reference Trajectories for the Quadrotor Experiments. The blue lines represent the reference trajectories in 3D. The gray lines are the projections of the reference trajectories onto the ground.
  • Figure 5: Tracking Performance Comparison for the Quadrotor Experiments. Algorithm 1 demonstrates improved performance compared to Nominal MPC and GP-MPC in terms of tracking error over all tested reference trajectories and maximal speeds. Algorithm 1 with INDI achieves the best performance, as INDI provides better tracking of the attitude dynamics.
  • ...and 7 more figures
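
The two-module loop described in Figure 1 can be sketched on a toy problem. This is an illustration under loudly stated assumptions, not the paper's Algorithm 1: the plant is a hypothetical scalar system `x_next = 0.9 * x + u + h(x)`, the disturbance `h` is invented for the example, and a one-step certainty-equivalent controller stands in for the full MPC optimization. The function `run_closed_loop` and all numeric values are illustrative.

```python
import numpy as np

def run_closed_loop(learn, steps=600, num_features=50, eta=0.5, seed=0):
    """Average squared tracking error of a toy control + identification loop."""
    rng = np.random.default_rng(seed)
    # Random Fourier features for a scalar input (Gaussian-kernel sampling).
    W = rng.normal(scale=2.0, size=num_features)
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def phi(x):
        return np.sqrt(2.0 / num_features) * np.cos(W * x + b)

    def h(x):
        # Hypothetical state-dependent disturbance, unknown to the controller.
        return 0.5 * np.sin(3.0 * x) + 0.3

    alpha = np.zeros(num_features)  # disturbance model: h_hat(x) = phi(x) @ alpha
    x, cost = 0.0, 0.0
    for t in range(steps):
        ref_next = np.sin(0.05 * (t + 1))
        feat = phi(x)
        h_hat = feat @ alpha

        # Control module: one-step stand-in for MPC using the current estimate.
        u = ref_next - 0.9 * x - h_hat

        # Plant: the true disturbance acts on the system.
        x_next = 0.9 * x + u + h(x)
        cost += (x_next - ref_next) ** 2

        # Identification module: self-supervised label from the observed state.
        if learn:
            observed = x_next - 0.9 * x - u   # equals h(x) for this toy plant
            alpha -= eta * (h_hat - observed) * feat

        x = x_next
    return cost / steps

print("average tracking cost with learning:   ", run_closed_loop(learn=True))
print("average tracking cost without learning:", run_closed_loop(learn=False))
```

With `learn=False` the estimate stays at zero, so the loop behaves like a nominal controller that ignores the disturbance. Comparing the two calls gives a rough feel for what the online identification module contributes.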

Theorems & Definitions (19)

  • Remark 1: Extension to Systems with Unmodeled Noise
  • Definition 1: Value Function [grimm2005model]
  • Remark 2: Discussion on the Stability Assumption
  • Definition 2: Dynamic Regret
  • Remark 3: Adaptivity of $h$
  • Proposition 1: Uniform Approximation Error [boffi2022nonparametric]
  • Remark 4: Beyond Random Fourier Features
  • Proposition 2: Regret Bound of Online Least-Squares Estimation [hazan2016introduction]
  • Remark 5: Time-Varying Optimal Parameters
  • Theorem 1: No-Regret
  • ...and 9 more