Data-driven optimal control of unknown nonlinear dynamical systems using the Koopman operator

Zhexuan Zeng, Ruikun Zhou, Yiming Meng, Jun Liu

TL;DR

This work tackles data-driven optimal control for unknown nonlinear systems by combining a modified Koopman operator framework with model-based reinforcement learning. It relaxes the requirements on observable functions to better capture nonlinear terms in both states and inputs, and uses a neural-network PDE solver to scale value-function computation to high dimensions. The authors establish convergence guarantees for the learned value function and policies and demonstrate strong empirical performance on systems with up to 9 states and 4 inputs, achieving accumulated-cost errors between $10^{-5}$ and $10^{-3}$. The framework offers a certifiable, scalable path for identifying dynamics and synthesizing stabilizing controllers directly from data in complex, high-dimensional settings.

Abstract

Nonlinear optimal control is vital for numerous applications but remains challenging for unknown systems due to the difficulties in accurately modelling dynamics and handling computational demands, particularly in high-dimensional settings. This work develops a theoretically certifiable framework that integrates a modified Koopman operator approach with model-based reinforcement learning to address these challenges. By relaxing the requirements on observable functions, our method incorporates nonlinear terms involving both states and control inputs, significantly enhancing system identification accuracy. Moreover, by leveraging the power of neural networks to solve partial differential equations (PDEs), our approach is able to achieve stabilizing control for high-dimensional dynamical systems of up to 9 dimensions. The learned value function and control laws are proven to converge to those of the true system at each iteration. Additionally, the accumulated cost of the learned control closely approximates that of the true system, with errors ranging from $10^{-5}$ to $10^{-3}$.
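To make the identification idea concrete, the following is a minimal sketch, not the authors' modified Koopman method. It swaps in a plain finite-difference approximation of the Koopman generator applied to the identity observable, which recovers the vector field, and fits it by least squares over a dictionary that includes a state-input cross term. The toy system `true_f`, the monomial dictionary, the sample count, and the step size `dt` are all assumptions chosen for illustration.

```python
import numpy as np

def true_f(x, u):
    # Hypothetical scalar dynamics with a state-input cross term.
    # Used only to generate data; treated as unknown by the fit below.
    return -x + x * u

def rk4_step(x, u, dt):
    # One Runge-Kutta step with the input held constant over [0, dt].
    k1 = true_f(x, u)
    k2 = true_f(x + 0.5 * dt * k1, u)
    k3 = true_f(x + 0.5 * dt * k2, u)
    k4 = true_f(x + dt * k3, u)
    return x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0

def dictionary(x, u):
    # Assumed monomial dictionary in (x, u): [1, x, u, x^2, x*u, u^2].
    return np.stack([np.ones_like(x), x, u, x**2, x * u, u**2], axis=1)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, 500)   # sampled initial states
u0 = rng.uniform(-1.0, 1.0, 500)   # sampled constant inputs
dt = 1e-3

# Finite-difference estimate of the generator acting on g(x) = x,
# i.e. (g(x(dt)) - g(x(0))) / dt, which approximates f(x, u).
x1 = rk4_step(x0, u0, dt)
dxdt = (x1 - x0) / dt

# Least-squares fit of the estimated vector field over the dictionary.
Phi = dictionary(x0, u0)
coeffs, *_ = np.linalg.lstsq(Phi, dxdt, rcond=None)

for name, c in zip(["1", "x", "u", "x^2", "x*u", "u^2"], coeffs):
    print(f"{name:>4s}: {c:+.4f}")
```

In this toy setting the fitted coefficients on `x` and `x*u` should land close to the true values of -1 and +1, with a small bias of order `dt` from the finite difference. The cross term `x*u` is the kind of nonlinear state-input term the abstract refers to.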

Paper Structure

This paper contains 17 sections, 2 theorems, 46 equations, 1 figure, 3 tables.

Key Result

theorem 1

As $\lambda\to\infty, T_{max}\to\infty, N\to\infty$ simultaneously, we have $F_{\lambda,T_{max},N}\to F$ uniformly on ${\mathcal{M}}$, where ${\mathcal{M}}$ is a compact set in $\mathbb{R}^{n+m}$.

Figures (1)

  • Figure 1: The error between the accumulated costs computed with the control learned from the identified system and with the true system (left). 50 trajectories of all states driven by the control learned from the identified system (right).

Theorems & Definitions (5)

  • definition 1: Admissible control
  • theorem 1: Convergence of vector field
  • proof
  • theorem 2
  • proof