Optimal Control of Nonlinear Systems with Unknown Dynamics
Wenjian Hao, Paulo C. Heredia, Shaoshuai Mou
TL;DR
The paper tackles optimal control for systems with unknown nonlinear dynamics by combining a Deep Koopman operator (DKO) lift with an actor–critic policy gradient framework, yielding a data-driven method for synthesizing a closed-loop controller without explicit model knowledge. The proposed PGDK method jointly learns the lifted dynamics, a temporal-difference (TD) critic, and a policy, which enables gradient-based optimization of the policy parameters $\boldsymbol{\theta}^{\mu}$ using one-step predictions and data from $\mathcal{D}$. The authors provide convergence analyses under Robbins–Monro step sizes and timescale separation, showing global optimality in convex settings and local convergence otherwise, with robust behavior under gradient approximation errors. Empirical results on an LTI system and a nonlinear inverted pendulum demonstrate improved data efficiency and performance close to that of model-based baselines such as LQR and MPC, and show that online PGDK can be more sample-efficient than some off-policy RL baselines.
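For reference, the step-size conditions named above are usually written as follows. This is the standard textbook form, not a transcription of the paper's assumptions: the symbols $\alpha_k$ (fast step size) and $\beta_k$ (slow step size) are introduced here for illustration, and this summary does not state which of the lifted-dynamics, critic, or policy updates runs on which timescale.

```latex
% Robbins–Monro conditions on a step-size sequence (same conditions apply to \beta_k):
\sum_{k=0}^{\infty} \alpha_k = \infty,
\qquad
\sum_{k=0}^{\infty} \alpha_k^{2} < \infty,
% Timescale separation: the slow step size vanishes relative to the fast one.
\qquad
\lim_{k \to \infty} \frac{\beta_k}{\alpha_k} = 0 .
```

One pair that satisfies all three conditions is $\alpha_k = (k+1)^{-0.6}$ and $\beta_k = (k+1)^{-1}$.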
Abstract
This paper presents a data-driven method to find a closed-loop optimal controller that minimizes a specified infinite-horizon cost function for systems with unknown dynamics. Suppose the closed-loop optimal controller can be parameterized by a given class of functions, hereafter referred to as the policy. The proposed method introduces a novel gradient estimation framework, which approximates the gradient of the cost function with respect to the policy parameters by integrating the Koopman operator with the classical actor-critic concept. Leveraging the linearity of the Koopman operator, this enables the policy parameters to be tuned iteratively by gradient descent to obtain an optimal controller. A convergence analysis of the proposed framework is provided. The control performance of the proposed method is evaluated in simulation against classical optimal control methods, which typically assume that the dynamics are known.
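To make "the linearity of the Koopman operator" concrete, the following is a minimal sketch of a Koopman-style lift fitted by least squares. It is not the paper's method: the paper learns the lifting map with a deep network, whereas this sketch assumes a hand-picked dictionary `lift` and a hypothetical toy system `step`, chosen so that the lifted dynamics are exactly linear. All names and numerical values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(x, u, a=0.9, b=0.5, c=0.3):
    """Hypothetical nonlinear system; the learner only sees its samples."""
    return np.array([a * x[0], b * x[1] + c * x[0] ** 2 + u])

def lift(x):
    """Hand-picked dictionary (assumption); the paper learns this map instead."""
    return np.array([x[0], x[1], x[0] ** 2])

# Collect one-step transitions (x, u, x_next) under random states and inputs.
X = rng.uniform(-1.0, 1.0, size=(200, 2))
U = rng.uniform(-1.0, 1.0, size=200)
X_next = np.array([step(x, u) for x, u in zip(X, U)])

Z = np.array([lift(x) for x in X])            # lifted states, shape (200, 3)
Z_next = np.array([lift(x) for x in X_next])  # lifted successors, shape (200, 3)

# Least-squares fit of z_next = A z + B u, solved as [Z, U] @ W = Z_next.
ZU = np.hstack([Z, U[:, None]])               # shape (200, 4)
W, *_ = np.linalg.lstsq(ZU, Z_next, rcond=None)
A, B = W[:3].T, W[3:].T                       # A: (3, 3), B: (3, 1)

# One-step prediction in the lifted space; the first two lifted
# coordinates are the original state, so they can be read off directly.
x0, u0 = np.array([0.4, -0.2]), 0.1
x_pred = (A @ lift(x0) + B[:, 0] * u0)[:2]
err = np.linalg.norm(x_pred - step(x0, u0))
print("one-step prediction error:", err)
```

Because the fitted model is linear in the lifted state, quantities such as one-step predictions and their derivatives with respect to the input are simple matrix expressions, which is the property the gradient estimation framework relies on.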
