Koopman Spectrum Nonlinear Regulators and Efficient Online Learning
Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham Kakade, Yoshinobu Kawahara
TL;DR
This work introduces Koopman Spectrum Nonlinear Regulators (KSNR), a nonlinear control paradigm that minimizes a Koopman spectrum cost $\Lambda[\mathscr{K}(\Theta)]$ together with a traditional cumulative cost $J^{\Theta}(X_0;c)$ in order to shape global properties of the controlled dynamics. Building on a random dynamical system model and Koopman operator theory, it enables behaviors such as limit cycles, stable loops, and smooth movements, and extends classical eigenstructure and pole assignment to nonlinear decision making. The authors present KS-LC3, an information-theoretic online learning algorithm with a sub-linear regret bound under structural assumptions, and demonstrate KSNR in simulations covering imitation of target dynamics, Cartpole stability, and Walker smoothness. The approach adds a new dimension of control regularization that can improve the energy efficiency, predictability, and stability of learned policies, with implications for robotic control and autonomous systems; it also calls for careful attention to model misspecification and practical scalability.
Abstract
Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often 'unnatural', representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds, such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some characterizations of dynamics that are not possible with a cumulative cost are feasible in this paradigm, which generalizes the classical eigenstructure and pole assignments to nonlinear decision making. Moreover, we present a sample-efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.
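To make the link to pole assignment concrete, the following is a minimal sketch of the linear special case only, not the authors' method or the KS-LC3 algorithm. For closed-loop dynamics $x_{t+1} = M x_t$ with $M = A + BK$, the Koopman operator acts on a linear observable $g(x) = w^\top x$ as $g(Mx) = (M^\top w)^\top x$, so on that subspace it is represented by $M^\top$ and its eigenvalues are the closed-loop poles. The choice of the spectral radius as the spectrum cost, the additive combination with a `weight`, the quadratic stage cost, and all function and variable names are assumptions made for illustration.

```python
import numpy as np


def closed_loop_matrix(A, B, K):
    """Closed-loop matrix M = A + B K under linear state feedback u = K x."""
    return A + B @ K


def spectrum_cost(M):
    """Hypothetical spectrum cost: the spectral radius of M.

    M and its transpose share eigenvalues, so this equals the spectral radius
    of the Koopman operator restricted to linear observables.
    """
    return float(np.max(np.abs(np.linalg.eigvals(M))))


def cumulative_cost(M, K, x0, horizon, Q, R):
    """Sum of quadratic stage costs x'Qx + u'Ru along a noise-free rollout."""
    x = np.array(x0, dtype=float)
    total = 0.0
    for _ in range(horizon):
        u = K @ x
        total += float(x @ Q @ x + u @ R @ u)
        x = M @ x
    return total


def ksnr_objective(A, B, K, x0, horizon=50, weight=1.0):
    """Assumed combined objective: weight * spectrum cost + cumulative cost."""
    M = closed_loop_matrix(A, B, K)
    Q = np.eye(A.shape[0])  # identity state weight (assumption)
    R = np.eye(B.shape[1])  # identity input weight (assumption)
    return weight * spectrum_cost(M) + cumulative_cost(M, K, x0, horizon, Q, R)


# Discrete-time double integrator (illustrative values, not from the paper).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
x0 = np.array([1.0, 0.0])

K_open = np.zeros((1, 2))          # no feedback: both poles sit at 1
K_fb = np.array([[-10.0, -5.0]])   # feedback that moves the poles inside the unit circle

rho_open = spectrum_cost(closed_loop_matrix(A, B, K_open))
rho_fb = spectrum_cost(closed_loop_matrix(A, B, K_fb))
```

In this toy setting the spectrum term depends only on where the closed-loop eigenvalues lie, independently of the initial state, whereas the cumulative term depends on the particular rollout from `x0`. In the nonlinear setting the Koopman operator is infinite-dimensional and must be approximated, which this sketch does not attempt.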
