Koopman Spectrum Nonlinear Regulators and Efficient Online Learning

Motoya Ohnishi, Isao Ishikawa, Kendall Lowrey, Masahiro Ikeda, Sham Kakade, Yoshinobu Kawahara

TL;DR

This work introduces Koopman Spectrum Nonlinear Regulators (KSNR), a nonlinear control paradigm that minimizes a Koopman operator spectrum cost $\Lambda[\mathscr{K}(\Theta)]$ together with a traditional cumulative cost $J^{\Theta}(X_0;c)$ to shape global dynamical properties. It builds on a random dynamical system model and Koopman operator theory to produce behaviors such as limit cycles, stable loops, and smooth movements, extending classical eigenstructure and pole assignment to nonlinear decision making. The authors present KS-LC3, an information-theoretic online learning algorithm with a sublinear regret bound under structural assumptions, and demonstrate KSNR in simulations covering imitation of target dynamics, Cartpole stability, and Walker smoothness. The spectrum cost acts as an additional form of control regularization that can improve the energy efficiency, predictability, and stability of learned policies, which is relevant to robotic control and autonomous systems, while model misspecification and practical scalability require careful consideration.

Abstract

Most modern reinforcement learning algorithms optimize a cumulative single-step cost along a trajectory. The optimized motions are often 'unnatural', representing, for example, behaviors with sudden accelerations that waste energy and lack predictability. In this work, we present a novel paradigm of controlling nonlinear systems via the minimization of the Koopman spectrum cost: a cost over the Koopman operator of the controlled dynamics. This induces a broader class of dynamical behaviors that evolve over stable manifolds such as nonlinear oscillators, closed loops, and smooth movements. We demonstrate that some dynamics characterizations that are not possible with a cumulative cost are feasible in this paradigm, which generalizes the classical eigenstructure and pole assignments to nonlinear decision making. Moreover, we present a sample efficient online learning algorithm for our problem that enjoys a sub-linear regret bound under some structural assumptions.
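To make the idea of adding a spectrum cost to a cumulative cost concrete, the following is a minimal sketch on a toy linear system, not the paper's method. It assumes linear dynamics $x_{t+1} = A x_t + B u_t$ with a linear feedback policy $u_t = -K x_t$; in that special case the Koopman operator restricted to linear observables is represented by the transpose of the closed-loop matrix, so its eigenvalues are the closed-loop poles. The function names, the quadratic single-step cost, the choice of spectral radius as the spectrum cost, and the `weight` parameter are all illustrative assumptions.

```python
import numpy as np

def closed_loop(A, B, K):
    """Closed-loop matrix of x_{t+1} = A x_t + B u_t under u_t = -K x_t."""
    return A - B @ K

def cumulative_cost(A, B, K, x0, horizon):
    """Sum of single-step costs ||x_t||^2 + ||u_t||^2 along one rollout (assumed cost)."""
    x = np.array(x0, dtype=float)
    total = 0.0
    for _ in range(horizon):
        u = -K @ x
        total += float(x @ x + u @ u)
        x = A @ x + B @ u
    return total

def spectral_radius(M):
    """Largest eigenvalue magnitude; used here as an example spectrum cost."""
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def combined_objective(A, B, K, x0, horizon, weight=1.0):
    """Spectrum cost of the closed-loop operator plus the cumulative cost."""
    spectrum_cost = weight * spectral_radius(closed_loop(A, B, K))
    return spectrum_cost + cumulative_cost(A, B, K, x0, horizon)

# Toy double integrator with time step 0.1 (hypothetical example system).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
x0 = np.array([1.0, 1.0])

K_zero = np.zeros((1, 2))          # no feedback
K_stab = np.array([[1.0, 2.0]])    # places both closed-loop poles at 0.9

obj_zero = combined_objective(A, B, K_zero, x0, horizon=50)
obj_stab = combined_objective(A, B, K_stab, x0, horizon=50)
print(obj_zero, obj_stab)
```

In this sketch the stabilizing gain lowers both terms of the objective. In the nonlinear setting described in the abstract the two terms need not agree, which is where a separate spectrum cost becomes informative.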

Paper Structure

This paper contains 45 sections, 11 theorems, 87 equations, 21 figures, 9 tables, 2 algorithms.

Key Result

Lemma 4.3

Suppose Assumption \ref{assump:RKHS} holds. Then there exists a linear operator $M^\star:\mathcal{H}\rightarrow\mathcal{H}'$ such that

Figures (21)

  • Figure 1: Left: A random dynamical system (RDS) consists of a model of the noise and a function representing the physical phase space (the illustration is inspired by \cite{arnold1995random, ghil2008climate}). The RDS flows over the sample space and the phase space for each realization $\omega$ and each initial state $x_0$. Right: By lifting the state space to a space of observables, a nonlinear dynamical system over the state space is represented by a linear operator in the lifted space.
  • Figure 2: Comparison of several costs for decision-making problems. The Koopman spectrum cost is a cost over the global properties of the dynamical system itself, which is typically unknown in learning problems, and is unobservable.
  • Figure 3: While single-step costs (taking the current and next states as input) can specify every transition and thus act as a "local" cost, the Koopman spectrum cost regularizes "global" characteristics of the dynamics by specifying its spectral properties (e.g., by forcing the dynamics to have some given mode $\mathbf{m}^*$ as its top mode). The regularization imposed by the Koopman spectrum cost may not be straightforward to implement in the cumulative cost formulation. It is related to skill learning with motor primitives (see Section \ref{sec:relatedwork}) in the sense that both aim at regulating global dynamical properties.
  • Figure 4: Left: We minimize only the Koopman spectrum cost $\Lambda(\mathscr{A})=\|\mathbf{m}-\mathbf{m}^\star\|_1$, which imitates the top mode of a reference spectrum, to recover a desired limit cycle behavior for the single-integrator system. Right: By regularizing the spectral radius of Cartpole together with a cumulative cost that favors high velocity, the Cartpole performs a stable oscillation rather than moving off to infinity.
  • Figure 5: The joint angle trajectories generated by a combination of linear and RFF policies. Left: when only cumulative reward is maximized. Right: when both the cumulative cost and the spectrum cost $\Lambda(\mathscr{A})=5\sum_{i=1}^{d_\phi}|\lambda_i(\mathscr{A})|$ are used, where the factor $5$ is multiplied to balance between the spectrum cost and the cumulative cost.
  • ...and 16 more figures
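The spectrum cost quoted in the Figure 5 caption, $\Lambda(\mathscr{A})=5\sum_{i=1}^{d_\phi}|\lambda_i(\mathscr{A})|$, can be illustrated with a short sketch. It assumes that $\mathscr{A}$ is available as a finite-dimensional matrix acting on $d_\phi$ features, and it estimates that matrix from noise-free trajectory data with an ordinary least-squares fit on identity features. This fit is a standard stand-in chosen for illustration, not the KS-LC3 algorithm, and the helper names and the toy rotation-contraction system are hypothetical.

```python
import numpy as np

def fit_linear_operator(X, Y):
    """Least-squares matrix A_hat with Y ≈ A_hat @ X (columns are feature vectors)."""
    # Solve X^T A^T ≈ Y^T for A^T, then transpose back.
    A_T, *_ = np.linalg.lstsq(X.T, Y.T, rcond=None)
    return A_T.T

def abs_eigenvalue_sum_cost(A, weight=5.0):
    """weight * sum_i |lambda_i(A)|, the form quoted in the Figure 5 caption."""
    return weight * float(np.sum(np.abs(np.linalg.eigvals(A))))

# Toy dynamics: a rotation by theta combined with a contraction of 0.95 per step.
theta = 0.3
M = 0.95 * np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Roll out one noise-free trajectory and stack the states as columns.
states = [np.array([1.0, 0.0])]
for _ in range(20):
    states.append(M @ states[-1])
S = np.stack(states, axis=1)  # shape (2, 21)

# Fit the one-step operator from (x_t, x_{t+1}) pairs and evaluate the cost.
A_hat = fit_linear_operator(S[:, :-1], S[:, 1:])
cost = abs_eigenvalue_sum_cost(A_hat)
print(cost)
```

Both eigenvalues of the toy system have magnitude 0.95, so the cost depends only on the contraction rate and not on the rotation angle. This matches the caption's use of the cost as a smoothness regularizer on the spectrum rather than on individual transitions.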

Theorems & Definitions (35)

  • Definition 3.1: Koopman operator
  • Remark 3.1: Choice of $\mathcal{H}$ and existence of $\mathscr{K}$
  • Example 3.1: Examples of $\Lambda$
  • Remark 3.2: Remarks on how the choice of $\mathcal{H}$ affects the Koopman spectrum cost
  • Remark 3.3: Remarks on the stability regularization
  • Remark 4.1: On Assumption \ref{asm1}
  • Remark 4.2: On Assumption \ref{assump:RKHS}
  • Lemma 4.3
  • Remark 4.4: On Assumption \ref{assump:const}
  • Remark 4.5: Hilbert-Schmidt operators
  • ...and 25 more