Koopman-Assisted Reinforcement Learning

Preston Rozwood, Edward Mehrez, Ludger Paehler, Wen Sun, Steven L. Brunton

TL;DR

This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address limitations of the Bellman equation.

Abstract

The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a "Koopman tensor" that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman-Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.

Paper Structure

This paper contains 36 sections, 54 equations, 12 figures, 3 tables, 5 algorithms.

Figures (12)

  • Figure 1: Koopman-assisted reinforcement learning, illustrated by the Soft Actor Koopman-Critic, a Koopman variant of the popular Soft Actor-Critic algorithm. The Koopman Critic receives the state and the reward in the original state space and lifts them to a feature space, where the value function can be advanced in time with the Koopman operator. This critique is then fed back to the Actor, which issues the action to be performed in the environment. This is one of the two main Koopman-assisted reinforcement learning algorithms explored in this work; the other is a modified soft Koopman value iteration.
  • Figure 2: Soft Koopman Value Iteration, a Koopman variant of the widely used value iteration algorithm. In Koopman value iteration, the set of states $x_{i}$, under a sequence of actions $\{u_{j} \}_{j=1,\ldots,n}$, is lifted into a vector space where the Koopman operator advances the dynamics linearly. The action policy is then learned in this new space.
  • Figure 3: Construction of action-dependent Koopman operators $K^u$ from the Koopman tensor $\mathscr{T}_K$. Colors match along the $k$ index (depth of tensor box). Each of the matrix slices is then weighted according to the $\psi$ dictionary elements to construct the control-dependent Koopman operator $K^u$.
  • Figure 4: Schematic of Koopman with control. (left) A nominal trajectory is shown for a given policy $\pi$, with alternative branches for different control actions. (right) A diagram showing the state and control sequences, as they relate to the policy $\pi$ and dynamics ${F}$.
  • Figure 5: Four benchmark problems investigated: (a) simple linear system; (b) Lorenz 1963 model; (c) incompressible fluid flow past a cylinder at Reynolds number 100; and (d) double-well potential with non-isotropic stochastic forcing.
  • ...and 7 more figures
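
The construction described in the Figure 3 caption can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the tensor here is random rather than fitted from data, the index order (last axis is $k$) is an assumption, and the action dictionary `psi` with entries $[1, u, u^2]$ and the function name `koopman_from_tensor` are hypothetical choices made for the example.

```python
import numpy as np


def psi(u):
    """Hypothetical action dictionary for a scalar action: [1, u, u^2]."""
    u = float(u)
    return np.array([1.0, u, u * u])


def koopman_from_tensor(T, u):
    """Weight each matrix slice T[:, :, k] by psi(u)[k] and sum over k.

    Assumes the tensor is stored with the k index on the last axis, so the
    result is the action-dependent matrix K^u of shape (d_phi, d_phi).
    """
    return np.tensordot(T, psi(u), axes=([2], [0]))


# Stand-in for a learned Koopman tensor (random here, purely for illustration).
rng = np.random.default_rng(0)
d_phi, d_psi = 4, 3
T = rng.normal(size=(d_phi, d_phi, d_psi))

# Build K^u for one action and advance a lifted state linearly.
K_u = koopman_from_tensor(T, 0.5)
phi_x = rng.normal(size=d_phi)  # stand-in for the lifted state phi(x)
phi_next = K_u @ phi_x          # one-step prediction in lifted coordinates
```

With this layout, changing the action only changes the weights on the slices; the slices themselves are fixed once the tensor is available, which is what lets a single tensor represent a whole family of operators $K^u$.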

Theorems & Definitions (3)

  • Remark 2.1: Basis Functions of the Koopman Operator
  • Remark 2.2: Finite-Horizon MDPs and the Time-Inhomogeneous Koopman Operator
  • Remark 2.3: Closed Form Operator Representation of $V^\pi$