System stabilization with policy optimization on unstable latent manifolds

Steffen W. R. Werner; Benjamin Peherstorfer

TL;DR

Experiments demonstrate that the proposed approach stabilizes even complex physical systems from few data samples for which other methods that operate either directly in the system state space or on generic latent manifolds fail.

Abstract

Stability is a basic requirement when studying the behavior of dynamical systems. However, stabilizing dynamical systems via reinforcement learning is challenging because only little data can be collected over short time horizons before instabilities are triggered and the data become meaningless. This work introduces a reinforcement learning approach that is formulated over latent manifolds of unstable dynamics so that stabilizing policies can be trained from few data samples. The unstable manifolds are minimal in the sense that they contain the lowest dimensional dynamics that are necessary for learning policies that guarantee stabilization. This is in stark contrast to generic latent manifolds that aim to approximate all -- stable and unstable -- system dynamics and thus are higher dimensional and often require larger amounts of data. Experiments demonstrate that the proposed approach stabilizes even complex physical systems from few data samples for which other methods that operate either directly in the system state space or on generic latent manifolds fail.
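To make the idea of acting only on the unstable dynamics concrete, the following is a minimal sketch under strong simplifying assumptions: a linear, discrete-time toy system with a single real unstable eigenvalue, an encoder built from the corresponding left eigenvector, and a closed-form latent gain standing in for the learned policy. The matrices, the function name `unstable_left_eigenpairs`, and the value of `target` are hypothetical choices made for illustration. This is not the paper's algorithm, which is formulated for nonlinear systems and uses reinforcement learning.

```python
import numpy as np


def unstable_left_eigenpairs(A):
    """Eigenvalues of A outside the unit circle and their left eigenvectors."""
    # Left eigenvectors of A are the (right) eigenvectors of A^T.
    eigvals, left_vecs = np.linalg.eig(A.T)
    unstable = np.abs(eigvals) > 1.0
    return eigvals[unstable], left_vecs[:, unstable]


# Toy discrete-time system x_{k+1} = A x_k + B u_k (illustrative values only).
A = np.array([[1.2, 0.5, 0.0],
              [0.0, 0.5, 0.3],
              [0.0, 0.0, -0.4]])
B = np.array([[1.0], [0.5], [0.2]])

lam_u, W = unstable_left_eigenpairs(A)
assert lam_u.size == 1, "this sketch handles a single real unstable mode"

# Latent (encoded) dynamics: z = W^T x obeys z_{k+1} = lam_u z_k + (W^T B) u_k.
B_latent = W.T @ B  # shape (1, 1)

# Closed-form latent gain that moves the unstable eigenvalue to `target`.
# Assumption: this replaces the learned policy purely for illustration.
target = 0.1
K_latent = (lam_u[0] - target) / B_latent  # shape (1, 1)

# Decode the latent policy u = -K_latent z into full-state feedback u = -K_full x.
K_full = K_latent @ W.T  # shape (1, 3)
A_closed = A - B @ K_full

open_loop_radius = np.max(np.abs(np.linalg.eigvals(A)))
closed_loop_radius = np.max(np.abs(np.linalg.eigvals(A_closed)))
closed_loop_eigs = np.sort(np.linalg.eigvals(A_closed).real)
```

In this toy setting the feedback acts through a left eigenvector, so it shifts only the unstable eigenvalue to `target` and leaves the stable eigenvalues of `A` untouched. A one-dimensional latent space is therefore enough to stabilize the three-dimensional system.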

Paper Structure (33 sections, 1 theorem, 33 equations, 10 figures, 1 table, 1 algorithm)

Introduction
Stabilizing dynamical systems
Setup
Problem formulation
Learning on the manifold of unstable dynamics
Stabilization on latent manifolds
Why not every latent manifold is a good choice for stabilization
Principal components can lead to unsuitable latent manifolds for stabilization.
Large amounts of data are required to compensate for poorly suited latent manifolds.
The latent manifold of unstable dynamics
Estimation of encoders and decoders from adjoint data
Leveraging the latent manifold of unstable dynamics for reinforcement learning
Policy optimization on unstable manifolds
Multi-fidelity policy optimization on the unstable manifold
Latent model of unstable dynamics for pre-training.
...and 18 more sections

Key Result

Theorem 1

Let $f$ in eqn:sys be analytic in $(\bar{x}, \bar{u})$ and let the parameter $\psi_{\ast}^{\mathrm{pre}}$ be such that the policy $\widetilde{K}_{\psi_{\ast}^{\mathrm{pre}}}$ is stabilizing for eqn:rom with respect to the zero steady state $(0, 0)$. Then, there exists an $\epsilon > 0$ such

Figures (10)

Figure 1: Reinforcement learning methods that optimize policies without unstable manifolds (shown above the dashed line) directly query the dynamical environment in its original, high-dimensional state representation. In contrast, the new [MF-]UMPO methods (shown below the dashed line) consider the dynamics over the low-dimensional unstable manifold instead. This reduces the dimension over which the policy optimization has to act, and it reduces the complexity of the task by ignoring dynamical behavior that is irrelevant for stabilization.
Figure 2: Unstable manifolds can have much lower dimension than standard latent manifolds: For the flow past a cylinder described by the Navier-Stokes equations, the dimension of the unstable manifold is orders of magnitude lower than that of a manifold that generically approximates latent dynamics; see WerP23a.
Figure 3: PCA subspaces are insufficient for stabilization: While the latent model created from a fully converged low-dimensional PCA subspace has the same unstable eigenvalue as the true system, its controlled behavior does not coincide with the true system steered via the corresponding decoded policy, since the controlled latent eigenvalue $\tilde{\lambda}^{\mathrm{c}}$ does not change, while the controlled true eigenvalues $\lambda_{\mathrm{u}}^{\mathrm{c}}$ and $\lambda_{\mathrm{s}}^{\mathrm{c}}$ do (see (a)). Tuning the policy parameter $\psi$ does not stabilize the true system since $\lambda_{\mathrm{u}}^{\mathrm{c}}$ and $\lambda_{\mathrm{s}}^{\mathrm{c}}$ do not decrease below the stability border and the simulation trajectories (see (b)) do not converge to the steady state but oscillate or even grow in magnitude.
Figure 4: Increasing the distance between the PCA manifold and the manifold of unstable system dynamics makes the PCA manifold less suited for the task of stabilization. Because of the coupling of the dynamics in the example \ref{eqn:example}, the number of data samples needed to identify any instabilities on the PCA manifold increases by $20\times$ from $\varepsilon = 0.01$ to $\varepsilon = 10$.
Figure 5: Comparison of normalized accumulated rewards: Plots (a)--(d) show that MF-UMPO, which uses the high-dimensional system and the latent model together, achieves higher rewards than UMPO-MA, which uses the latent model alone. For the Toda lattice example with results shown in (e)--(g), MF-UMPO achieves rewards similar to those of direct DDPG. Note that in the Toda lattice example, UMPO-MA achieves the highest rewards because the latent model of the unstable dynamics hides many of the strongly nonlinear dynamics that affect the stabilization.
...and 5 more figures

Theorems & Definitions (2)

Theorem 1
proof
